Description:
Parses JSON into individual Record objects. The Record that is produced will contain all top-level elements of the corresponding JSON Object. The root JSON element can be either a single element or an array of JSON elements, and each element in that array will be treated as a separate record. If the schema that is configured contains a field that is not present in the JSON, a null value will be used. If the JSON contains a field that is not present in the schema, that field will be skipped. See the Usage of the Controller Service for more information and examples.
Tags:
JSON, tree, record, reader, parser
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
Name | Default Value | Allowable Values | Description |
Schema Access Strategy | schema-name | Use 'Schema Name' Property; Use 'Schema Text' Property; HWX Schema Reference Attributes; HWX Content-Encoded Schema Reference | Specifies how to obtain the schema that is to be used for interpreting the data. |
Schema Registry | | Controller Service API: SchemaRegistry | Specifies the Controller Service to use for the Schema Registry. |
Schema Name | ${schema.name} | | Specifies the name of the schema to look up in the Schema Registry property. Supports Expression Language: true |
Schema Text | ${avro.schema} | | The text of an Avro-formatted Schema. Supports Expression Language: true |
Date Format | | | Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017). |
Time Format | | | Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15). |
Timestamp Format | | | Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15). |
State management:
This component does not store state.
Restricted:
This component is not restricted.
See Also:
JsonPathReader
Summary:
The JsonTreeReader Controller Service reads a JSON Object and creates a Record object for the entire JSON Object tree. The Controller Service must be configured with a Schema that describes the structure of the JSON data. If any field exists in the JSON that is not in the schema, that field will be skipped. If the schema contains a field for which no JSON field exists, a null value will be used in the Record (or the default value defined in the schema, if applicable).
If the root element of the JSON is a JSON Array, each JSON Object within that array will be treated as its own separate Record. If the root element is a JSON Object, the JSON will all be treated as a single Record.
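The record-splitting and field-selection behavior described above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not NiFi's implementation: the function name `read_records` and the use of a plain list of field names as the "schema" are assumptions made for the example, and schema default values are not modeled.

```python
import json

def read_records(text, field_names):
    """Yield one dict per record, keeping only the fields named in the schema."""
    root = json.loads(text)
    # A root array yields one record per element; a root object yields a single record.
    elements = root if isinstance(root, list) else [root]
    for element in elements:
        # Fields missing from the JSON become None; fields not in the schema are skipped.
        yield {name: element.get(name) for name in field_names}

records = list(read_records(
    '[{"id": 1, "extra": true}, {"id": 2, "name": "Jane"}]',
    ["id", "name"],
))
print(records)  # [{'id': 1, 'name': None}, {'id': 2, 'name': 'Jane'}]
```

Here the `extra` field is dropped because it is not in the schema, and the first record's `name` is `None` because the JSON does not supply it.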
Schemas and Type Coercion
When a record is parsed from incoming data, it is separated into fields. Each of these fields is then looked up against the configured schema (by field name) in order to determine what the type of the data should be. If the field is not present in the schema, that field is omitted from the Record. If the field is found in the schema, the data type of the received data is compared against the data type specified in the schema. If the types match, the value of that field is used as-is. If the schema indicates that the field should be of a different type, then the Controller Service will attempt to coerce the data into the type specified by the schema. If the field cannot be coerced into the specified type, an Exception will be thrown.
The following rules apply when attempting to coerce a field value from one data type to another:
- Any data type can be coerced into a String type.
- Any numeric data type (Byte, Short, Int, Long, Float, Double) can be coerced into any other numeric data type.
- Any numeric value can be coerced into a Date, Time, or Timestamp type, by assuming that the Long value is the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
- A String value can be coerced into a Date, Time, or Timestamp type, if its format matches the configured "Date Format," "Time Format," or "Timestamp Format."
- A String value can be coerced into a numeric value if the value is of the appropriate type. For example, the String value 8 can be coerced into any numeric type. However, the String value 8.2 can be coerced into a Double or Float type but not an Integer.
- A String value of "true" or "false" (regardless of case) can be coerced into a Boolean value.
- A String value that is not empty can be coerced into a Char type. If the String contains more than 1 character, the first character is used and the rest of the characters are ignored.
- Any "date/time" type (Date, Time, Timestamp) can be coerced into any other "date/time" type.
- Any "date/time" type can be coerced into a Long type, representing the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
- Any "date/time" type can be coerced into a String. The format of the String is whatever DateFormat is configured for the corresponding property (Date Format, Time Format, Timestamp Format property). If no value is specified, then the value will be converted into a String representation of the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
If none of the above rules apply when attempting to coerce a value from one data type to another, the coercion will fail and an Exception will be thrown.
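A subset of these rules can be sketched as a small Python function. This is a hedged illustration only, not the Controller Service's code: the function `coerce`, its type names, and the `date_format` parameter are hypothetical, and the format string uses Python's `strptime` syntax rather than the Java Simple Date Format that the properties expect. Only the String, Int, Double, Boolean, and Date targets are covered.

```python
from datetime import datetime, timezone

def coerce(value, target, date_format=None):
    """Coerce `value` to the type named by `target`, or raise ValueError."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if target == "string":
        return str(value)
    if target == "int":
        if isinstance(value, str):
            return int(value)  # "8" works; "8.2" raises ValueError
        if is_number:
            return int(value)
    if target == "double":
        if isinstance(value, str) or is_number:
            return float(value)
    if target == "boolean":
        if isinstance(value, bool):
            return value
        if isinstance(value, str) and value.lower() in ("true", "false"):
            return value.lower() == "true"
    if target == "date":
        if isinstance(value, str) and date_format is not None:
            return datetime.strptime(value, date_format).date()
        if is_number:
            # Numeric values are taken as milliseconds since the epoch (UTC).
            return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()
    # No rule applied, so the coercion fails.
    raise ValueError(f"cannot coerce {value!r} to {target}")

print(coerce("8", "int"))         # 8
print(coerce("8.2", "double"))    # 8.2
print(coerce("TRUE", "boolean"))  # True
print(coerce("10-29-1982", "date", "%m-%d-%Y"))  # 1982-10-29
```

Calling `coerce("8.2", "int")` raises `ValueError`, which mirrors the rule that the String value 8.2 cannot be coerced into an Integer.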
Examples
As an example, consider that the following JSON is read:
[{
    "id": 17,
    "name": "John",
    "child": {
        "id": "1"
    },
    "dob": "10-29-1982",
    "siblings": [
        { "name": "Jeremy", "id": 4 },
        { "name": "Julia", "id": 8 }
    ]
},
{
    "id": 98,
    "name": "Jane",
    "child": {
        "id": 2
    },
    "dob": "08-30-1984",
    "gender": "F",
    "siblingIds": [],
    "siblings": []
}]
Also, consider that the schema that is configured for this JSON is as follows (assuming that the AvroSchemaRegistry Controller Service is chosen to denote the Schema):
{
    "namespace": "nifi",
    "name": "person",
    "type": "record",
    "fields": [
        { "name": "id", "type": "int" },
        { "name": "name", "type": "string" },
        { "name": "gender", "type": "string" },
        { "name": "dob", "type": {
            "type": "int",
            "logicalType": "date"
        }},
        { "name": "siblings", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "fields": [
                    { "name": "name", "type": "string" }
                ]
            }
        }}
    ]
}
Let us also assume that this Controller Service is configured with the “Date Format” property set to “MM-dd-yyyy”, as this matches the date format used for our JSON data. This will result in the JSON creating two separate records, because the root element is a JSON array with two elements.
The first Record will consist of the following values:
Field Name | Field Value |
id | 17 |
name | John |
gender | null |
dob | 10-29-1982 |
siblings | array with two elements, each of which is itself a Record: (name: Jeremy) and (name: Julia) |
The second Record will consist of the following values:
Field Name | Field Value |
id | 98 |
name | Jane |
gender | F |
dob | 08-30-1984 |
siblings | empty array |
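The example can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not the JsonTreeReader itself: the helper `to_person` hard-codes the fields of the `person` schema instead of reading the Avro schema, the data is the example above written as valid JSON, and `%m-%d-%Y` is the Python `strptime` counterpart of the MM-dd-yyyy pattern.

```python
import json
from datetime import datetime

DATA = """
[{"id": 17, "name": "John", "child": {"id": "1"}, "dob": "10-29-1982",
  "siblings": [{"name": "Jeremy", "id": 4}, {"name": "Julia", "id": 8}]},
 {"id": 98, "name": "Jane", "child": {"id": 2}, "dob": "08-30-1984",
  "gender": "F", "siblingIds": [], "siblings": []}]
"""

def to_person(obj):
    """Keep only the fields of the person schema; absent fields become None."""
    dob = obj.get("dob")
    return {
        "id": obj.get("id"),
        "name": obj.get("name"),
        "gender": obj.get("gender"),
        # Parse the date string using the configured format.
        "dob": datetime.strptime(dob, "%m-%d-%Y").date() if dob is not None else None,
        # Nested sibling records keep only "name"; their "id" is not in the schema.
        "siblings": [{"name": s.get("name")} for s in obj.get("siblings", [])],
    }

# The root element is an array, so each element becomes its own record.
people = [to_person(obj) for obj in json.loads(DATA)]
for person in people:
    print(person)
```

The `child` and `siblingIds` fields do not appear in either result because the schema does not declare them, and the first record's `gender` is `None` because the JSON does not supply it.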