Description:

Parses JSON records and evaluates user-defined JSONPath expressions against each JSON object. The root element may be either a single JSON object or a JSON array. If a JSON array is found, each JSON object within that array is treated as a separate record. User-defined properties define the fields that should be extracted from the JSON in order to form the fields of a Record. Any JSON field that is not extracted via a JSONPath will not appear in the resulting Records.

Tags:

JSON, jsonpath, record, reader, parser

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The list also indicates any default values and whether a property supports the Expression Language.

Each property is listed with its Name, Default Value, Allowable Values, and Description.

Schema Access Strategy
  Default Value: schema-name
  Allowable Values:
  • Use 'Schema Name' Property
  • Use 'Schema Text' Property
  • HWX Schema Reference Attributes
  • HWX Content-Encoded Schema Reference
  • Use String Fields From Header
  Description: Specifies how to obtain the schema that is to be used for interpreting the data.

Schema Registry
  Allowable Values: Controller Service API: SchemaRegistry. Implementations: AvroSchemaRegistry, HortonworksSchemaRegistry.
  Description: Specifies the Controller Service to use for the Schema Registry.

Schema Name
  Default Value: ${schema.name}
  Description: Specifies the name of the schema to look up in the Schema Registry property.
  Supports Expression Language: true

Schema Text
  Default Value: ${avro.schema}
  Description: The text of an Avro-formatted Schema.
  Supports Expression Language: true

Date Format
  Description: Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be the number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).

Time Format
  Description: Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be the number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).

Timestamp Format
  Description: Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be the number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name: The field name for the record.
Value: A JSONPath Expression that will be evaluated against each JSON record. The result of the JSONPath will be the value of the field whose name is the same as the property name.
Description: User-defined properties identify how to extract specific fields from a JSON object in order to create a Record.

State management:

This component does not store state.

Restricted:

This component is not restricted.

See Also:

JsonTreeReader

Summary:

The JsonPathReader Controller Service parses FlowFiles that are in the JSON format. User-defined properties specify how to extract all relevant fields from the JSON in order to create a Record. The Controller Service will not be valid unless at least one JSON Path is provided. Unlike the JsonTreeReader Controller Service, this service will return a record that contains only those fields that have been configured via JSON Path.

If the root of the FlowFile’s JSON is a JSON Array, each JSON Object found in that array will be treated as a separate Record, not as a single record made up of an array. If the root of the FlowFile’s JSON is a JSON Object, it will be evaluated as a single Record.
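The root-handling rule can be illustrated with a minimal Python sketch. This is not the service's implementation; the function name `split_into_records` is hypothetical, and skipping non-object array elements is an assumption made only to keep the sketch short.

```python
import json

def split_into_records(flowfile_text):
    """Return the JSON objects that would each be treated as one record."""
    root = json.loads(flowfile_text)
    if isinstance(root, list):
        # Root is an array: every JSON object inside it is its own record.
        # Assumption: elements that are not objects are simply skipped here.
        return [item for item in root if isinstance(item, dict)]
    if isinstance(root, dict):
        # Root is a single object: it is evaluated as a single record.
        return [root]
    raise ValueError("root must be a JSON object or a JSON array")

print(len(split_into_records('[{"id": 17}, {"id": 98}]')))  # 2
print(len(split_into_records('{"id": 17}')))                # 1
```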

Supplying a JSON Path is accomplished by adding a user-defined property where the name of the property becomes the name of the field in the Record that is returned. The value of the property must be a valid JSON Path expression. This JSON Path will be evaluated against each top-level JSON Object in the FlowFile, and the result will be the value of the field whose name is specified by the property name. If a JSON Path is given but the Schema contains no field with that property name, the field will be skipped.
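To make the property-name-to-field mapping concrete, here is a rough Python sketch. The helper `evaluate_path` is a hypothetical toy that understands only a tiny JSONPath subset (`$`, `.name` steps, and `[*]`); the real service relies on a full JSONPath implementation, so treat this as an illustration of the idea rather than of the actual evaluation rules.

```python
import json

def evaluate_path(obj, path):
    """Evaluate a tiny JSONPath subset: '$', '.name' steps, and '[*]'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # "$.siblings[*].name" becomes the steps ["siblings", "*", "name"].
    steps = [s for s in path[1:].replace("[*]", ".*").split(".") if s]
    current = [obj]
    multiple = False  # becomes True once a wildcard makes the result a list
    for step in steps:
        matched = []
        for node in current:
            if step == "*" and isinstance(node, list):
                matched.extend(node)
            elif step != "*" and isinstance(node, dict) and step in node:
                matched.append(node[step])
        if step == "*":
            multiple = True
        current = matched
    if multiple:
        return current
    return current[0] if current else None

# Property name -> JSON Path, as they would be entered as user-defined properties.
properties = {"id": "$.id", "childId": "$.child.id", "siblingNames": "$.siblings[*].name"}
top_level_object = json.loads(
    '{"id": 17, "child": {"id": "1"}, "siblings": [{"name": "Jeremy"}, {"name": "Julia"}]}'
)
fields = {name: evaluate_path(top_level_object, path) for name, path in properties.items()}
print(fields)  # {'id': 17, 'childId': '1', 'siblingNames': ['Jeremy', 'Julia']}
```

Note that `childId` is still the string "1" at this point; converting it to the type named in the schema is a separate step.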

This Controller Service must be configured with a schema. Each JSON Path whose property name is found in the “root level” of the schema will produce a Field in the Record. In other words, the schema should match the Record that is created by evaluating all of the JSON Paths, not the “incoming JSON” that is read from the FlowFile.
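The relationship between JSON Path results and the schema can be sketched as follows. The function `shape_record` and the sample values are hypothetical, and reporting a schema field with no result as `None` is an assumption chosen for illustration.

```python
import json

# A trimmed-down schema describing the Record, not the incoming JSON.
schema_text = """
{
  "namespace": "nifi", "name": "person", "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "gender", "type": "string" }
  ]
}
"""

def shape_record(path_results, schema_text):
    """Keep only results whose property name is a root-level schema field."""
    field_names = [f["name"] for f in json.loads(schema_text)["fields"]]
    # Assumption: a schema field with no JSON Path result is reported as None.
    return {name: path_results.get(name) for name in field_names}

# Hypothetical JSON Path results; "nickname" has no matching schema field.
path_results = {"id": 17, "name": "John", "nickname": "Johnny"}
print(shape_record(path_results, schema_text))
# {'id': 17, 'name': 'John', 'gender': None}
```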

Schemas and Type Coercion

When a record is parsed from incoming data, it is separated into fields. Each of these fields is then looked up against the configured schema (by field name) in order to determine what the type of the data should be. If the field is not present in the schema, that field is omitted from the Record. If the field is found in the schema, the data type of the received data is compared against the data type specified in the schema. If the types match, the value of that field is used as-is. If the schema indicates that the field should be of a different type, then the Controller Service will attempt to coerce the data into the type specified by the schema. If the field cannot be coerced into the specified type, an Exception will be thrown.

The following rules apply when attempting to coerce a field value from one data type to another:

  • Any data type can be coerced into a String type.
  • Any numeric data type (Byte, Short, Int, Long, Float, Double) can be coerced into any other numeric data type.
  • Any numeric value can be coerced into a Date, Time, or Timestamp type, by treating the value as a Long that holds the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
  • A String value can be coerced into a Date, Time, or Timestamp type, if its format matches the configured "Date Format," "Time Format," or "Timestamp Format."
  • A String value can be coerced into a numeric value if the value is of the appropriate type. For example, the String value 8 can be coerced into any numeric type. However, the String value 8.2 can be coerced into a Double or Float type but not an Integer.
  • A String value of "true" or "false" (regardless of case) can be coerced into a Boolean value.
  • A String value that is not empty can be coerced into a Char type. If the String contains more than 1 character, the first character is used and the rest of the characters are ignored.
  • Any "date/time" type (Date, Time, Timestamp) can be coerced into any other "date/time" type.
  • Any "date/time" type can be coerced into a Long type, representing the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
  • Any "date/time" type can be coerced into a String. The format of the String is whatever DateFormat is configured for the corresponding property (Date Format, Time Format, Timestamp Format property). If no value is specified, then the value will be converted into a String representation of the number of milliseconds since epoch (Midnight GMT, January 1, 1970).

If none of the above rules apply when attempting to coerce a value from one data type to another, the coercion will fail and an Exception will be thrown.
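A few of these rules can be sketched in Python as shown below. This is only an approximation under stated assumptions: the function `coerce` and its type names are hypothetical, Python's `int` and `float` stand in for the numeric types, only the numeric-to-Timestamp direction of the date/time rules is covered, and a `ValueError` plays the role of the Exception.

```python
from datetime import datetime, timezone

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def coerce(value, target):
    """Coerce value to one of: string, int, long, float, double, boolean, char, timestamp."""
    if target == "string":
        return str(value)  # any data type can become a String
    if target in ("int", "long"):
        if is_number(value) or isinstance(value, str):
            return int(value)  # int("8") works; int("8.2") raises ValueError
    elif target in ("float", "double"):
        if is_number(value) or isinstance(value, str):
            return float(value)
    elif target == "boolean":
        if isinstance(value, bool):
            return value
        if isinstance(value, str) and value.lower() in ("true", "false"):
            return value.lower() == "true"  # case-insensitive
    elif target == "char":
        if isinstance(value, str) and value:
            return value[0]  # remaining characters are ignored
    elif target == "timestamp":
        if is_number(value):
            # Numeric values are read as milliseconds since epoch (GMT).
            return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    # No rule applied: the coercion fails.
    raise ValueError(f"cannot coerce {value!r} to {target}")

print(coerce("8", "long"))        # 8
print(coerce("8.2", "double"))    # 8.2
print(coerce("TRUE", "boolean"))  # True
print(coerce("John", "char"))     # J
```

Calling `coerce("8.2", "int")` raises a `ValueError`, which mirrors the rule that the String value 8.2 cannot be coerced into an Integer.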

Examples

As an example, consider a FlowFile whose content contains the following JSON:

[{
	"id": 17,
	"name": "John",
	"child": {
		"id": "1"
	},
	"siblingIds": [4, 8],
	"siblings": [
		{ "name": "Jeremy", "id": 4 },
		{ "name": "Julia", "id": 8 }
	]
}, {
	"id": 98,
	"name": "Jane",
	"child": {
		"id": 2
	},
	"gender": "F",
	"siblingIds": [],
	"siblings": []
}]

And the following schema has been configured:

{
	"namespace": "nifi",
	"name": "person",
	"type": "record",
	"fields": [
		{ "name": "id", "type": "int" },
		{ "name": "name", "type": "string" },
		{ "name": "childId", "type": "long" },
		{ "name": "gender", "type": "string" },
		{ "name": "siblingNames", "type": {
			"type": "array",
			"items": "string"
		}}
	]
}

Suppose this Controller Service is configured with the following user-defined properties:

Property Name Property Value
id $.id
name $.name
childId $.child.id
gender $.gender
siblingNames $.siblings[*].name

In this case, the FlowFile will generate two Records. The first record will consist of the following key/value pairs:

Field Name Field Value
id 17
name John
childId 1
gender null
siblingNames array of two elements: Jeremy and Julia

The second record will consist of the following key/value pairs:

Field Name Field Value
id 98
name Jane
childId 2
gender F
siblingNames empty array