Examines the contents of the incoming FlowFile to infer an Avro schema. The processor uses the Kite SDK to attempt to generate an Avro schema automatically from the incoming content. Only CSV and JSON content is currently supported for automatically inferring an Avro schema.

When inferring the schema from JSON data, the key names are used as the field names in the resulting Avro schema definition. When inferring from CSV data, a “header definition” must be present, either as the first line of the incoming data or explicitly set in the property “CSV Header Definition”. A “header definition” is a single comma-separated line defining the name of each column. It is required in order to determine the names given to each field in the resulting Avro definition.

When inferring data types, the higher-order data type is always used if there is ambiguity. For example, when examining numerical values the type may be set to “long” instead of “integer”, since a long can safely hold the value of any integer.

The type of content present in the incoming FlowFile is set with the property “Input Content Type”. The property can be set explicitly to CSV or JSON, or to “use mime.type value”, which examines the value of the mime.type attribute on the incoming FlowFile to determine the type of content present.
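The "higher-order type wins" rule can be illustrated with a minimal Python sketch. This is not the Kite SDK's algorithm. The function names, the int < long < double widening order, the fallback to "string" for other mixed types, and the restriction to flat, line-delimited JSON records are all assumptions made for illustration. The `limit` parameter mirrors the idea behind "Number Of Records To Analyze".

```python
import json

# Assumed widening order for this sketch: narrowest to widest.
NUMERIC_ORDER = ["int", "long", "double"]


def value_type(value):
    """Map a parsed JSON value to an Avro-style primitive type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(a, b):
    """Return the type that can hold values of both a and b."""
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        return max(a, b, key=NUMERIC_ORDER.index)
    return "string"  # fallback chosen for this sketch only


def infer_field_types(json_lines, limit=10):
    """Infer one type per key from the first `limit` flat JSON records."""
    types = {}
    for line in json_lines[:limit]:
        for key, value in json.loads(line).items():
            if value is None:
                continue  # nulls carry no type information in this sketch
            seen = value_type(value)
            types[key] = widen(types[key], seen) if key in types else seen
    return types


records = ['{"id": 1, "score": 2}', '{"id": 5000000000, "score": 2.5}']
print(infer_field_types(records))
```

Here "id" is widened from int to long because the second value does not fit in 32 bits, and "score" is widened to double because one record holds a fractional value.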


Tags:

kite, avro, infer, schema, csv, json


Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

| Name | Default Value | Allowable Values | Description |
|---|---|---|---|
| Schema Output Destination | flowfile-content | flowfile-attribute, flowfile-content | Controls whether the Avro schema is written to a new FlowFile attribute 'inferred.avro.schema' or to the FlowFile content. Writing to the FlowFile content overwrites any existing content. |
| Input Content Type | use mime.type value | use mime.type value, json, csv | Content type of the data in the incoming FlowFile's content. Only "json" and "csv" are supported. If set to "use mime.type value", the incoming FlowFile's "mime.type" attribute is used to determine the content type. |
| CSV Header Definition | | | This property only applies to CSV content type. Comma-separated string defining the column names expected in the CSV data, e.g. "fname,lname,zip,address". The elements in this string should be in the same order as the underlying data. Setting this property causes the value of "Get CSV Header Definition From Data" to be ignored; this value is used instead. Supports Expression Language: true |
| Get CSV Header Definition From Data | true | true, false | This property only applies to CSV content type. If "true", the processor will attempt to read the CSV header definition from the first line of the input data. |
| CSV Header Line Skip Count | 0 | | This property only applies to CSV content type. Specifies the number of lines to skip when reading the CSV data. A value of 0 means the entire contents of the file are read. If "Get CSV Header Definition From Data" is set to "true", the first line of the CSV file is read and treated as the CSV header definition. Because this removes the header line from the data, make sure "CSV Header Line Skip Count" is set to 0 so that no data is skipped. Supports Expression Language: true |
| CSV delimiter | , | | Delimiter character for CSV records. |
| CSV Escape String | \ | | This property only applies to CSV content type. String that represents an escape sequence in the CSV FlowFile content data. Supports Expression Language: true |
| CSV Quote String | ' | | This property only applies to CSV content type. String that represents a literal quote character in the CSV FlowFile content data. Supports Expression Language: true |
| Pretty Avro Output | true | true, false | If true, the Avro output will be formatted. |
| Avro Record Name | | | Value to be placed in the Avro record schema "name" field. Supports Expression Language: true |
| Number Of Records To Analyze | 10 | | This property only applies to JSON content type. The number of JSON records that should be examined to determine the Avro schema. The higher the value, the better the chance Kite has of detecting the appropriate type. However, the default value of 10 is almost always enough. Supports Expression Language: true |
| Charset | UTF-8 | | Character encoding of CSV data. |
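The interaction between "CSV Header Definition", "Get CSV Header Definition From Data" and "CSV Header Line Skip Count" can be sketched as below. This is an illustrative reading of the property descriptions, not the processor's code. The function name and signature are hypothetical, quoting and escaping are ignored, and applying the skip count directly to the raw lines when an explicit header is given is an assumption.

```python
def resolve_csv_header(content, header_definition=None,
                       header_from_data=True, skip_count=0):
    """Return (column_names, data_lines) for CSV text.

    Hypothetical helper mirroring the documented precedence:
    an explicit header definition wins over the header in the data.
    """
    lines = content.splitlines()
    if header_definition:
        # Explicit definition: "Get CSV Header Definition From Data" is ignored.
        names = header_definition.split(",")
        data = lines[skip_count:]  # assumption: skip applies to the raw lines
    elif header_from_data:
        if not lines:
            raise ValueError("no header line available in the data")
        # The first line is consumed as the header, then skip_count applies.
        names = lines[0].split(",")
        data = lines[1:][skip_count:]
    else:
        raise ValueError("CSV content requires a header definition")
    return [name.strip() for name in names], data


content = "fname,lname\nAda,Lovelace\nAlan,Turing"

# Header taken from the data; skip count left at 0 so no record is lost.
print(resolve_csv_header(content))

# Explicit header; skip count 1 drops the header line already in the data.
print(resolve_csv_header(content, header_definition="first,last", skip_count=1))
```

Both calls return the same two data lines. With the header read from the data, a skip count of 1 would instead drop the first record, which is the situation the "CSV Header Line Skip Count" description warns about.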


Relationships:

| Name | Description |
|---|---|
| original | Original incoming FlowFile data. |
| unsupported content | The FlowFile content is not in a supported format. |
| failure | Failed to create Avro schema from data. |
| success | Successfully created Avro schema from data. |

Reads Attributes:

| Name | Description |
|---|---|
| mime.type | If "Input Content Type" is set to "use mime.type value", this attribute determines the type of content the schema is inferred from. |
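A minimal sketch of how the content type might be resolved from "Input Content Type" and the mime.type attribute. The function name is hypothetical, and the MIME strings "text/csv" and "application/json" are assumptions about which values are recognized. Returning None stands in for routing the FlowFile to "unsupported content".

```python
# Assumed mapping from mime.type values to supported content types.
MIME_TO_CONTENT_TYPE = {
    "text/csv": "csv",
    "application/json": "json",
}


def resolve_content_type(input_content_type, attributes):
    """Return "csv", "json", or None when the content is unsupported."""
    if input_content_type in ("csv", "json"):
        return input_content_type  # explicit setting; mime.type is not read
    # "use mime.type value": fall back to the FlowFile attribute.
    return MIME_TO_CONTENT_TYPE.get(attributes.get("mime.type"))


print(resolve_content_type("csv", {}))
print(resolve_content_type("use mime.type value", {"mime.type": "application/json"}))
print(resolve_content_type("use mime.type value", {"mime.type": "text/plain"}))
```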

Writes Attributes:

| Name | Description |
|---|---|
| inferred.avro.schema | If "Schema Output Destination" is set to "flowfile-attribute", this attribute holds the Avro schema inferred from the incoming FlowFile content. |
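Because an Avro schema is a JSON document, a downstream consumer can read the attribute with any JSON parser. The sketch below assumes the attributes are available as a plain dictionary. The schema shown is hand-written for illustration and is not actual processor output, and the record name "contact" is hypothetical.

```python
import json


def schema_field_names(attributes):
    """List the field names of the record schema stored in the attribute."""
    schema = json.loads(attributes["inferred.avro.schema"])
    return [field["name"] for field in schema["fields"]]


# Hand-written example schema, not produced by the processor.
attributes = {
    "inferred.avro.schema": json.dumps({
        "type": "record",
        "name": "contact",
        "fields": [
            {"name": "fname", "type": "string"},
            {"name": "zip", "type": "long"},
        ],
    })
}

print(schema_field_names(attributes))
```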