Description:

Convert records from one Avro schema to another, including support for flattening and simple type conversions.

Tags:

avro, convert, kite

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.

Name | Default Value | Allowable Values | Description
Input Schema | | | Avro Schema of Input Flowfiles. Supports Expression Language: true
Output Schema | | | Avro Schema of Output Flowfiles. Supports Expression Language: true
Locale | default | | Locale to use for scanning data, or "default" for the JVM default

Dynamic Properties:
Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
Field name from input schema | Field name for output schema | Explicit mappings from input schema to output schema, which supports renaming fields and stepping into nested records on the input schema using notation like parent.id

Relationships:

Name | Description
success | Avro content that converted successfully
failure | Avro content that failed to convert

Reads Attributes:

None specified.

Writes Attributes:

None specified.

Summary:

This processor is used to convert data between two Avro formats, such as those coming from the ConvertCSVToAvro or ConvertJSONToAvro processors. The input and output content of the flow files should be Avro data files. The processor includes support for the following basic type conversions:

  • Anything to String, using the data’s default String representation

  • String types to numeric types int, long, double, and float

  • Conversion to and from optional Avro types

In addition, fields can be renamed or unpacked from a record type by using the dynamic properties.
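
The conversions listed above can be pictured with a small Python sketch. This is only an illustration on plain Python values, not the processor's implementation; the helper name convert_value and its parameters are hypothetical, and Python's str() stands in for "the data's default String representation".

```python
from typing import Any, Optional

# Hypothetical helper (not a NiFi API): maps Avro numeric type names to parsers.
NUMERIC_PARSERS = {"int": int, "long": int, "float": float, "double": float}


def convert_value(value: Any, target_type: str, optional: bool = False) -> Optional[Any]:
    """Convert a single value to the named target type.

    `optional` models a union with null, e.g. ["null", "double"].
    """
    if value is None:
        if optional:
            return None  # null is allowed for optional targets
        raise ValueError("null value for a non-optional target field")
    if target_type == "string":
        return str(value)  # anything to string
    if target_type in NUMERIC_PARSERS:
        text = value.strip() if isinstance(value, str) else value
        return NUMERIC_PARSERS[target_type](text)  # string to numeric
    raise ValueError(f"unsupported target type: {target_type}")


print(convert_value(42, "string"))
print(convert_value("1234", "long"))
print(convert_value("12.5", "double"))
print(convert_value(None, "double", optional=True))
```

A string that cannot be parsed as the requested numeric type raises ValueError in this sketch; in the processor, content that fails to convert is routed to the failure relationship.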

Mapping Example:

Throughout this example, we will refer to input data with the following schema:

{
    "type": "record",
    "name": "CustomerInput",
    "namespace": "org.apache.example",
    "fields": [
        {
            "name": "id",
            "type": "string"
        },
        {
            "name": "companyName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "revenue",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "parent",
            "type": ["null", {
                "type": "record",
                "name": "parent",
                "fields": [
                    {
                        "name": "name",
                        "type": ["null", "string"],
                        "default": null
                    },
                    {
                        "name": "id",
                        "type": "string"
                    }
                ]
            }],
            "default": null
        }
    ]
}

Although the id and revenue fields are declared as string, they are logically long and double, respectively. By default, fields with matching names are mapped automatically, so records can be converted to the following output schema without using dynamic properties:

{
    "type": "record",
    "name": "SimpleCustomerOutput",
    "namespace": "org.apache.example",
    "fields": [
        {
            "name": "id",
            "type": "long"
        },
        {
            "name": "companyName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "revenue",
            "type": ["null", "double"],
            "default": null
        }
    ]
}
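
As a rough illustration of the automatic name matching, the Python sketch below converts one record to the SimpleCustomerOutput fields. It is an assumption-laden model, not the processor's code: the record is a plain dict, the sample values are invented, and OUTPUT_FIELDS and convert_by_name are hypothetical names.

```python
# Parsers for the base types used by SimpleCustomerOutput.
PARSERS = {"string": str, "long": int, "double": float}

# Output fields as (name, base type, nullable), transcribed from the schema above.
OUTPUT_FIELDS = [
    ("id", "long", False),
    ("companyName", "string", True),
    ("revenue", "double", True),
]


def convert_by_name(record: dict) -> dict:
    """Copy each output field from the input field with the same name,
    parsing string values into the output type."""
    out = {}
    for name, base_type, nullable in OUTPUT_FIELDS:
        value = record.get(name)
        if value is None:
            if not nullable:
                raise ValueError(f"missing required field {name!r}")
            out[name] = None
        else:
            out[name] = PARSERS[base_type](value)
    return out


# Invented sample record shaped like CustomerInput.
customer = {"id": "1001", "companyName": "Acme", "revenue": "2500000.75", "parent": None}
print(convert_by_name(customer))
# {'id': 1001, 'companyName': 'Acme', 'revenue': 2500000.75}
```

The parent field has no counterpart with the same name in the output schema, so it is simply not copied.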

To rename companyName to name and to extract the parent’s id field, both an output schema and dynamic properties must be provided. For example, to convert to the following schema:

{
    "type": "record",
    "name": "SimpleCustomerOutput",
    "namespace": "org.apache.example",
    "fields": [
        {
            "name": "id",
            "type": "long"
        },
        {
            "name": "name",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "revenue",
            "type": ["null", "double"],
            "default": null
        },
        {
            "name": "parentId",
            "type": ["null", "long"],
            "default": null
        }
    ]
}

The following dynamic properties would be used:

"companyName" -> "name"
"parent.id" -> "parentId"
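
One way to picture how these two mappings behave is the Python sketch below. It is not the processor's implementation: records are modeled as nested dicts, the sample values are invented, and resolve, convert, MAPPINGS, and OUTPUT_TYPES are hypothetical names. For brevity it passes null through for every field, including the non-nullable id.

```python
# Dynamic properties from the example: input path -> output field name.
MAPPINGS = {"companyName": "name", "parent.id": "parentId"}

# Output fields and parsers for their base types (long -> int, double -> float).
OUTPUT_TYPES = {"id": int, "name": str, "revenue": float, "parentId": int}


def resolve(record, path):
    """Follow a dotted path such as 'parent.id' into nested records.
    Returns None as soon as a step along the path is null."""
    current = record
    for part in path.split("."):
        if current is None:
            return None
        current = current[part]
    return current


def convert(record):
    """Build an output record: explicitly mapped fields use their input path,
    all other fields fall back to the input field with the same name."""
    source_for = {target: source for source, target in MAPPINGS.items()}
    out = {}
    for name, parse in OUTPUT_TYPES.items():
        value = resolve(record, source_for.get(name, name))
        out[name] = None if value is None else parse(value)
    return out


# Invented sample record shaped like CustomerInput.
customer = {
    "id": "1001",
    "companyName": "Acme",
    "revenue": "2500000.75",
    "parent": {"name": "Globex", "id": "77"},
}
print(convert(customer))
# {'id': 1001, 'name': 'Acme', 'revenue': 2500000.75, 'parentId': 77}
```

When parent is null, the lookup for parent.id yields null, which is why parentId is declared as an optional long in the output schema.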