Description and usage of ConvertAvroToParquet processor:

Converts Avro records into the Parquet file format. The incoming FlowFile should be a valid Avro file. If an incoming FlowFile does not contain any records, the output is an empty Parquet file. NOTE: Many Avro data types (for example collections, primitives, and unions of primitives) can be converted to Parquet, but unions of collections and other complex data types may not be convertible.

Tags:

avro, parquet, convert

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The list also indicates any default values and whether a property supports the Expression Language.

Name: Compression Type
Default Value: UNCOMPRESSED
Allowable Values:
* UNCOMPRESSED
* SNAPPY
* GZIP
* LZO
* BROTLI
* LZ4
* ZSTD
Description: The type of compression for the file being written.

Name: Row Group Size
Description: The row group size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Name: Page Size
Description: The page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Name: Dictionary Page Size
Description: The dictionary page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Name: Max Padding Size
Description: The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Name: Enable Dictionary Encoding
Allowable Values:
* true
* false
Description: Specifies whether dictionary encoding should be enabled for the Parquet writer.

Name: Enable Validation
Allowable Values:
* true
* false
Description: Specifies whether validation should be enabled for the Parquet writer.

Name: Writer Version
Allowable Values:
* PARQUET_1_0
* PARQUET_2_0
Description: Specifies the version used by the Parquet writer.
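
Row Group Size, Page Size, Dictionary Page Size, and Max Padding Size all take values in the <Data Size> <Data Unit> format. The following Python sketch shows one way such a value could be turned into a byte count, for example to sanity-check a setting before entering it. The helper name parse_data_size is hypothetical and not part of NiFi, and the sketch assumes binary multiples (1 KB = 1024 B); check how your NiFi version interprets the units before relying on the numbers.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B). Verify against your NiFi version.
_UNIT_FACTORS = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}

# Matches "<Data Size> <Data Unit>", e.g. "128 MB" or "8 KB".
_DATA_SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)


def parse_data_size(value: str) -> int:
    """Hypothetical helper: convert '<Data Size> <Data Unit>' to a byte count."""
    match = _DATA_SIZE.match(value)
    if match is None:
        raise ValueError(f"not a valid data size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_FACTORS[unit.upper()])
```

Under these assumptions, parse_data_size("128 MB") evaluates to 134217728, and a value with an unknown unit such as "12 XB" raises ValueError.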

Relationships:

Name: success
Description: Parquet file that was converted successfully from Avro.

Name: failure
Description: Avro content that could not be processed.

Reads Attributes:

None specified.

Writes Attributes:

Name: filename
Description: Sets the filename to the existing filename with its extension replaced by .parquet, or with .parquet appended if the existing filename has no extension.

Name: record.count
Description: Sets the number of records in the Parquet file.
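
The filename rule can be illustrated with a short Python sketch. The helper parquet_filename is hypothetical and only mirrors the behavior described here; it is not the processor's code, and edge cases such as names that begin with a dot may be handled differently by the processor itself.

```python
import os


def parquet_filename(filename: str) -> str:
    """Hypothetical helper: replace the existing extension with .parquet,
    or append .parquet when the name has no extension."""
    # os.path.splitext splits off only the last extension, so
    # "archive.tar.gz" becomes ("archive.tar", ".gz").
    stem, _extension = os.path.splitext(filename)
    return stem + ".parquet"
```

With this sketch, "users.avro" becomes "users.parquet" and an extensionless name such as "users" also becomes "users.parquet".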

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.