Description and usage of ConvertAvroToParquet processor:
Converts Avro records into the Parquet file format. The incoming FlowFile should be a valid Avro file. If an incoming FlowFile does not contain any records, the output is an empty Parquet file. NOTE: Many Avro data types (e.g. collections, primitives, and unions of primitives) can be converted to Parquet, but unions of collections and other complex data types may not be convertible.
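The note about unions of collections can be made concrete with a rough pre-check over an Avro schema. The sketch below is not the processor's own logic: it is a conservative heuristic, written under the assumption that the schema is available as parsed JSON, that reports every union with an array or map branch. The function name `find_collection_unions` and the sample schema are hypothetical, and a flagged schema may still convert without error.

```python
# Avro collection types that the note singles out when they appear in a union.
COLLECTION_TYPES = {"array", "map"}


def find_collection_unions(schema, path="$"):
    """Return the paths of unions that contain an array or map branch.

    `schema` is an Avro schema already parsed from JSON (str, list, or dict).
    This is an illustrative heuristic, not the check NiFi or Parquet performs.
    """
    hits = []
    if isinstance(schema, list):  # a JSON list is an Avro union
        for branch in schema:
            if isinstance(branch, dict) and branch.get("type") in COLLECTION_TYPES:
                hits.append(path)
                break
        for index, branch in enumerate(schema):
            hits.extend(find_collection_unions(branch, f"{path}[{index}]"))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                hits.extend(
                    find_collection_unions(field["type"], f"{path}.{field['name']}")
                )
        elif kind == "array":
            hits.extend(find_collection_unions(schema["items"], f"{path}.items"))
        elif kind == "map":
            hits.extend(find_collection_unions(schema["values"], f"{path}.values"))
        elif isinstance(kind, (list, dict)):
            # Nested type definition, e.g. {"type": ["null", "string"]}.
            hits.extend(find_collection_unions(kind, path))
    return hits


# Hypothetical schema: "note" is a union of primitives, "payload" a union of collections.
example_schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "note", "type": ["null", "string"]},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {
            "name": "payload",
            "type": [
                "null",
                {"type": "array", "items": "int"},
                {"type": "map", "values": "string"},
            ],
        },
    ],
}

flagged = find_collection_unions(example_schema)
```

Fields reported in `flagged` are candidates for restructuring before the data reaches the processor, for example by wrapping each collection in its own record field.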
Tags:
avro, parquet, convert
Properties:
In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
Compression Type | UNCOMPRESSED | UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD | The type of compression for the file being written. |
Row Group Size | | | The row group size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Page Size | | | The page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Dictionary Page Size | | | The dictionary page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Max Padding Size | | | The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Enable Dictionary Encoding | | true, false | Specifies whether dictionary encoding should be enabled for the Parquet writer. |
Enable Validation | | true, false | Specifies whether validation should be enabled for the Parquet writer. |
Writer Version | | PARQUET_1_0, PARQUET_2_0 | Specifies the version used by the Parquet writer. |
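The size properties above (Row Group Size, Page Size, Dictionary Page Size, Max Padding Size) all share the <Data Size> <Data Unit> format. As a minimal sketch of how such a value maps to a byte count, the helper below parses the string and multiplies by the unit. It assumes binary multiples (1 KB = 1024 B) and is not the processor's own parser; the name `parse_data_size` is hypothetical.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 B).
_UNIT_BYTES = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}

# A number, optional whitespace, then one of the documented units.
_SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)


def parse_data_size(text):
    """Convert a '<Data Size> <Data Unit>' string such as '128 MB' to bytes."""
    match = _SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid data size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_BYTES[unit.upper()])


row_group_bytes = parse_data_size("128 MB")
```

A check like this can be useful for validating values before they are set on the processor, since a malformed size string is easier to diagnose outside the flow.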
Relationships:
Name | Description |
success | Parquet file that was converted successfully from Avro |
failure | Avro content that could not be processed |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
filename | Sets the filename to the existing filename with its extension replaced by .parquet, or with .parquet appended if the filename has no extension. |
record.count | Sets the number of records in the Parquet file. |
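The filename rule in the table above can be sketched as follows. This is an illustration of the described behavior rather than the processor's implementation; the function name `parquet_filename` is hypothetical, and the handling of multi-part extensions (only the last one is replaced) is an assumption that follows from using `os.path.splitext`.

```python
import os


def parquet_filename(existing):
    """Replace the last extension with .parquet, or append it if there is none."""
    stem, _extension = os.path.splitext(existing)
    # splitext leaves `stem` equal to the whole name when no extension exists,
    # so the same expression covers both the "replace" and the "append" case.
    return stem + ".parquet"


converted_name = parquet_filename("events.avro")
```

Downstream processors that route on file type can then match on the .parquet suffix, and record.count can be read from the FlowFile attributes without opening the file.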
State management:
This component does not store state.
Restricted:
This component is not restricted.
Input requirement:
This component requires an incoming relationship.
System Resource Considerations:
None specified.