Description:
Creates Hadoop Sequence Files from incoming flow files
Tags:
Hadoop, sequence file, create, sequencefile
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
Hadoop Configuration Resources | | | A file or comma-separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. |
Kerberos Principal | | | Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties |
Kerberos Keytab | | | Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties |
Kerberos Relogin Period | 4 hours | | Period of time which should pass before attempting a Kerberos relogin |
compression type | | NONE, DEFAULT, BZIP, GZIP, LZ4, SNAPPY, AUTOMATIC | Type of compression to use when creating Sequence File |
Relationships:
Name | Description |
failure | Incoming files that failed to generate a Sequence File are sent to this relationship |
success | Generated Sequence Files are sent to this relationship |
Reads Attributes:
None specified.
Writes Attributes:
None specified.
See Also:
PutHDFS
Summary:
This processor creates a Hadoop Sequence File, which is essentially a file of key/value pairs. The key is a file name and the value is the flow file content. The processor accepts either a merged (a.k.a. packaged) flow file or a single flow file. Historically, this processor merged files by type and size or time before creating a SequenceFile output; it no longer does this. To create a SequenceFile that contains multiple files of the same type, precede this processor with a RouteOnAttribute processor to segregate files of the same type, and follow that with a MergeContent processor to bundle the files. If the type of files is not important, use the MergeContent processor alone. When using the MergeContent processor, the following Merge Formats are supported by this processor:
- TAR
- ZIP
- FlowFileStream v3
The created SequenceFile has the same name as the incoming FlowFile, with the suffix ‘.sf’ appended. For incoming FlowFiles that are bundled, the keys in the SequenceFile are the individual file names and the values are the contents of each file.
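The naming rule and the key/value mapping described above can be illustrated with a small sketch. This is not the processor's implementation and it does not write the Hadoop SequenceFile binary format; it only shows, using the Python standard library, which keys and values would result from a TAR or ZIP bundle versus a single flow file. The function names `sequence_file_name` and `key_value_pairs` and the `merge_format` argument are hypothetical, and FlowFileStream v3 is left out.

```python
import io
import tarfile
import zipfile


def sequence_file_name(flowfile_name: str) -> str:
    """Output name per the text above: the incoming name plus the '.sf' suffix."""
    return flowfile_name + ".sf"


def key_value_pairs(content: bytes, merge_format: str, flowfile_name: str):
    """Return (key, value) pairs: entry name -> entry bytes for bundles,
    or flow file name -> whole content for a single (unbundled) flow file."""
    if merge_format == "zip":
        with zipfile.ZipFile(io.BytesIO(content)) as zf:
            return [(info.filename, zf.read(info))
                    for info in zf.infolist() if not info.is_dir()]
    if merge_format == "tar":
        with tarfile.open(fileobj=io.BytesIO(content), mode="r") as tf:
            return [(member.name, tf.extractfile(member).read())
                    for member in tf.getmembers() if member.isfile()]
    # Hypothetical fallback: treat anything else as a single flow file.
    return [(flowfile_name, content)]


# Build a tiny ZIP bundle in memory to stand in for a merged flow file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("b.txt", "beta")

pairs = key_value_pairs(buf.getvalue(), "zip", "bundle.zip")
output_name = sequence_file_name("bundle.zip")
```

Here `pairs` holds one entry per bundled file, keyed by the file name inside the bundle, while `output_name` carries the added ‘.sf’ suffix.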
NOTE: The value portion of a key/value pair is loaded into memory. While there is a max size limit of 2GB, this could cause memory issues if there are too many concurrent tasks and the flow file sizes are large.
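As a rough way to reason about the note above, the sketch below multiplies the number of concurrent tasks by the largest value each task might hold. It assumes that each task holds at most one value in memory at a time and that the 2GB limit can be read as 2 GiB; both are assumptions, and the figure ignores JVM overhead and any other buffers, so treat it as a back-of-the-envelope estimate rather than a sizing rule. The function name `worst_case_value_bytes` is hypothetical.

```python
# Upper bound on a single value, taken from the note (read here as 2 GiB; an assumption).
MAX_VALUE_BYTES = 2 * 1024**3


def worst_case_value_bytes(concurrent_tasks: int, largest_value_bytes: int) -> int:
    """Estimate memory held by values if every task loads its largest value at once."""
    per_task = min(largest_value_bytes, MAX_VALUE_BYTES)
    return concurrent_tasks * per_task


# Example: 8 concurrent tasks, each loading a 512 MiB value.
estimate = worst_case_value_bytes(8, 512 * 1024**2)
estimate_gib = estimate / 1024**3
```

Comparing such an estimate against the heap available to NiFi gives a first indication of whether the concurrent task count or the upstream bundle sizes should be reduced.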