Description:

This processor uses Hive Streaming to send flow file data to an Apache Hive table. The incoming flow file is expected to be in Avro format and the table must already exist in Hive. Please see the Hive documentation for requirements on the Hive table (format, partitions, etc.). The partition values are extracted from the Avro record based on the names of the partition columns, as specified in the processor's Partition Columns property.
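
As background for the table requirements mentioned above: Hive Streaming generally requires the target table to be stored as ORC, bucketed, and marked transactional (ACID). The sketch below creates such a table over JDBC purely as an illustration; the HiveServer2 URL, table name, columns, and partition column ('country') are assumptions to adapt to your environment, and the Hive JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateStreamingTable {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 location; adjust host, port, database, and credentials.
        String url = "jdbc:hive2://hive-server:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Hive Streaming needs an ORC-backed, bucketed, transactional (ACID) table.
            // Table, column, and partition names here are examples only.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS users (id INT, name STRING, email STRING) "
                + "PARTITIONED BY (country STRING) "
                + "CLUSTERED BY (id) INTO 4 BUCKETS "
                + "STORED AS ORC "
                + "TBLPROPERTIES ('transactional' = 'true')");
        }
    }
}
```

Incoming Avro records for such a table would carry fields matching the table columns plus a 'country' field, from which the partition value is extracted as described above.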

Tags:

hive, streaming, put, database, store

Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The list also indicates any default values and allowable values, and whether a property supports the NiFi Expression Language (see the [Expression Language Guide](https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html)).

Hive Metastore URI: The URI location for the Hive Metastore. Note that this is not the location of the Hive Server. The default port for the Hive metastore is 9083.
Hive Configuration Resources: A file, or a comma-separated list of files, containing the Hive configuration (e.g. hive-site.xml). Without this, Hadoop will search the classpath for a 'hive-site.xml' file or will revert to a default configuration. Note that to enable authentication with Kerberos, for example, the appropriate properties must be set in the configuration files. Please see the Hive documentation for more details.
Database Name: The name of the database in which to put the data.
Table Name: The name of the database table in which to put the data.
Partition Columns: A comma-delimited list of column names on which the table has been partitioned. The order of values in this list must correspond exactly to the order of the partition columns specified when the table was created.
Auto-Create Partitions (default: true; allowable values: true, false): Flag indicating whether partitions should be automatically created.
Max Open Connections (default: 8): The maximum number of open connections that can be allocated from this pool at the same time, or a negative value for no limit.
Heartbeat Interval (default: 60): Indicates that a heartbeat should be sent when the specified number of seconds has elapsed. A value of 0 indicates that no heartbeat should be sent.
Transactions per Batch (default: 100): A hint to Hive Streaming indicating how many transactions the processor task will need. This value must be greater than 1. Supports Expression Language: true. (See the sketch after this list for how this maps onto the Hive Streaming API.)
Records per Transaction (default: 10000): The number of records to process before committing the transaction. This value must be greater than 1. Supports Expression Language: true.
Rollback On Failure (default: false; allowable values: true, false): Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile is routed to the 'failure' or 'retry' relationship based on the error type, and the processor can continue with the next FlowFile. Instead, you may want to roll back the FlowFiles currently being processed and stop further processing immediately; you can do so by enabling this property. If enabled, failed FlowFiles stay in the input queue without being penalized and are processed repeatedly until they are processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently. NOTE: When an error occurs after a Hive Streaming transaction derived from the same input FlowFile has already been committed (i.e. the FlowFile contains more records than 'Records per Transaction' and the failure occurred at the second transaction or later), the records that succeeded are transferred to the 'success' relationship while the original input FlowFile stays in the incoming queue. Duplicate records can be created for the succeeded ones when the same FlowFile is processed again.
Kerberos Principal: Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
Kerberos Keytab: Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
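
For orientation on how several of these properties map onto Hive Streaming concepts, the sketch below uses the hive-hcatalog-streaming API that this processor builds on. It is a simplified illustration rather than the processor's actual code: the metastore URI, database, table, and partition value are placeholder assumptions, and StrictJsonWriter is used only to keep the example short (the processor itself writes Avro records). Reserving the transaction batch corresponds to Transactions per Batch, the flag passed when opening the connection corresponds to Auto-Create Partitions, and committing after a fixed number of writes corresponds to Records per Transaction.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder metastore URI, database, table, and partition value (assumptions).
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "users", Arrays.asList("US"));

        // Passing 'true' mirrors the Auto-Create Partitions property.
        StreamingConnection connection = endPoint.newConnection(true);
        // The processor writes Avro; StrictJsonWriter just keeps this sketch short.
        StrictJsonWriter writer = new StrictJsonWriter(endPoint);

        // Reserving 100 transactions up front mirrors the Transactions per Batch property.
        TransactionBatch txnBatch = connection.fetchTransactionBatch(100, writer);
        try {
            txnBatch.beginNextTransaction();
            // The processor commits after every 'Records per Transaction' records,
            // then begins the next transaction in the batch.
            txnBatch.write("{\"id\":1,\"name\":\"alice\",\"country\":\"US\"}"
                    .getBytes(StandardCharsets.UTF_8));
            txnBatch.commit();
        } finally {
            txnBatch.close();
            connection.close();
        }
    }
}
```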

Relationships:

retry: The incoming FlowFile is routed to this relationship if its records cannot be transmitted to Hive. Note that some records may have been processed successfully; they will be routed (as Avro flow files) to the 'success' relationship. The combination of the 'retry', 'success', and 'failure' relationships indicates how many records succeeded and/or failed, and can be used to provide a retry capability since a full rollback is not possible (a worked example follows this list).
success: A FlowFile containing Avro records is routed to this relationship after those records have been successfully transmitted to Hive.
failure: A FlowFile containing Avro records is routed to this relationship if the records could not be transmitted to Hive.
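
As a worked illustration of this partial-success behavior: suppose a FlowFile contains 25,000 Avro records and Records per Transaction is 10,000. The records are written in three transactions of 10,000, 10,000, and 5,000 records. If the third transaction fails, the 20,000 records already committed are routed to 'success' as Avro flow files (with hivestreaming.record.count set to 20000), while the remaining records are reported through the 'failure' or 'retry' relationship depending on the error type. Because the committed transactions cannot be rolled back, replaying the original FlowFile in full would write those 20,000 records again.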

Reads Attributes:

None specified.

Writes Attributes:

hivestreaming.record.count: This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the number of records from the incoming flow file written successfully and unsuccessfully, respectively.
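
Downstream processors can read this attribute with the NiFi Expression Language; for example, a processor connected to the 'failure' relationship could reference ${hivestreaming.record.count} (e.g. in a RouteOnAttribute rule or a logged attribute) to report or act on the number of records that were not written. This is an illustrative usage, not something this processor configures.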

State management:

This component does not store state.

Restricted:

This component is not restricted.