Description:
Executes the provided HiveQL SELECT query against a Hive database connection. The query result is converted to Avro or CSV format. Results are streamed, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query. The FlowFile attribute ‘selecthiveql.row.count’ indicates how many rows were selected.
Tags:
hive, sql, select, jdbc, query, database
Properties:
In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the Expression Language.
Name | Default Value | Allowable Values | Description |
Hive Database Connection Pooling Service | | Controller Service API: HiveDBCPService; Implementation: HiveConnectionPool | The Hive Controller Service that is used to obtain connection(s) to the Hive database. |
HiveQL Select Query | | | HiveQL SELECT query to execute. Supports Expression Language: true |
Output Format | Avro | Avro, CSV | How to represent the records coming from Hive (for example, Avro or CSV). |
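Because the HiveQL Select Query property supports Expression Language, attributes of an incoming FlowFile can be referenced inside the query. The sketch below only illustrates the idea in Python; it is not the processor's implementation. It handles plain `${attribute}` references only, and the attribute names `table.name` and `max.rows` are hypothetical.

```python
import re

# Matches only plain ${attribute.name} references. The real Expression
# Language also supports functions and chained calls, which this sketch ignores.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve_query(query_template, attributes):
    """Replace each ${name} reference with the matching attribute value."""
    def substitute(match):
        name = match.group(1)
        # Assumption: a missing attribute resolves to an empty string.
        return attributes.get(name, "")
    return _ATTR_REF.sub(substitute, query_template)


# Hypothetical attributes carried by the incoming FlowFile.
flowfile_attributes = {"table.name": "web_logs", "max.rows": "100"}

query = resolve_query(
    "SELECT * FROM ${table.name} LIMIT ${max.rows}",
    flowfile_attributes,
)
print(query)  # SELECT * FROM web_logs LIMIT 100
```

When the processor runs on a timer or cron schedule with no incoming FlowFile, there are no FlowFile attributes to substitute, so the query should not depend on them in that mode.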
Relationships:
Name | Description |
success | Successfully created FlowFile from HiveQL query result set. |
failure | HiveQL query execution failed. The incoming FlowFile will be penalized and routed to this relationship. |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
mime.type | Sets the MIME type for the outgoing FlowFile to application/avro-binary for Avro or text/csv for CSV. |
filename | Adds .avro or .csv to the filename attribute depending on which output format is selected. |
selecthiveql.row.count | Indicates how many rows were selected/returned by the query. |
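As a rough illustration of how the three written attributes depend on the selected output format, the following Python sketch builds the attribute map described in the table above. The function `written_attributes` and the sample filename are hypothetical; only the MIME types, filename suffixes, and attribute names come from the table.

```python
# MIME type and filename suffix per output format, as listed in the table.
_FORMATS = {
    "Avro": ("application/avro-binary", ".avro"),
    "CSV": ("text/csv", ".csv"),
}


def written_attributes(output_format, filename, row_count):
    """Return the attributes written for a given output format (hypothetical helper)."""
    mime_type, suffix = _FORMATS[output_format]
    return {
        "mime.type": mime_type,
        # The suffix is appended to the existing filename attribute.
        "filename": filename + suffix,
        # FlowFile attribute values are strings, so the count is stored as text.
        "selecthiveql.row.count": str(row_count),
    }


attrs = written_attributes("CSV", "hive-result", 42)
print(attrs["mime.type"])               # text/csv
print(attrs["filename"])                # hive-result.csv
print(attrs["selecthiveql.row.count"])  # 42
```

Downstream processors can route on `mime.type` or check `selecthiveql.row.count` to detect empty result sets.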