Description:

Executes the provided HiveQL SELECT query against a Hive database connection. The query result is converted to Avro or CSV format. Results are streamed, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query. The FlowFile attribute 'selecthiveql.row.count' indicates how many rows were selected.
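The streaming behaviour described above can be illustrated with a minimal sketch. It is not the processor's implementation: it uses Python's generic DB-API, with an in-memory sqlite3 database standing in for Hive purely so the example runs without a cluster. The function name stream_query_to_csv, the fetch_size parameter, the users table, and the choice to write a header row are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def stream_query_to_csv(conn, query, out, fetch_size=1000):
    """Write the result of `query` to `out` as CSV, one batch at a time.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.cursor()
    cursor.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # Header row taken from the DB-API cursor description (a sketch choice).
    writer.writerow([col[0] for col in cursor.description])
    row_count = 0
    while True:
        # fetchmany holds at most `fetch_size` rows in memory at once,
        # so the full result set is never materialized.
        batch = cursor.fetchmany(fetch_size)
        if not batch:
            break
        writer.writerows(batch)
        row_count += len(batch)
    cursor.close()
    return row_count


# sqlite3 is only a stand-in for a Hive connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cy")],
)

buffer = io.StringIO()
row_count = stream_query_to_csv(
    conn, "SELECT id, name FROM users ORDER BY id", buffer, fetch_size=2
)
```

Here row_count plays the role that 'selecthiveql.row.count' plays on the outgoing FlowFile: it is accumulated while rows are written rather than computed from a fully loaded result.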

Tags:

hive, sql, select, jdbc, query, database

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

Name: Hive Database Connection Pooling Service
Allowable Values: Controller Service API: HiveDBCPService; Implementation: HiveConnectionPool
Description: The Hive Controller Service that is used to obtain connection(s) to the Hive database.

Name: HiveQL Select Query
Description: HiveQL SELECT query to execute.
Supports Expression Language: true

Name: Output Format
Default Value: Avro
Allowable Values:
  • Avro
  • CSV
Description: How to represent the records coming from Hive (e.g., Avro or CSV).
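Because the HiveQL Select Query property supports Expression Language, a query can reference attributes of the incoming FlowFile. The sketch below imitates only the simplest form of that, a plain ${attribute} reference, and is not NiFi's evaluator, which supports far more than this. The function name resolve_attributes, the attribute names table.name and partition.date, and the decision to raise an error on a missing attribute are assumptions made for illustration.

```python
import re

# Matches a plain ${name} reference; functions and nesting are not handled.
_ATTRIBUTE_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve_attributes(query, attributes):
    """Replace each ${name} in `query` with the matching attribute value."""

    def replace(match):
        name = match.group(1)
        # Raising KeyError on a missing attribute is this sketch's choice,
        # not necessarily what NiFi does.
        return attributes[name]

    return _ATTRIBUTE_REF.sub(replace, query)


# Hypothetical attributes carried by an incoming FlowFile.
flowfile_attributes = {
    "table.name": "web_logs",
    "partition.date": "2016-08-01",
}

query_template = "SELECT * FROM ${table.name} WHERE ds = '${partition.date}'"
resolved_query = resolve_attributes(query_template, flowfile_attributes)
```

With these example attributes, the query sent to Hive would be SELECT * FROM web_logs WHERE ds = '2016-08-01'.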

Relationships:

Name: success
Description: Successfully created FlowFile from HiveQL query result set.

Name: failure
Description: HiveQL query execution failed. Incoming FlowFile will be penalized and routed to this relationship.

Reads Attributes:

None specified.


Writes Attributes:

Name: mime.type
Description: Sets the MIME type for the outgoing FlowFile to application/avro-binary for Avro or text/csv for CSV.

Name: filename
Description: Adds .avro or .csv to the filename attribute depending on which output format is selected.

Name: selecthiveql.row.count
Description: Indicates how many rows were selected/returned by the query.
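The three written attributes can be summarized as a small mapping from the selected output format. This is a sketch of the documented behaviour, not the processor's code; the function name output_attributes and the example filename are hypothetical, and the row count is stored as a string because FlowFile attribute values are strings.

```python
# MIME type and filename extension for each output format, as listed above.
_FORMATS = {
    "Avro": ("application/avro-binary", ".avro"),
    "CSV": ("text/csv", ".csv"),
}


def output_attributes(output_format, filename, row_count):
    """Return the attributes written for a given output format."""
    mime_type, extension = _FORMATS[output_format]
    return {
        "mime.type": mime_type,
        # The extension is appended to the existing filename attribute.
        "filename": filename + extension,
        "selecthiveql.row.count": str(row_count),
    }


# Hypothetical incoming filename and row count.
csv_attributes = output_attributes("CSV", "query-result", 42)
avro_attributes = output_attributes("Avro", "query-result", 42)
```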