Description:

Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property. The name of the Property is the Relationship to route data to, and the value of the Property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered. The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the ‘failure’ relationship. Otherwise, the data selected will be routed to the associated relationship. See the Processor Usage documentation for more information.

Tags:

SQL, query, calcite, route, record, transform, select, update, modify, etl, filter, record, CSV, JSON, logs, text, avro, aggregate

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Name

Default Value

Allowable Values

Description

Record Reader

Controller Service API: 


RecordReaderFactory

Implementations: 

JsonPathReader


CSVReader


ScriptedReader


AvroReader


GrokReader


JsonTreeReader


Specifies the Controller Service to use for parsing incoming data and determining the data's schema

Record Writer

Controller Service API: 


RecordSetWriterFactory

Implementations: 

FreeFormTextRecordSetWriter


CSVRecordSetWriter


JsonRecordSetWriter


ScriptedRecordSetWriter


AvroRecordSetWriter


Specifies the Controller Service to use for writing results to a FlowFile

Include Zero Record FlowFiles

true
  • true
  • false
When running the SQL statement against an incoming FlowFile, if the result has no data, this property specifies whether or not a FlowFile will be sent to the corresponding relationship

Cache Schema

true
  • true
  • false
Parsing the SQL query and deriving the FlowFile's schema is relatively expensive. If this value is set to true, the Processor will cache these values so that the Processor is much more efficient and much faster. However, if this is done, then the schema that is derived for the first FlowFile processed must apply to all FlowFiles. If all FlowFiles will not have the exact same schema, or if the SQL SELECT statement uses the Expression Language, this value should be set to false.

Dynamic Properties:
Dynamic Properties allow the user to specify both the name and value of a property.

Name

Value

Description

The name of the relationship to route data to A SQL SELECT statement that is used to determine what data should be routed to this relationship. Each user-defined property specifies a SQL SELECT statement to run over the data, with the data that is selected being routed to the relationship whose name is the property name

Supports Expression Language: true


Relationships:

Name

Description

failure If a FlowFile fails processing for any reason (for example, the SQL statement contains columns not present in input data), the original FlowFile it will be routed to this relationship
original The original FlowFile is routed to this relationship

Dynamic Relationships:

A Dynamic Relationship may be created based on how the user configures the Processor.

Name

Description

<Property Name> Each user-defined property defines a new Relationship for this Processor.

Reads Attributes:

None specified.

Writes Attributes:

None specified.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

Summary:

QueryRecord provides users a tremendous amount of power by leveraging an extremely well-known syntax (SQL) to route, filter, transform, and query data as it traverses the system. In order to provide the Processor with the maximum amount of flexibility, it is configured with a Controller Service that is responsible for reading and parsing the incoming FlowFiles and a Controller Service that is responsible for writing the results out. By using this paradigm, users are not forced to convert their data from one format to another just to query it, and then transform the data back into the form that they want. Rather, the appropriate Controller Service can easily be configured and put to use for the appropriate data format.

Rather than providing a single “SQL SELECT Statement” type of Property, this Processor makes use of user-defined properties. Each user-defined property that is added to the Processor has a name that becomes a new Relationship for the Processor and a corresponding SQL query that will be evaluated against each FlowFile. This allows multiple SQL queries to be run against each FlowFile.

The SQL syntax that is supported by this Processor is ANSI SQL and is powered by Apache Calcite. Please note that identifiers are quoted using double-quotes, and column names/labels are case-insensitive.