Description:

Scrolls through the results of an Elasticsearch query using the specified connection properties. This processor is intended to be run on the primary node and is designed for scrolling through very large result sets, as in the case of a reindex. The state must be cleared before another query can be run. Each page of results is returned wrapped in a JSON object of the form { "hits" : [ <doc1>, <doc2>, <docn> ] }. Note that the full body of each page of documents is read into memory before being written to a FlowFile for transfer.
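
As an illustration of the page format described above, here is a minimal Python sketch of how a downstream consumer might unpack one page of results. The function name documents_in_page is hypothetical, and the shape of each individual document inside "hits" is not specified here, so the sketch treats each one as an opaque JSON value.

```python
import json


def documents_in_page(flowfile_content):
    """Return the list of documents wrapped in one scroll page.

    flowfile_content is the FlowFile body (str or bytes), assumed to be
    a JSON object of the form {"hits": [<doc1>, <doc2>, ...]}.
    """
    page = json.loads(flowfile_content)
    hits = page.get("hits") if isinstance(page, dict) else None
    if not isinstance(hits, list):
        raise ValueError('expected a JSON object with a "hits" array')
    return hits


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    sample = '{"hits": [{"title": "first"}, {"title": "second"}]}'
    for doc in documents_in_page(sample):
        print(doc)
```

Because each page is held fully in memory, both by the processor and by a parser like this one, a smaller Page Size keeps the per-FlowFile footprint down.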

Tags:

elasticsearch, query, scroll, read, get, http

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered “sensitive”, meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

Name	Default Value	Allowable Values	Description
Elasticsearch URL			The Elasticsearch URL to connect to, including scheme (e.g., http), host, and port. The default port for the REST API is 9200.
SSL Context Service		Controller Service API: SSLContextService Implementation: StandardSSLContextService	The SSL Context Service used to provide client certificate information for TLS/SSL connections. This service only applies if the Elasticsearch endpoint(s) have been secured with TLS/SSL.
Username			Username to access the Elasticsearch cluster
Password			Password to access the Elasticsearch cluster. Sensitive Property: true
Connection Timeout	5 secs		Max wait time for the connection to the Elasticsearch REST API.
Response Timeout	15 secs		Max wait time for a response from the Elasticsearch REST API.
Query			The Lucene-style query to run against Elasticsearch (e.g., genre:blues AND -artist:muddy). Supports Expression Language: true
Scroll Duration	1m		The scroll duration is how long each search context is kept in memory. Supports Expression Language: true
Page Size	20		Determines how many documents to return per page during scrolling. Supports Expression Language: true
Index			The name of the index to read from. If the property is set to _all, the query will match across all indexes. Supports Expression Language: true
Type			The (optional) type of this query, used by Elasticsearch for indexing and searching. If the property is empty, the query will match across all types. Supports Expression Language: true
Fields			A comma-separated list of fields to retrieve from the document. If the Fields property is left blank, then the entire document's source will be retrieved. Supports Expression Language: true
Sort			A sort parameter (e.g., timestamp:asc). If the Sort property is left blank, then the results will be retrieved in document order. Supports Expression Language: true
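
To show how the query-related properties fit together, the following Python sketch assembles an initial search URL from them. This is an assumption-laden illustration, not the processor's actual request: the mapping of Query, Page Size, Scroll Duration, Fields, and Sort onto the Elasticsearch URI search parameters q, size, scroll, _source, and sort is assumed, as is the /index/type/_search path layout. The function build_scroll_search_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def build_scroll_search_url(base_url, index, query, doc_type="",
                            fields="", sort="", page_size=20, scroll="1m"):
    """Assemble a URI-search URL from the processor-style properties.

    Assumed mapping (not confirmed by this document):
      Query -> q, Page Size -> size, Scroll Duration -> scroll,
      Fields -> _source, Sort -> sort.
    """
    # An empty Type means "match across all types", so it is left out of the path.
    path_parts = [quote(index, safe="")]
    if doc_type:
        path_parts.append(quote(doc_type, safe=""))
    path_parts.append("_search")

    params = {"q": query, "size": str(page_size), "scroll": scroll}
    if fields:
        params["_source"] = fields  # blank Fields: whole document source
    if sort:
        params["sort"] = sort       # blank Sort: document order

    return base_url.rstrip("/") + "/" + "/".join(path_parts) + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_scroll_search_url(
        "http://localhost:9200", "_all",
        "genre:blues AND -artist:muddy", sort="timestamp:asc"))
```

The defaults for page_size and scroll mirror the default values listed in the table (20 and 1m); the host and port in the usage line are placeholders.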

Relationships:

Name	Description
success	All FlowFiles that are read from Elasticsearch are routed to this relationship.
failure	All FlowFiles that cannot be read from Elasticsearch are routed to this relationship. Note that only incoming FlowFiles will be routed to failure.

Reads Attributes:

None specified.

Writes Attributes:

Name	Description
es.index	The Elasticsearch index containing the document
es.type	The Elasticsearch document type

State management:

Scope	Description
LOCAL	After each successful scroll page, the latest scroll_id is persisted in scrollId as input for the next scroll call. Once the entire query is complete, finishedQuery state will be set to true, and the processor will not execute unless this is