Description of the PutSolrContentStream processor:

Sends the contents of a FlowFile as a ContentStream to Solr.

Tags:

Apache, Solr, Put, Send

Properties:

In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name | Default Value | Allowable Values | Description
Solr Type | Standard | Cloud, Standard | The type of Solr instance, Cloud or Standard.
Solr Location | | | The Solr URL for a Solr Type of Standard (ex: http://localhost:8984/solr/gettingstarted), or the ZooKeeper hosts for a Solr Type of Cloud (ex: localhost:9983).
Collection | | | The Solr collection name, only used with a Solr Type of Cloud. Supports Expression Language: true
Content Stream Path | /update/json/docs | | The path in Solr to post the ContentStream. Supports Expression Language: true
Content-Type | application/json | | The Content-Type being sent to Solr. Supports Expression Language: true
Commit Within | | | The number of milliseconds before the given update is committed. Supports Expression Language: true
Solr Socket Timeout | 10 seconds | | The amount of time to wait for data on a socket connection to Solr. A value of 0 indicates an infinite timeout.
Solr Connection Timeout | 10 seconds | | The amount of time to wait when establishing a connection to Solr. A value of 0 indicates an infinite timeout.
Solr Maximum Connections | 10 | | The maximum number of total connections allowed from the Solr client to Solr.
Solr Maximum Connections Per Host | 5 | | The maximum number of connections allowed from the Solr client to a single Solr host.
ZooKeeper Client Timeout | 10 seconds | | The amount of time to wait for data on a connection to ZooKeeper, only used with a Solr Type of Cloud.
ZooKeeper Connection Timeout | 10 seconds | | The amount of time to wait when establishing a connection to ZooKeeper, only used with a Solr Type of Cloud.

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
A Solr request parameter name | A Solr request parameter value | These parameters will be passed to Solr on the request.

Relationships:

Name | Description
failure | FlowFiles that failed for any reason other than Solr being unreachable
connection_failure | FlowFiles that failed because Solr is unreachable
success | The original FlowFile
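The routing in the Relationships table can be sketched as follows. This is a minimal illustration, not the processor's implementation. `route` is a hypothetical helper, and Python's built-in `ConnectionError` is assumed to stand in for "Solr is unreachable". Every other exception counts as an ordinary failure.

```python
# Hypothetical sketch of the Relationships table, not NiFi's code.
# Assumption: ConnectionError represents "Solr is unreachable".


def route(send):
    """Call send() and return the relationship the FlowFile would go to."""
    try:
        send()
    except ConnectionError:
        # Solr could not be reached; the same FlowFile may succeed later.
        return "connection_failure"
    except Exception:
        # Any other problem, e.g. a request Solr rejects.
        return "failure"
    return "success"


def unreachable():
    raise ConnectionRefusedError("connection refused")  # subclass of ConnectionError


def rejected():
    raise ValueError("unknown field")


print(route(lambda: None))  # success
print(route(unreachable))   # connection_failure
print(route(rejected))      # failure
```

Keeping the two failure relationships apart is what lets a flow handle them differently, for example by retrying connection_failure FlowFiles while sending failure FlowFiles elsewhere for inspection.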

Reads Attributes:

None specified.

Writes Attributes:

None specified.

Usage Example

This processor streams the contents of a FlowFile to an Apache Solr update handler. Any properties added to this processor by the user are passed to Solr on the update request. If a parameter must be sent multiple times with different values, properties can follow a naming convention: name.number, where name is the parameter name and number is a unique number. Repeating parameters will be sorted by their property name.

Example: To specify multiple ‘f’ parameters for indexing custom JSON, the following properties can be defined:

  • split: /exams

  • f.1: first:/first

  • f.2: last:/last

  • f.3: grade:/grade

This results in the following request parameters being sent to Solr:

split=/exams&f=first:/first&f=last:/last&f=grade:/grade
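The naming convention above can be sketched in Python. `build_query` is a hypothetical helper written for illustration, not NiFi's code. It assumes that a property name ending in a dot and digits marks a repeated parameter, and that repeats are ordered by a plain string sort of their property names.

```python
import re
from urllib.parse import urlencode

# "name.number" property names, e.g. "f.1" -> parameter "f".
REPEATED = re.compile(r"^(.+)\.(\d+)$")


def build_query(properties):
    """Turn dynamic properties (name -> value) into a Solr query string."""
    groups = {}  # parameter name -> [(property name, value), ...]
    for prop, value in properties.items():
        match = REPEATED.match(prop)
        param = match.group(1) if match else prop
        groups.setdefault(param, []).append((prop, value))

    pairs = []
    for param, entries in groups.items():
        # Repeats of one parameter are ordered by property name (string sort).
        for _, value in sorted(entries):
            pairs.append((param, value))
    # Leave "/" and ":" unencoded to match the form shown above;
    # %2F and %3A would be equivalent in a query string.
    return urlencode(pairs, safe="/:")


props = {
    "split": "/exams",
    "f.1": "first:/first",
    "f.2": "last:/last",
    "f.3": "grade:/grade",
}
print(build_query(props))
# split=/exams&f=first:/first&f=last:/last&f=grade:/grade
```

Because this sketch compares property names as strings, f.10 would sort before f.2. Under this assumption, zero-padded numbers (f.01, f.02, ...) keep the intended order when more than nine repeats are needed.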

How to Configure

Step 1: Drag and drop the PutSolrContentStream processor to the canvas.

Step 2: Double-click the processor to open its configuration dialog, shown below.

[Screenshot: PutSolrContentStream processor configuration]

Step 3: Review the usage of each property, described below, and update the values as needed.

Properties and Usage

Solr Type: Choose Standard to connect to a standalone Solr server by URL, or Cloud when Solr runs in SolrCloud mode and is reached through ZooKeeper (for example, the embedded ZooKeeper of a local SolrCloud instance).

Solr Location: When Solr Type is Standard, enter the URL of the Solr instance followed by the name of the core that should receive the content.

Example: http://localhost:8983/solr/test21

When Solr Type is Cloud, set Solr Location to the ZooKeeper host(s) instead, for example the embedded ZooKeeper at localhost:9983.
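A rough check of the two Solr Location shapes might look like this. `check_solr_location` is hypothetical. The accepted forms are assumptions based on the examples in this section, not NiFi's own validation rules: an http(s) URL for Standard, and comma-separated host:port pairs with an optional chroot path for Cloud.

```python
import re

# Assumed shape of one ZooKeeper entry: hostname:port
ZK_HOST = re.compile(r"^[\w.-]+:\d+$")


def check_solr_location(solr_type, location):
    """Return True if location looks right for the given Solr Type (hypothetical)."""
    if solr_type == "Standard":
        # Base URL including the core, e.g. http://localhost:8983/solr/test21
        return location.startswith(("http://", "https://"))
    if solr_type == "Cloud":
        # host:port[,host:port...] optionally followed by a chroot such as /solr
        hosts, _sep, _chroot = location.partition("/")
        return all(ZK_HOST.match(h.strip()) for h in hosts.split(","))
    raise ValueError(f"unknown Solr Type: {solr_type!r}")


print(check_solr_location("Standard", "http://localhost:8983/solr/test21"))  # True
print(check_solr_location("Cloud", "localhost:9983"))                      # True
print(check_solr_location("Cloud", "http://localhost:8983/solr"))          # False
```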

Collection: The name of the Solr collection to send the content to when Solr Type is Cloud. Leave this field blank for Standard.

Content Stream Path: The path in Solr to which the FlowFile content is posted (default: /update/json/docs).

Content-Type: The MIME type of the FlowFile content. Choose the value that matches the content format, and set Content Stream Path to a Solr handler that accepts that format:

Content | Content-Type
CSV | text/csv
JSON | application/json
XML | application/xml
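The pairing of format, Content-Type and path can be kept in a small lookup, sketched below. `FORMATS` and `settings_for` are hypothetical. /update/json/docs is the processor's default path for JSON. /update is assumed here as the path for CSV and XML, since Solr's general update handler chooses a parser from the Content-Type header.

```python
# Hypothetical lookup: content format -> (Content-Type, Content Stream Path).
FORMATS = {
    # Arbitrary JSON documents; the processor's default path.
    "json": ("application/json", "/update/json/docs"),
    # CSV whose first row names the fields (assumed path: /update).
    "csv": ("text/csv", "/update"),
    # XML in Solr's own <add><doc>...</doc></add> update format (assumed path: /update).
    "xml": ("application/xml", "/update"),
}


def settings_for(fmt):
    """Return the (Content-Type, Content Stream Path) pair for a format name."""
    try:
        return FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None


content_type, path = settings_for("CSV")
print(content_type, path)  # text/csv /update
```

Note that posting XML this way only works for documents already in Solr's XML update format. Arbitrary XML is not indexed as-is.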

Commit Within: The number of milliseconds within which Solr should commit the update. For example, a value of 5000 asks Solr to commit the documents within 5 seconds of receiving them. This may not be needed if auto-commit is configured in solrconfig.xml.
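To make the Commit Within setting concrete, the sketch below builds an HTTP request similar to the one the processor sends, without actually sending it. It assumes the setting reaches Solr as its standard commitWithin request parameter. `build_update_request` is a hypothetical helper, and the core URL is the example from this section.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(core_url, path, body, content_type, commit_within_ms=None):
    """Build (but do not send) a POST resembling the processor's update request."""
    url = core_url.rstrip("/") + path
    if commit_within_ms is not None:
        # Ask Solr to commit this update within the given number of milliseconds.
        url += "?" + urlencode({"commitWithin": commit_within_ms})
    return Request(url, data=body, headers={"Content-Type": content_type}, method="POST")


req = build_update_request(
    "http://localhost:8983/solr/test21",
    "/update/json/docs",
    b'{"id": "1", "name_s": "example"}',
    "application/json",
    commit_within_ms=5000,
)
print(req.get_method(), req.full_url)
# POST http://localhost:8983/solr/test21/update/json/docs?commitWithin=5000
print(req.get_header("Content-type"))  # urllib stores header names capitalized
# application/json
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops before that step so that it runs without a Solr server.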

Solr Socket Timeout: The amount of time to wait for data on a socket connection to Solr. A value of 0 indicates an infinite timeout.

Solr Connection Timeout: The amount of time to wait when establishing a connection to Solr. A value of 0 indicates an infinite timeout.

Solr Maximum Connections: The maximum total number of connections allowed from the Solr client to Solr.

Solr Maximum Connections Per Host: The maximum number of connections allowed from the Solr client to a single Solr host.

ZooKeeper Client Timeout: The amount of time to wait for data on a connection to ZooKeeper, only used when Solr Type is Cloud.

ZooKeeper Connection Timeout: The amount of time to wait when establishing a connection to ZooKeeper, only used when Solr Type is Cloud.
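The timeout properties treat 0 as "wait forever". Python's socket convention is different: None means no timeout, while settimeout(0) makes a socket non-blocking. The hypothetical helper below converts a duration string into the Python form. The supported unit spellings are an assumed subset of NiFi's time-period syntax, and a bare number is assumed to mean seconds.

```python
import re

# Assumed unit spellings (a subset of NiFi-style time periods), in seconds.
SECONDS_PER_UNIT = {
    "ms": 0.001, "millis": 0.001,
    "sec": 1.0, "secs": 1.0, "second": 1.0, "seconds": 1.0,
    "min": 60.0, "mins": 60.0, "minute": 60.0, "minutes": 60.0,
}
PERIOD = re.compile(r"^\s*(\d+)\s*([a-z]+)?\s*$")


def to_socket_timeout(period):
    """Convert e.g. "10 seconds" to a Python socket timeout (0 -> None = wait forever)."""
    match = PERIOD.match(period.lower())
    if not match:
        raise ValueError(f"cannot parse duration: {period!r}")
    amount = int(match.group(1))
    unit = match.group(2) or "seconds"  # assumption: a bare number means seconds
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    seconds = amount * SECONDS_PER_UNIT[unit]
    # 0 means "infinite" for these properties; Python spells that None.
    return None if seconds == 0 else seconds


print(to_socket_timeout("10 seconds"))  # 10.0
print(to_socket_timeout("0"))           # None
print(to_socket_timeout("2 minutes"))   # 120.0
```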

Post data to Solr using the PutSolrContentStream processor:

Overall Workflow

[Screenshot: overall workflow]

This sample uses the following processors:

Processor | Comments
GenerateFlowFile | Holds the content to be indexed in Solr, pasted into its Custom Text property.
PutSolrContentStream | Sends the FlowFile content to your Solr core.
LogAttribute | Logs the FlowFiles routed to success and failure.

Step 1: Drag the GenerateFlowFile processor onto the canvas and paste content in one of the supported formats (CSV, JSON, or XML) into its Custom Text property.

[Screenshot: GenerateFlowFile processor]

Step 2: Drag the PutSolrContentStream processor onto the canvas and configure it for Standard mode as follows.

[Screenshot: PutSolrContentStream processor configuration]

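Step 3: In the PutSolrContentStream processor, use the Add Property button to add the request parameters (dynamic properties) needed for the content you want to upload to Solr, as shown below. When choosing Solr field names, the following dynamic field suffixes determine how Solr treats each field: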

[Screenshot: Add Property dialog]

Field Name Pattern | Comments
*_i | The dynamic field should be treated as an integer in Solr.
*_s | The dynamic field should be treated as a string in Solr.
*_dt | The dynamic field should be treated as a date in Solr.
*_b | The dynamic field should be treated as a boolean in Solr.
*_txt | The dynamic field should be treated as text in Solr.
*_f | The dynamic field should be treated as a float in Solr.
*_d | The dynamic field should be treated as a double in Solr.
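For JSON content, these suffixes can be combined with the f.N naming convention from the usage example. `mapping_properties` is a hypothetical helper that maps top-level JSON fields to suffixed Solr field names. It assumes the target core's schema defines the dynamic fields listed above.

```python
# Hypothetical helper: build "f.N" dynamic properties that map top-level JSON
# fields to Solr field names carrying a dynamic-field suffix.
SUFFIXES = {
    "int": "_i", "string": "_s", "date": "_dt", "boolean": "_b",
    "text": "_txt", "float": "_f", "double": "_d",
}


def mapping_properties(fields):
    """fields: list of (json_field, solr_type) pairs, in the desired order."""
    props = {}
    for n, (json_field, solr_type) in enumerate(fields, start=1):
        target = json_field + SUFFIXES[solr_type]
        # Same "targetField:/json/path" form as f=first:/first in the usage example.
        props[f"f.{n}"] = f"{target}:/{json_field}"
    return props


print(mapping_properties([("name", "string"), ("age", "int"), ("joined", "date")]))
# {'f.1': 'name_s:/name', 'f.2': 'age_i:/age', 'f.3': 'joined_dt:/joined'}
```

Values sent to *_dt fields must be dates that Solr can parse, such as 2017-05-01T00:00:00Z.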

Step 4: Drag the LogAttribute processor to the canvas and set the Log Level property to error.

[Screenshot: LogAttribute processor configuration]

Step 5: Start the data flow, then open your core in the Solr Admin UI and run a query to view the indexed data.

[Screenshot: output]
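Besides using the Admin UI, you can check the indexed documents by querying the core's /select handler. `select_url` is a hypothetical helper that only builds the URL. The core URL is the example from this section, and q=*:* matches all documents.

```python
from urllib.parse import urlencode


def select_url(core_url, query="*:*", rows=10):
    """Build a Solr /select URL returning up to `rows` documents matching `query`."""
    params = urlencode({"q": query, "rows": rows})
    return f"{core_url.rstrip('/')}/select?{params}"


print(select_url("http://localhost:8983/solr/test21"))
# http://localhost:8983/solr/test21/select?q=%2A%3A%2A&rows=10
```

Opening this URL in a browser, or fetching it with `urllib.request.urlopen`, should return the documents that PutSolrContentStream indexed.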