Description of the PutSolrContentStream Processor:
Sends the contents of a FlowFile as a ContentStream to Solr.
Tags:
Apache, Solr, Put, Send
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
Name | Default Value | Allowable Values | Description |
Solr Type | Standard | Cloud, Standard | The type of Solr instance, Cloud or Standard. |
Solr Location | | | The Solr URL for a Solr Type of Standard (ex: http://localhost:8984/solr/gettingstarted), or the ZooKeeper hosts for a Solr Type of Cloud (ex: localhost:9983). |
Collection | | | The Solr collection name, only used with a Solr Type of Cloud. Supports Expression Language: true |
Content Stream Path | /update/json/docs | | The path in Solr to post the ContentStream. Supports Expression Language: true |
Content-Type | application/json | | Content-Type being sent to Solr. Supports Expression Language: true |
Commit Within | | | The number of milliseconds before the given update is committed. Supports Expression Language: true |
Solr Socket Timeout | 10 seconds | | The amount of time to wait for data on a socket connection to Solr. A value of 0 indicates an infinite timeout. |
Solr Connection Timeout | 10 seconds | | The amount of time to wait when establishing a connection to Solr. A value of 0 indicates an infinite timeout. |
Solr Maximum Connections | 10 | | The maximum number of total connections allowed from the Solr client to Solr. |
Solr Maximum Connections Per Host | 5 | | The maximum number of connections allowed from the Solr client to a single Solr host. |
ZooKeeper Client Timeout | 10 seconds | | The amount of time to wait for data on a connection to ZooKeeper, only used with a Solr Type of Cloud. |
ZooKeeper Connection Timeout | 10 seconds | | The amount of time to wait when establishing a connection to ZooKeeper, only used with a Solr Type of Cloud. |
Dynamic Properties:
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
A Solr request parameter name | A Solr request parameter value | These parameters will be passed to Solr on the request |
Relationships:
Name | Description |
failure | FlowFiles that failed for any reason other than Solr being unreachable |
connection_failure | FlowFiles that failed because Solr is unreachable |
success | The original FlowFile |
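The three relationships above can be read as a simple routing rule. The following is a minimal sketch of that rule in Python, not the processor's actual implementation: the function name `choose_relationship` and the choice of which exception types count as "Solr being unreachable" are assumptions made for illustration.

```python
def choose_relationship(send):
    """Call send() and return the name of the relationship to route to.

    `send` is a hypothetical callable that posts the FlowFile content to Solr.
    """
    try:
        send()
    except (ConnectionError, TimeoutError):
        # Assumption: network-level errors mean Solr is unreachable.
        return "connection_failure"
    except Exception:
        # Any other error, e.g. Solr rejecting a malformed document.
        return "failure"
    return "success"


def ok():
    return None


def unreachable():
    raise ConnectionRefusedError("Solr is down")


def bad_document():
    raise ValueError("malformed JSON")


for attempt in (ok, unreachable, bad_document):
    print(attempt.__name__, "->", choose_relationship(attempt))
```

Keeping connection_failure separate from failure makes it possible to retry FlowFiles once Solr is reachable again, while FlowFiles that Solr rejected can be sent elsewhere for inspection.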
Reads Attributes:
None specified.
Writes Attributes:
None specified.
Usage Example
This processor streams the contents of a FlowFile to an Apache Solr update handler. Any properties added to this processor by the user are passed to Solr on the update request. If a parameter must be sent multiple times with different values, properties can follow a naming convention: name.number, where name is the parameter name and number is a unique number. Repeating parameters will be sorted by their property name.
Example: To specify multiple ‘f’ parameters for indexing custom JSON, the following properties can be defined:
- split: /exams
- f.1: first:/first
- f.2: last:/last
- f.3: grade:/grade
This will result in sending the following request parameters to Solr:
split=/exams&f=first:/first&f=last:/last&f=grade:/grade
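The name.number convention can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the processor's own code: the function names are hypothetical, and two details are assumptions, namely that non-repeating parameters keep the order in which they were defined and that repeating parameters are sorted by plain string comparison of the property name.

```python
import re
from urllib.parse import urlencode


def build_request_params(properties):
    """Convert dynamic properties into a list of (name, value) request parameters."""
    groups = {}  # base parameter name -> list of (property name, value)
    for prop_name, value in properties.items():
        # Strip a trailing ".<number>" so that "f.2" becomes "f".
        base = re.sub(r"\.\d+$", "", prop_name)
        groups.setdefault(base, []).append((prop_name, value))

    params = []
    for base, entries in groups.items():
        # Repeating parameters are ordered by their full property name.
        for _, value in sorted(entries, key=lambda entry: entry[0]):
            params.append((base, value))
    return params


def to_query_string(params):
    # Leave ":" and "/" unescaped so the field mappings stay readable.
    return urlencode(params, safe=":/")


props = {
    "split": "/exams",
    "f.2": "last:/last",
    "f.1": "first:/first",
    "f.3": "grade:/grade",
}
print(to_query_string(build_request_params(props)))
```

Because the sort is a string sort in this sketch, a property named f.10 would be ordered before f.2; zero-padded numbers (f.02, f.10) avoid that if the order matters.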
How to Configure?
Step 1: Drag and drop the PutSolrContentStream processor to the canvas.
Step 2: Double-click the processor to open its configuration dialog.
Step 3: Review the usage of each property described below and set its value.
Properties and Usage
Solr Type: Choose Standard when Solr is running as a standalone instance (for example, on your localhost), or Cloud when Solr is running in cloud mode with ZooKeeper (for example, the embedded ZooKeeper).
Solr Location: When Solr Type is Standard, enter the URL of the Solr instance, including the name of the core to which the content will be uploaded.
Example: http://localhost:8983/solr/test21.
When Solr Type is Cloud, set Solr Location to the ZooKeeper host, for example the embedded ZooKeeper at localhost:9983.
Collection: The name of the Solr collection to use in Cloud mode. Leave this field blank in Standard mode.
Content Stream Path: The path in Solr to which the contents of the FlowFile are posted.
Content-Type: The content type of the incoming FlowFile, set according to the following table.
Content | Content-Type |
CSV | application/csv |
JSON | application/json |
Other Files | text/html |
Commit Within: The maximum time, in milliseconds, before the documents in an update are committed. For example, a value of “5000” commits the documents within 5 seconds (this may not be needed if you set auto-commit settings in your solrconfig.xml).
Solr Socket Timeout: The amount of time to wait for the data on a socket connection to Solr. A value of 0 indicates an infinite timeout.
Solr Connection Timeout: The amount of time to wait when establishing a connection to Solr. A value of 0 indicates an infinite timeout.
Solr Maximum Connections: The maximum number of total connections allowed from the Solr client to Solr.
Solr Maximum Connections Per Host: The maximum number of connections allowed from the Solr client to a single Solr host.
ZooKeeper Client Timeout: The amount of time to wait for data on a connection to ZooKeeper, only used when the Solr Type is Cloud.
ZooKeeper Connection Timeout: The amount of time to wait when establishing a connection to ZooKeeper, only used when the Solr Type is Cloud.
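To see how Solr Location, Content Stream Path, Content-Type and Commit Within fit together in Standard mode, the sketch below builds a roughly equivalent HTTP request in Python without sending it. It is an approximation for illustration only: the processor itself is a Java component, and the helper name `build_update_request` is hypothetical. The location is the example URL from the properties table, and `commitWithin` is the Solr request parameter that corresponds to the Commit Within property.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(solr_location, content_stream_path, content_type,
                         body, commit_within_ms=None, extra_params=None):
    """Build (but do not send) a POST request resembling the processor's update."""
    params = list(extra_params or [])
    if commit_within_ms is not None:
        params.append(("commitWithin", str(commit_within_ms)))

    # Join the location and the path with exactly one slash between them.
    url = solr_location.rstrip("/") + "/" + content_stream_path.lstrip("/")
    if params:
        url += "?" + urlencode(params)

    return Request(url, data=body,
                   headers={"Content-Type": content_type}, method="POST")


req = build_update_request(
    "http://localhost:8984/solr/gettingstarted",  # Solr Location (Standard)
    "/update/json/docs",                          # Content Stream Path
    "application/json",                           # Content-Type
    b'{"id": "1", "first": "Ann"}',               # FlowFile content
    commit_within_ms=5000,                        # Commit Within
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it, with the `timeout` argument playing a role similar to the socket and connection timeouts above.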
Post data to Solr using the PutSolrContentStream processor:
Overall Workflow
The following processors have been used in this sample.
Processor | Comments |
---|---|
GenerateFlowFile | Holds the content to be uploaded to Solr, pasted as custom text. |
PutSolrContentStream | Posts the contents of the FlowFile to your Solr instance. |
LogAttribute | Logs FlowFiles on failure and on success. |
Step 1: Drag the GenerateFlowFile processor to the canvas and paste content in one of the supported formats (e.g. CSV, JSON, XML) into the Custom Text property.
Step 2: Drag the PutSolrContentStream processor to the canvas and configure it for Standard mode as follows.
Step 3: In the PutSolrContentStream processor, use the Add Property button to add a property for each field of your content that you want to upload to Solr. The suffix of a field name determines how Solr treats the dynamic field:
Field Type | Comments |
---|---|
*_i | The dynamic field should be treated as an integer in Solr. |
*_s | The dynamic field should be treated as a string in Solr. |
*_dt | The dynamic field should be treated as Date in Solr. |
*_b | The dynamic field should be treated as boolean in Solr. |
*_txt | The dynamic field should be treated as text in Solr. |
*_f | The dynamic field should be treated as float in Solr. |
*_d | The dynamic field should be treated as double in Solr. |
Step 4: Drag the LogAttribute processor to the canvas and set the Log Level property to error.
Step 5: Run your data flow and open your core to view the data.
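The dynamic field suffixes listed in Step 3 can also be applied to a document before it is sent. The sketch below is a hypothetical helper, not part of NiFi or Solr. It assumes that the target schema defines these dynamic fields, that the `id` field keeps its name, and that Python floats map to `*_d` (double) rather than `*_f`.

```python
import json
from datetime import datetime, timezone


def suffix_for(value):
    """Pick a dynamic field suffix from the Python type of a value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "_b"
    if isinstance(value, int):
        return "_i"
    if isinstance(value, float):
        return "_d"
    if isinstance(value, datetime):
        return "_dt"
    return "_s"


def to_dynamic_fields(record, keep=("id",)):
    """Rename fields so each name carries the suffix matching its value type."""
    doc = {}
    for name, value in record.items():
        if name in keep:
            doc[name] = value
            continue
        suffix = suffix_for(value)
        if isinstance(value, datetime):
            # Format dates as ISO 8601 in UTC, e.g. 2020-01-15T09:30:00Z.
            value = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        doc[name + suffix] = value
    return doc


record = {
    "id": "1",
    "first": "Ann",
    "grade": 8,
    "score": 91.5,
    "passed": True,
    "taken": datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc),
}
print(json.dumps(to_dynamic_fields(record), indent=2))
```

The resulting JSON could be pasted into the Custom Text property of GenerateFlowFile in Step 1.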