Description and usage of PutSolrRecord processor:

Indexes the Records from a FlowFile into Solr.

Tags:

Apache, Solr, Put, Send, Record

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

Name

Default Value

Allowable Values

Description

Solr Type

Standard

* Cloud: a SolrCloud instance
* Standard: a Solr instance

The type of Solr instance, Cloud or Standard.

Solr Location

The Solr URL for a Solr Type of Standard (ex: http://localhost:8984/solr/gettingstarted), or the ZooKeeper hosts for a Solr Type of Cloud (ex: localhost:9983).

Supports Expression Language: true (will be evaluated using variable registry only)


Collection

The Solr collection name, only used with a Solr Type of Cloud

Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Solr Update Path

/update The path in Solr to which the FlowFile Records are posted

Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Record Reader

Controller Service API: 


RecordReaderFactory

Implementations: 


AvroReader


SyslogReader


ScriptedReader


XMLReader


GrokReader


Syslog5424Reader


CSVReader


JsonTreeReader


JsonPathReader


Specifies the Controller Service to use for parsing incoming data and determining the data's schema.

Fields To Index

Comma-separated list of field names to write

Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Commit Within

5000 The number of milliseconds before the given update is committed

Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Kerberos Credentials Service

Controller Service API: 


KerberosCredentialsService

Implementation: 

KeytabCredentialsService


Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos

Username

The username to use when Solr is configured with basic authentication.

Supports Expression Language: true (will be evaluated using variable registry only)


Password

The password to use when Solr is configured with basic authentication.

Sensitive Property: true


Supports Expression Language: true (will be evaluated using variable registry only)


SSL Context Service

Controller Service API: 


SSLContextService

Implementations: 

StandardRestrictedSSLContextService


StandardSSLContextService


The Controller Service to use in order to obtain an SSL Context. This property must be set when communicating with Solr over HTTPS.

Solr Socket Timeout

10 seconds The amount of time to wait for data on a socket connection to Solr. A value of 0 indicates an infinite timeout.

Solr Connection Timeout

10 seconds The amount of time to wait when establishing a connection to Solr. A value of 0 indicates an infinite timeout.

Solr Maximum Connections

10 The maximum number of total connections allowed from the Solr client to Solr.

Solr Maximum Connections Per Host

5 The maximum number of connections allowed from the Solr client to a single Solr host.

ZooKeeper Client Timeout

10 seconds The amount of time to wait for data on a connection to ZooKeeper, only used with a Solr Type of Cloud.

ZooKeeper Connection Timeout

10 seconds The amount of time to wait when establishing a connection to ZooKeeper, only used with a Solr Type of Cloud.

Batch Size

500 The number of Solr documents to index per batch

Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
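The Batch Size property can be pictured as splitting the incoming records into groups of at most 500 documents, each sent to Solr as one update. The following is a minimal sketch of that grouping, not the processor's actual code; the function name batches is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(documents: Iterable[T], batch_size: int = 500) -> Iterator[List[T]]:
    """Yield successive lists holding at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # Take the next slice from the stream without loading everything first.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 1200 documents with the default Batch Size of 500
print([len(batch) for batch in batches(range(1200))])  # prints [500, 500, 200]
```

A smaller Batch Size means more, smaller update requests; a larger one means fewer requests but more documents held in memory at a time.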


Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name

Value

Description

A Solr request parameter name A Solr request parameter value These parameters will be passed to Solr on the request

Supports Expression Language: false
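To illustrate how dynamic properties end up on the request, the sketch below builds the kind of update URL that would result from the Solr Location, Solr Update Path and Commit Within properties plus one dynamic property. It is an approximation for illustration only: the helper build_update_url is hypothetical, the processor's own client code is not shown here, and the dynamic property wt=json is just an example parameter. commitWithin is Solr's standard update parameter for the commit delay in milliseconds.

```python
from urllib.parse import urlencode


def build_update_url(solr_location, update_path="/update",
                     commit_within_ms=5000, dynamic_properties=None):
    """Combine processor-style settings into a single Solr update URL."""
    params = {"commitWithin": commit_within_ms}
    # Each dynamic property name/value pair becomes one request parameter.
    params.update(dynamic_properties or {})
    return f"{solr_location.rstrip('/')}{update_path}?{urlencode(params)}"


url = build_update_url(
    "http://localhost:8984/solr/gettingstarted",
    dynamic_properties={"wt": "json"},  # example dynamic property (assumption)
)
print(url)
# prints http://localhost:8984/solr/gettingstarted/update?commitWithin=5000&wt=json
```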


Relationships:

Name

Description

success The original FlowFile
failure FlowFiles that failed for any reason other than Solr being unreachable
connection_failure FlowFiles that failed because Solr is unreachable

Reads Attributes:

None specified.

Writes Attributes:

None specified.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

Summary

Usage Example

This processor reads NiFi records and indexes each one into Solr as a SolrDocument. Any properties the user adds to this processor are passed to Solr on the update request. A Record Reader must be specified for this processor. To index only selected fields of a record, list the field names, separated by commas, in the Fields To Index property.

Example: To index only specific fields of the record:

  • Fields To Index: field1,field2,field3

NOTE: For nested records, field names must be prefixed with the parent field name.

  • Fields To Index: parentField1,parentField2,parentField3_childField1,parentField3_childField2
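The effect of Fields To Index can be sketched as a filter over field names. The code below is an illustration under stated assumptions, not the processor's implementation: the function names are hypothetical, the filter is applied to names in their prefixed form, and an empty property value is assumed to mean that all fields are indexed.

```python
def parse_fields_to_index(value):
    """Split the comma-separated property value into clean field names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def select_fields(document, fields_to_index):
    """Keep only the listed fields; an empty list keeps everything (assumption)."""
    if not fields_to_index:
        return dict(document)
    wanted = set(fields_to_index)
    return {name: value for name, value in document.items() if name in wanted}


document = {"first": "John", "last": "R", "grade": 8, "exams_subject": "Maths"}
fields = parse_fields_to_index("first, exams_subject")
print(select_fields(document, fields))
# prints {'first': 'John', 'exams_subject': 'Maths'}
```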

For nested records, this processor flattens all the nested records into a single Solr document. The name of each field from a child record follows the format {Parent Field Name}_{Child Field Name}.

Example: For a record created from the following JSON:

{
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams": {
        "subject": "Maths",
        "test" : "term1",
        "marks" : 90
    }
}

The corresponding Solr document would be represented as below:

{
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams_subject": "Maths",
    "exams_test" : "term1",
    "exams_marks" : 90
}
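The flattening rule shown above can be expressed as a short recursive function. This is a minimal sketch of the naming rule only, not the processor's code; flatten_record is a hypothetical name, and chaining prefixes for records nested more than one level deep is an assumption.

```python
def flatten_record(record, parent=None):
    """Flatten nested dicts into one level using {parent}_{child} field names."""
    flat = {}
    for name, value in record.items():
        key = f"{parent}_{name}" if parent else name
        if isinstance(value, dict):
            # Child record: recurse, carrying the parent name as the prefix.
            flat.update(flatten_record(value, key))
        else:
            flat[key] = value
    return flat


record = {
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams": {"subject": "Maths", "test": "term1", "marks": 90},
}
print(flatten_record(record))
# prints {'first': 'John', 'last': 'R', 'grade': 8, 'exams_subject': 'Maths', 'exams_test': 'term1', 'exams_marks': 90}
```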

Similarly, for an array of nested records, this processor flattens all the nested records into a single Solr document. The name of each field from a child record follows the format {Parent Field Name}_{Child Field Name}, and the field is multivalued in the Solr document.

Example: For a record created from the following JSON:

{
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams": [
        {
            "subject": "Maths",
            "test" : "term1",
            "marks" : 90
        },
        {
            "subject": "Physics",
            "test" : "term1",
            "marks" : 95
        }
    ]
}

The corresponding Solr document would be represented as below:

{
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams_subject": ["Maths","Physics"],
    "exams_test" : ["term1","term1"],
    "exams_marks" : [90,95]
}
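The array case can be sketched in the same way: values from each child record are appended to a list under the prefixed field name, which yields the multivalued fields shown above. This is an illustrative sketch, not the processor's code; flatten_with_arrays is a hypothetical name, and only one level of nesting is handled.

```python
def flatten_with_arrays(record):
    """Flatten child records and arrays of child records, one level deep."""
    flat = {}
    for name, value in record.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Array of child records: collect each child field as a multivalued field.
            for child in value:
                for child_name, child_value in child.items():
                    flat.setdefault(f"{name}_{child_name}", []).append(child_value)
        elif isinstance(value, dict):
            # Single child record: one prefixed field per child field.
            for child_name, child_value in value.items():
                flat[f"{name}_{child_name}"] = child_value
        else:
            flat[name] = value
    return flat


record = {
    "first": "John",
    "last": "R",
    "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Physics", "test": "term1", "marks": 95},
    ],
}
print(flatten_with_arrays(record))
# prints {'first': 'John', 'last': 'R', 'grade': 8, 'exams_subject': ['Maths', 'Physics'], 'exams_test': ['term1', 'term1'], 'exams_marks': [90, 95]}
```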