Description:

Consumes messages from Apache Kafka. This processor is built specifically against the Kafka 0.10.x Consumer API. The complementary NiFi processor for sending messages is PublishKafka_0_10.

Tags:

Kafka, Get, Record, CSV, avro, JSON, Ingest, Ingress, Topic, PubSub, Consume, 0.10.x

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name Default Value Allowable Values Description
Kafka Brokers localhost:9092 A comma-separated list of known Kafka Brokers in the format **host:port**
Topic Name(s) The name of the Kafka Topic(s) to pull from. More than one can be supplied if comma separated.
Supports Expression Language: true
Topic Name Format names *names
*pattern
Specifies whether the Topic(s) provided are a comma separated list of names or a single regular expression
Record Reader Controller Service API:
RecordReaderFactory
Implementations:
JsonPathReader CSVReader ScriptedReader AvroReader GrokReader JsonTreeReader
The Record Reader to use for parsing messages received from Kafka
Record Writer Controller Service API:
RecordSetWriterFactory
Implementations:
FreeFormTextRecordSetWriter CSVRecordSetWriter JsonRecordSetWriter ScriptedRecordSetWriter AvroRecordSetWriter
The Record Writer to use in order to serialize the data before writing it to a FlowFile
Security Protocol PLAINTEXT *PLAINTEXT
*SSL
*SASL_PLAINTEXT
*SASL_SSL
Protocol used to communicate with brokers. Corresponds to Kafka's 'security.protocol' property.
Kerberos Service Name The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's JAAS config or in Kafka's config. Corresponds to Kafka's 'sasl.kerberos.service.name' property. It is ignored unless one of the SASL options of the **Security Protocol** is selected.
Kerberos Principal The Kerberos principal that will be used to connect to brokers. If not set, a JAAS configuration file is expected to be set in the JVM properties defined in the bootstrap.conf file. This principal will be set into Kafka's 'sasl.jaas.config' property.
Kerberos Keytab The Kerberos keytab that will be used to connect to brokers. If not set, a JAAS configuration file is expected to be set in the JVM properties defined in the bootstrap.conf file. This keytab will be set into Kafka's 'sasl.jaas.config' property.
SSL Context Service Controller Service API:
SSLContextService
Implementation:
StandardSSLContextService
Specifies the SSL Context Service to use for communicating with Kafka.
Group ID A Group ID is used to identify consumers that are within the same consumer group. Corresponds to Kafka's 'group.id' property.
Offset Reset latest *earliest
*latest
*none
Allows you to manage the condition when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted). Corresponds to Kafka's 'auto.offset.reset' property.
Max Poll Records 10000 Specifies the maximum number of records Kafka should return in a single poll.
Max Uncommitted Time 1 secs Specifies the maximum amount of time allowed to pass before offsets must be committed. This value impacts how often offsets will be committed. Committing offsets less often increases throughput but also increases the window of potential data duplication in the event of a rebalance or JVM restart between commits. This value is also related to maximum poll records and the use of a message demarcator. When using a message demarcator we can have far more uncommitted messages than when we're not as there is much less for us to keep track of in memory.
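
The properties above map onto a small set of Kafka consumer settings. The following is a minimal sketch of that mapping, written in Python purely for illustration (the processor itself is Java). The function names consumer_config and topic_subscription are hypothetical. The 'group.id', 'auto.offset.reset' and 'security.protocol' keys come from the descriptions above; 'bootstrap.servers' and 'max.poll.records' are the standard Kafka consumer keys assumed to back Kafka Brokers and Max Poll Records. Trimming whitespace around topic names is an assumption, and Python's re module only stands in for the regular expression engine the processor actually uses.

```python
import re

OFFSET_RESET_VALUES = ("earliest", "latest", "none")
SECURITY_PROTOCOLS = ("PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL")


def consumer_config(brokers="localhost:9092", group_id=None,
                    offset_reset="latest", max_poll_records=10000,
                    security_protocol="PLAINTEXT"):
    """Translate processor property values into Kafka consumer settings.

    Defaults mirror the Default Value column of the table above.
    """
    if offset_reset not in OFFSET_RESET_VALUES:
        raise ValueError(f"unsupported Offset Reset value: {offset_reset!r}")
    if security_protocol not in SECURITY_PROTOCOLS:
        raise ValueError(f"unsupported Security Protocol: {security_protocol!r}")
    config = {
        "bootstrap.servers": brokers,            # Kafka Brokers (assumed key)
        "auto.offset.reset": offset_reset,       # Offset Reset
        "max.poll.records": str(max_poll_records),  # Max Poll Records (assumed key)
        "security.protocol": security_protocol,  # Security Protocol
    }
    if group_id is not None:
        config["group.id"] = group_id            # Group ID
    return config


def topic_subscription(topics, name_format="names"):
    """Interpret Topic Name(s) according to Topic Name Format."""
    if name_format == "names":
        # Comma-separated list; surrounding whitespace is trimmed (assumption).
        return [name.strip() for name in topics.split(",") if name.strip()]
    if name_format == "pattern":
        # A single regular expression matched against topic names.
        return re.compile(topics)
    raise ValueError(f"unsupported Topic Name Format: {name_format!r}")
```

For example, topic_subscription("orders, payments") yields the two names "orders" and "payments", while topic_subscription("orders-.*", "pattern") yields one compiled expression that would match a topic such as "orders-eu".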

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name Value Description
The name of a Kafka configuration property. The value of a given Kafka configuration property. These properties will be added to the Kafka configuration after loading any provided configuration properties. If a dynamic property represents a property that was already set, its value will be ignored and a WARN message will be logged. For the list of available Kafka properties please refer to: http://kafka.apache.org/documentation.html#configuration.
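
The precedence rule described above can be sketched as follows. This is an illustrative Python model, not the processor's code: the function name apply_dynamic_properties and the logger name are hypothetical, and 'fetch.min.bytes' in the usage note is only an example of a Kafka consumer property a user might add.

```python
import logging

logger = logging.getLogger("kafka.dynamic.properties")  # hypothetical logger name


def apply_dynamic_properties(base_config, dynamic_properties):
    """Add dynamic properties to an existing Kafka configuration.

    A dynamic property whose name is already present in base_config is
    ignored and a warning is logged, so explicitly configured values win.
    Returns the merged configuration and the list of ignored names.
    """
    merged = dict(base_config)  # leave the caller's dict untouched
    ignored = []
    for name, value in dynamic_properties.items():
        if name in merged:
            logger.warning(
                "Dynamic property '%s' is already set; ignoring value %r",
                name, value)
            ignored.append(name)
        else:
            merged[name] = value
    return merged, ignored
```

Calling apply_dynamic_properties({"group.id": "nifi"}, {"group.id": "other", "fetch.min.bytes": "1024"}) keeps "group.id" as "nifi", adds "fetch.min.bytes", and reports "group.id" as ignored.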

Relationships:

Name Description
success FlowFiles received from Kafka. Depending on the demarcation strategy, this is either one FlowFile per message or a bundle of messages grouped by topic and partition.
parse.failure If a message from Kafka cannot be parsed using the configured Record Reader, the contents of the message will be routed to this Relationship as its own individual FlowFile.

Reads Attributes:

None specified.

Writes Attributes:

Name Description
record.count The number of records received
mime.type The MIME Type that is provided by the configured Record Writer
kafka.partition The partition of the topic the records are from
kafka.topic The topic records are from

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship.

See Also:

ConsumeKafka_0_10, PublishKafka_0_10, PublishKafkaRecord_0_10

Summary:

This Processor polls Apache Kafka for data using the KafkaConsumer API available with Kafka 0.10.x. When a message is received from Kafka, the message will be deserialized using the configured Record Reader, and then written to a FlowFile by serializing the message with the configured Record Writer.
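
The read, parse and write cycle can be modelled roughly as shown below. This is a sketch under stated assumptions rather than the processor's implementation: messages are represented as hypothetical (topic, partition, value) tuples, Python's json module stands in for the configured Record Reader and Record Writer, FlowFiles are plain dictionaries, and the "application/json" MIME type is simply what this stand-in writer would report. The attribute names and the two relationships are the ones documented above.

```python
import json
from collections import defaultdict


def consume_records(messages):
    """Turn polled messages into 'success' bundles and 'parse.failure' items.

    messages: iterable of (topic, partition, value_bytes) tuples.
    Returns (success, parse_failures).
    """
    bundles = defaultdict(list)   # (topic, partition) -> parsed records
    parse_failures = []           # raw contents, one entry per bad message

    for topic, partition, value in messages:
        try:
            # Stand-in for the Record Reader.
            record = json.loads(value.decode("utf-8"))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON; the message is
            # routed on its own, with its original contents.
            parse_failures.append(value)
            continue
        bundles[(topic, partition)].append(record)

    success = []
    for (topic, partition), records in bundles.items():
        # Stand-in for the Record Writer: one JSON document per line.
        content = "\n".join(json.dumps(r, sort_keys=True) for r in records)
        success.append({
            "content": content,
            "attributes": {
                "record.count": str(len(records)),
                "mime.type": "application/json",
                "kafka.partition": str(partition),
                "kafka.topic": topic,
            },
        })
    return success, parse_failures
```

In this model, two parsable messages from partition 0 of one topic end up in a single bundle with record.count set to 2, while an unparsable message from the same partition is returned separately and unchanged.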

Security Configuration:

The Security Protocol property allows the user to specify the protocol for communicating with the Kafka broker. The following sections describe each of the protocols in further detail.

PLAINTEXT

This option provides an unsecured connection to the broker, with no client authentication and no encryption. In order to use this option the broker must be configured with a listener of the form:

PLAINTEXT://host.name:port

SSL

This option provides an encrypted connection to the broker, with optional client authentication. In order to use this option the broker must be configured with a listener of the form:

SSL://host.name:port

In addition, the processor must have an SSL Context Service selected.

If the broker specifies ssl.client.auth=none, or does not specify ssl.client.auth, then the client will not be required to present a certificate. In this case, the SSL Context Service selected may specify only a truststore containing the public key of the certificate authority used to sign the broker’s key.

If the broker specifies ssl.client.auth=required then the client will be required to present a certificate. In this case, the SSL Context Service must also specify a keystore containing a client key, in addition to a truststore as described above.
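
The two cases above reduce to a small decision rule, sketched here in Python. The function name required_ssl_stores is hypothetical, and any ssl.client.auth value other than the two discussed in this section is deliberately rejected rather than guessed at.

```python
def required_ssl_stores(ssl_client_auth=None):
    """Return which stores the SSL Context Service needs to define.

    ssl_client_auth is the broker's ssl.client.auth setting, or None when
    the broker does not specify it.
    """
    if ssl_client_auth in (None, "none"):
        # The client only needs to trust the broker's certificate.
        return {"truststore"}
    if ssl_client_auth == "required":
        # The client must also present its own certificate.
        return {"truststore", "keystore"}
    raise ValueError(
        f"ssl.client.auth={ssl_client_auth!r} is not covered by this section")
```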

SASL_PLAINTEXT

This option uses SASL with a PLAINTEXT transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

SASL_PLAINTEXT://host.name:port

In addition, the Kerberos Service Name must be specified in the processor.

SASL_PLAINTEXT - GSSAPI

If the SASL mechanism is GSSAPI, then the client must provide a JAAS configuration to authenticate. The JAAS configuration can be provided by specifying the java.security.auth.login.config system property in NiFi’s bootstrap.conf, such as:

java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf

An example of the JAAS config file would be the following:

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/nifi.keytab"
    serviceName="kafka"
    principal="nifi@YOURREALM.COM";
};

NOTE: The serviceName in the JAAS file must match the Kerberos Service Name in the processor.

Alternatively, starting with Apache NiFi 1.2.0, which uses the Kafka 0.10.2 client, the JAAS configuration when using GSSAPI can be provided by specifying the Kerberos Principal and Kerberos Keytab directly in the processor properties. This will dynamically create a JAAS configuration like the one above, and will take precedence over the java.security.auth.login.config system property.
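
To make the dynamically created configuration more concrete, the sketch below builds a single-line value in the form Kafka accepts for its 'sasl.jaas.config' property, using the same login module options as the file example above. The function name gssapi_jaas_config is hypothetical, and the exact set of options NiFi generates is not shown in this document, so treat the output as an approximation.

```python
def gssapi_jaas_config(principal, keytab, service_name):
    """Build an inline JAAS entry for Kerberos (GSSAPI) authentication.

    Mirrors the KafkaClient file example: login module, control flag,
    options, and a terminating semicolon, all on one line.
    """
    return (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true "
        f'keyTab="{keytab}" '
        f'serviceName="{service_name}" '
        f'principal="{principal}";'
    )
```

With the values from the file example, gssapi_jaas_config("nifi@YOURREALM.COM", "/path/to/nifi.keytab", "kafka") produces one line naming the same keytab, service name and principal.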

SASL_PLAINTEXT - PLAIN

If the SASL mechanism is PLAIN, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka’s PlainLoginModule. An example of the JAAS config file would be the following:

KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="nifi"
  password="nifi-password";
};

NOTE: It is not recommended to use a SASL mechanism of PLAIN with SASL_PLAINTEXT, as it would transmit the username and password unencrypted.

NOTE: Using the PlainLoginModule will cause it to be registered in the JVM’s static list of Providers, making it visible to components in other NARs that may access the providers. There is currently a known issue where Kafka processors using the PlainLoginModule will cause HDFS processors with Kerberos to no longer work.

SASL_SSL

This option uses SASL with an SSL/TLS transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

SASL_SSL://host.name:port

See the SASL_PLAINTEXT section for a description of how to provide the proper JAAS configuration depending on the SASL mechanism (GSSAPI or PLAIN).

See the SSL section for a description of how to configure the SSL Context Service based on the ssl.client.auth property.
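
The per-protocol requirements described in this section can be summarized as a simple check, sketched below in Python. The function name validate_security and its parameters are hypothetical. Requiring an SSL Context Service and a Kerberos Service Name for SASL_SSL is an assumption drawn from the cross-references to the SSL and SASL_PLAINTEXT sections, not something stated outright.

```python
SECURITY_PROTOCOLS = ("PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL")


def validate_security(protocol, ssl_context_service=None,
                      kerberos_service_name=None, sasl_mechanism=None):
    """Return a list of problems with a processor security configuration."""
    if protocol not in SECURITY_PROTOCOLS:
        raise ValueError(f"unsupported Security Protocol: {protocol!r}")

    problems = []
    # SSL transport needs an SSL Context Service (assumed for SASL_SSL too).
    if protocol in ("SSL", "SASL_SSL") and not ssl_context_service:
        problems.append("an SSL Context Service must be selected")
    # SASL authentication needs the Kerberos Service Name (assumed for SASL_SSL too).
    if protocol in ("SASL_PLAINTEXT", "SASL_SSL") and not kerberos_service_name:
        problems.append("the Kerberos Service Name must be specified")
    # PLAIN credentials travel unencrypted without an SSL/TLS transport.
    if protocol == "SASL_PLAINTEXT" and sasl_mechanism == "PLAIN":
        problems.append(
            "PLAIN over SASL_PLAINTEXT sends the username and password unencrypted")
    return problems
```

An empty list means none of the requirements covered here were violated; it does not guarantee that the broker will accept the connection.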