Description and usage of PublishKafka_2_0 processor:

Sends the contents of a FlowFile as a message to Apache Kafka using the Kafka 2.0 Producer API. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter such as a newline. The complementary NiFi processor for fetching messages is ConsumeKafka_2_0.

Tags:

Apache, Kafka, Put, Send, Message, PubSub, 2.0

Properties:

The list below gives each property's name, default value and allowable values (where applicable), and description, and indicates whether the property supports the NiFi Expression Language.

Kafka Brokers
Default value: localhost:9092
A comma-separated list of known Kafka brokers in the format <host>:<port>.
Supports Expression Language: true (will be evaluated using variable registry only)


Security Protocol
Default value: PLAINTEXT
Allowable values: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL
The protocol used to communicate with brokers. Corresponds to Kafka's 'security.protocol' property.

Kerberos Service Name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's JAAS config or in Kafka's config. Corresponds to Kafka's 'sasl.kerberos.service.name' property. It is ignored unless one of the SASL options of the <Security Protocol> property is selected.
Supports Expression Language: true (will be evaluated using variable registry only)


Kerberos Credentials Service
Controller Service API: KerberosCredentialsService
Implementation: KeytabCredentialsService
Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos.

Kerberos Principal
The Kerberos principal that will be used to connect to brokers. If not set, a JAAS configuration file is expected to be specified in the JVM properties defined in the bootstrap.conf file. This principal will be set into Kafka's 'sasl.jaas.config' property.
Supports Expression Language: true (will be evaluated using variable registry only)


Kerberos Keytab
The Kerberos keytab that will be used to connect to brokers. If not set, a JAAS configuration file is expected to be specified in the JVM properties defined in the bootstrap.conf file. This keytab will be set into Kafka's 'sasl.jaas.config' property.
Supports Expression Language: true (will be evaluated using variable registry only)


SSL Context Service
Controller Service API: SSLContextService
Implementations: StandardRestrictedSSLContextService, StandardSSLContextService
Specifies the SSL Context Service to use for communicating with Kafka.

Topic Name
The name of the Kafka topic to publish to.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


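Delivery Guarantee
Default value: 0 (Best Effort)
Allowable values: Best Effort, Guarantee Single Node Delivery, Guarantee Replicated Delivery
Specifies the requirement for guaranteeing that a message is sent to Kafka. Corresponds to Kafka's 'acks' property: Best Effort sets acks=0, Guarantee Single Node Delivery sets acks=1, and Guarantee Replicated Delivery sets acks=all.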

Use Transactions
Default value: true
Allowable values: true, false
Specifies whether NiFi should provide transactional guarantees when communicating with Kafka. If there is a problem sending data to Kafka and this property is set to false, the messages that have already been sent to Kafka will continue on and be delivered to consumers. If it is set to true, the Kafka transaction will be rolled back so that those messages are not available to consumers. Setting this to true requires that the <Delivery Guarantee> property be set to "Guarantee Replicated Delivery".
Attributes to Send as Headers (Regex)
A regular expression that is matched against all FlowFile attribute names. Any attribute whose name matches the regex will be added to the Kafka messages as a header. If not specified, no FlowFile attributes will be added as headers.

Message Header Encoding
Default value: UTF-8
For any attribute that is added as a message header, as configured via the <Attributes to Send as Headers> property, this property indicates the character encoding to use for serializing the headers.

Kafka Key
The key to use for the message. If not specified, the FlowFile attribute 'kafka.key' is used as the message key, if it is present. Beware that setting a Kafka key and a demarcator at the same time may produce many Kafka messages with the same key. Normally this is not a problem, as Kafka does not enforce or assume message and key uniqueness. However, when a topic is compacted, Kafka deduplicates messages based on this key, so setting both a demarcator and a Kafka key poses a risk of data loss.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Key Attribute Encoding
Default value: utf-8 (UTF-8 Encoded)
Allowable values: UTF-8 Encoded, Hex Encoded
FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of that attribute should be encoded.

Message Demarcator
Specifies the string (interpreted as UTF-8) to use for demarcating multiple messages within a single FlowFile. If not specified, the entire content of the FlowFile will be used as a single message. If specified, the contents of the FlowFile will be split on this delimiter and each section sent as a separate Kafka message. To enter a special character such as a newline, use CTRL+Enter or Shift+Enter, depending on your OS.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)


Max Request Size
Default value: 1 MB
The maximum size of a request in bytes. Corresponds to Kafka's 'max.request.size' property and defaults to 1 MB (1048576 bytes).

Acknowledgment Wait Time
Default value: 5 secs
After sending a message to Kafka, this indicates the amount of time the processor will wait for a response from Kafka. If Kafka does not acknowledge the message within this time period, the FlowFile will be routed to 'failure'.

Max Metadata Wait Time
Default value: 5 sec
The amount of time the publisher will wait to obtain metadata, or for the buffer to flush during the 'send' call, before failing the entire 'send' call. Corresponds to Kafka's 'max.block.ms' property.
Supports Expression Language: true (will be evaluated using variable registry only)

Partitioner class
Default value: org.apache.kafka.clients.producer.internals.DefaultPartitioner (DefaultPartitioner)
Allowable values: RoundRobinPartitioner, DefaultPartitioner
Specifies which class to use to compute a partition id for a message. Corresponds to Kafka's 'partitioner.class' property.

Compression Type
Default value: none
Allowable values: none, gzip, snappy, lz4
This parameter allows you to specify the compression codec for all data generated by this producer.
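As an illustration of how several of the properties above relate to the Kafka client settings named in their descriptions, the following Python sketch builds a producer configuration dictionary. It is a simplified model under stated assumptions, not NiFi's implementation: the function `build_producer_config` and its parameters are hypothetical, the defaults are illustrative (in particular, Use Transactions defaults to false here), and only a subset of the properties is covered. Mapping Kerberos Service Name to 'sasl.kerberos.service.name' and Compression Type to 'compression.type' is assumed from the standard Kafka producer configuration.

```python
"""Hypothetical sketch: translate PublishKafka_2_0-style settings into a
Kafka producer configuration dict. Names and defaults are illustrative."""

# Delivery Guarantee display names mapped to Kafka 'acks' values.
ACKS_BY_GUARANTEE = {
    "Best Effort": "0",
    "Guarantee Single Node Delivery": "1",
    "Guarantee Replicated Delivery": "all",
}

SASL_PROTOCOLS = {"SASL_PLAINTEXT", "SASL_SSL"}


def build_producer_config(brokers, security_protocol="PLAINTEXT",
                          delivery_guarantee="Best Effort",
                          use_transactions=False,
                          kerberos_service_name=None,
                          max_request_size=1048576,
                          max_block_ms=5000,
                          compression_type="none"):
    """Return a dict of Kafka producer properties for the given settings."""
    # Validate each <host>:<port> entry of the comma-separated broker list.
    servers = [b.strip() for b in brokers.split(",") if b.strip()]
    for server in servers:
        host, sep, port = server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {server!r}")

    if delivery_guarantee not in ACKS_BY_GUARANTEE:
        raise ValueError(f"unknown delivery guarantee {delivery_guarantee!r}")
    # Transactions are only allowed together with replicated delivery.
    if use_transactions and delivery_guarantee != "Guarantee Replicated Delivery":
        raise ValueError("Use Transactions requires Guarantee Replicated Delivery")

    config = {
        "bootstrap.servers": ",".join(servers),
        "security.protocol": security_protocol,
        "acks": ACKS_BY_GUARANTEE[delivery_guarantee],
        "max.request.size": str(max_request_size),
        "max.block.ms": str(max_block_ms),
        "compression.type": compression_type,
    }
    # The Kerberos Service Name is ignored unless a SASL protocol is selected.
    if security_protocol in SASL_PROTOCOLS and kerberos_service_name:
        config["sasl.kerberos.service.name"] = kerberos_service_name
    return config
```

The transactional check mirrors the rule stated under Use Transactions; an actual transactional Kafka producer also needs a transactional id, which this sketch does not model.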

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name: The name of a Kafka configuration property.
Value: The value of a given Kafka configuration property.
Description: These properties will be added to the Kafka configuration after any provided configuration properties are loaded. If a dynamic property represents a property that was already set, its value will be ignored and a WARN message logged. For the list of available Kafka properties, refer to: http://kafka.apache.org/documentation.html#configuration.
Supports Expression Language: true (will be evaluated using variable registry only)
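The precedence rule for dynamic properties can be sketched as a dictionary merge in which keys that are already set win. The helper below is hypothetical (`apply_dynamic_properties` is not a NiFi API); it follows the behaviour described in the table above and uses Python's standard logging module in place of NiFi's logger.

```python
import logging

logger = logging.getLogger("publish_kafka_sketch")


def apply_dynamic_properties(base_config, dynamic_properties):
    """Merge user-defined Kafka properties into an existing configuration.

    Keys already present in base_config win: the dynamic value is ignored
    and a warning is logged. The input dict is not modified.
    """
    merged = dict(base_config)
    for name, value in dynamic_properties.items():
        if name in merged:
            logger.warning("Ignoring dynamic property %s=%s; already set to %s",
                           name, value, merged[name])
            continue
        merged[name] = value
    return merged
```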


Relationships:

success: FlowFiles for which all content was sent to Kafka.
failure: Any FlowFile that cannot be sent to Kafka will be routed to this relationship.

Reads Attributes:

None specified.

Writes Attributes:

msg.count: The number of messages that were sent to Kafka for this FlowFile. This attribute is added only to FlowFiles that are routed to success. If the <Message Demarcator> property is not set, this will always be 1; if the property is set, it may be greater than 1.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

Summary:

This processor puts the contents of a FlowFile to a topic in Apache Kafka using the KafkaProducer API available with Kafka 2.0. The content of a FlowFile becomes the contents of a Kafka message. This message is optionally assigned a key by using the <Kafka Key> property.
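The following sketch shows one way the key and headers of a single message could be derived from FlowFile attributes, following the descriptions of the Kafka Key, Key Attribute Encoding, Attributes to Send as Headers (Regex), and Message Header Encoding properties. The helper names are hypothetical, and two behaviours are assumptions rather than documented facts: that a Hex Encoded key is decoded to its raw bytes before sending, and that the header regex must match the whole attribute name.

```python
import re


def resolve_key(attributes, kafka_key=None, key_encoding="utf-8"):
    """Return the message key as bytes, or None when no key is available.

    kafka_key stands in for the (already evaluated) Kafka Key property;
    when it is unset, the 'kafka.key' FlowFile attribute is used instead.
    key_encoding is "utf-8" or "hex" (hypothetical stand-ins for the
    UTF-8 Encoded and Hex Encoded options).
    """
    raw = kafka_key if kafka_key is not None else attributes.get("kafka.key")
    if raw is None:
        return None
    if key_encoding == "hex":
        # Assumption: a hex-encoded key is decoded to its raw bytes.
        return bytes.fromhex(raw)
    return raw.encode("utf-8")


def select_headers(attributes, name_regex=None, header_encoding="utf-8"):
    """Return (name, value bytes) pairs for attributes whose names match."""
    if not name_regex:
        return []  # no regex configured: no attributes become headers
    pattern = re.compile(name_regex)
    # Sorted only to make the output order deterministic in this sketch.
    return [(name, value.encode(header_encoding))
            for name, value in sorted(attributes.items())
            if pattern.fullmatch(name)]
```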

The processor allows the user to configure an optional Message Demarcator that can be used to send many messages per FlowFile. For example, a \n could be used to indicate that the contents of the FlowFile should be used to send one message per line of text. It also supports multi-character demarcators (e.g., ‘my custom demarcator’). If the property is not set, the entire contents of the FlowFile will be sent as a single message. When using the demarcator, if some messages are successfully sent but other messages fail to send, the resulting FlowFile will be considered a failed FlowFile and will have additional attributes to that effect. One such attribute is ‘failed.last.idx’, which indicates the index of the last message that was successfully ACKed by Kafka (if no demarcator is used, the value of this index will be -1). This allows PublishKafka to re-send only the un-ACKed messages on the next retry.
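The demarcation and retry behaviour described above can be modelled with two small functions. This is a simplified sketch, not the processor's code: the function names are hypothetical, the content is treated as an in-memory bytes object rather than a stream, and skipping empty segments (such as the one produced by a trailing newline) is an assumption. On success, the length of the list returned by `split_messages` corresponds to the msg.count attribute.

```python
def split_messages(content, demarcator=None):
    """Split FlowFile content (bytes) into Kafka message payloads."""
    if not demarcator:
        return [content]  # no demarcator: the whole FlowFile is one message
    delim = demarcator.encode("utf-8")  # demarcator is interpreted as UTF-8
    # Assumption: empty segments (e.g. from a trailing delimiter) are skipped.
    return [part for part in content.split(delim) if part]


def messages_to_resend(messages, failed_last_idx=None):
    """Return the messages still to send on a retry.

    failed_last_idx is the index of the last message Kafka acknowledged
    (the 'failed.last.idx' attribute); -1 means none was acknowledged,
    and None means there was no previous failed attempt.
    """
    if failed_last_idx is None:
        return list(messages)
    return list(messages[failed_last_idx + 1:])
```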

Security Configuration

The Security Protocol property allows the user to specify the protocol for communicating with the Kafka broker. The following sections describe each of the protocols in further detail.

PLAINTEXT

This option provides an unsecured connection to the broker, with no client authentication and no encryption. In order to use this option the broker must be configured with a listener of the form:

PLAINTEXT://host.name:port

SSL

This option provides an encrypted connection to the broker, with optional client authentication. In order to use this option the broker must be configured with a listener of the form:

SSL://host.name:port

In addition, the processor must have an SSL Context Service selected.

If the broker specifies ssl.client.auth=none, or does not specify ssl.client.auth, then the client will not be required to present a certificate. In this case, the SSL Context Service selected may specify only a truststore containing the public key of the certificate authority used to sign the broker’s key.

If the broker specifies ssl.client.auth=required then the client will be required to present a certificate. In this case, the SSL Context Service must also specify a keystore containing a client key, in addition to a truststore as described above.

SASL_PLAINTEXT

This option uses SASL with a PLAINTEXT transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

SASL_PLAINTEXT://host.name:port

In addition, the Kerberos Service Name must be specified in the processor.

SASL_PLAINTEXT - GSSAPI

If the SASL mechanism is GSSAPI, then the client must provide a JAAS configuration to authenticate. The JAAS configuration can be provided by specifying the java.security.auth.login.config system property in NiFi’s bootstrap.conf, such as:

java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf

An example of the JAAS config file would be the following:

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/nifi.keytab"
    serviceName="kafka"
    principal="nifi@YOURREALM.COM";
};

NOTE: The serviceName in the JAAS file must match the Kerberos Service Name in the processor.

Alternatively, the JAAS configuration when using GSSAPI can be provided by specifying the Kerberos Principal and Kerberos Keytab directly in the processor properties. This will dynamically create a JAAS configuration like above, and will take precedence over the java.security.auth.login.config system property.
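For reference, the JAAS configuration that the Kerberos Principal and Kerberos Keytab properties stand in for can be expressed as a single-line 'sasl.jaas.config' value. The sketch below builds such a string from the same options as the KafkaClient example above; the helper name is hypothetical, and the exact string NiFi generates may differ (for example, in which Krb5LoginModule options it sets).

```python
def gssapi_jaas_config(principal, keytab, service_name="kafka"):
    """Build a single-line 'sasl.jaas.config' value for Kerberos (GSSAPI).

    The options mirror the KafkaClient section of the JAAS file example.
    """
    if not principal or not keytab:
        raise ValueError("both a principal and a keytab are required")
    return ("com.sun.security.auth.module.Krb5LoginModule required "
            "useKeyTab=true storeKey=true "
            f'keyTab="{keytab}" serviceName="{service_name}" '
            f'principal="{principal}";')
```

As with the file-based configuration, the serviceName used here should match the Kerberos Service Name property.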

SASL_PLAINTEXT - PLAIN

If the SASL mechanism is PLAIN, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka’s PlainLoginModule. An example of the JAAS config file would be the following:

KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="nifi"
  password="nifi-password";
};

NOTE: It is not recommended to use a SASL mechanism of PLAIN with SASL_PLAINTEXT, as it would transmit the username and password unencrypted.

NOTE: Using the PlainLoginModule will cause it to be registered in the JVM’s static list of Providers, making it visible to components in other NARs that may access the providers. There is currently a known issue where Kafka processors using the PlainLoginModule will cause HDFS processors with Kerberos to no longer work.

SASL_SSL

This option uses SASL with an SSL/TLS transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

SASL_SSL://host.name:port

See the SASL_PLAINTEXT section for a description of how to provide the proper JAAS configuration depending on the SASL mechanism (GSSAPI or PLAIN).

See the SSL section for a description of how to configure the SSL Context Service based on the ssl.client.auth property.
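The requirements of the four protocols described in this section can be summarised as a checklist. The sketch below is hypothetical (the function name, its parameters, and the dictionary standing in for the SSL Context Service are illustrative) and encodes only the rules stated above: SSL and SASL_SSL need an SSL Context Service with a truststore, plus a keystore when the broker sets ssl.client.auth=required; SASL_PLAINTEXT and SASL_SSL need a Kerberos Service Name and a JAAS configuration.

```python
PROTOCOLS = ("PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL")


def missing_settings(protocol, ssl_context=None, kerberos_service_name=None,
                     jaas_available=False, broker_client_auth="none"):
    """List what is still missing for the given Security Protocol.

    ssl_context is a dict such as {"truststore": ..., "keystore": ...}
    standing in for the SSL Context Service; jaas_available says whether a
    JAAS configuration was provided (bootstrap.conf or principal/keytab).
    """
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown security protocol {protocol!r}")
    missing = []
    if protocol in ("SSL", "SASL_SSL"):
        if ssl_context is None:
            missing.append("SSL Context Service")
        else:
            if not ssl_context.get("truststore"):
                missing.append("truststore")
            # A client keystore is only needed when the broker requires it.
            if broker_client_auth == "required" and not ssl_context.get("keystore"):
                missing.append("keystore")
    if protocol in ("SASL_PLAINTEXT", "SASL_SSL"):
        if not kerberos_service_name:
            missing.append("Kerberos Service Name")
        if not jaas_available:
            missing.append("JAAS configuration")
    return missing
```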