Description:

This processor polls HBase for records in the specified table. It keeps track of the timestamps of the cells it receives, so that new records pushed to HBase are pulled automatically on later polls. Each record is output in JSON format, as {"row": "<row key>", "cells": {"<column 1 family>:<column 1 qualifier>": "<cell 1 value>", "<column 2 family>:<column 2 qualifier>": "<cell 2 value>", …}}. For each record received, a Provenance RECEIVE event is emitted with the format hbase://<table name>/<row key>, where <row key> is the UTF-8 encoded value of the row's key.
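The timestamp tracking described above can be pictured with a small in-memory sketch. It is not the processor's actual implementation: the PollState class, the poll function, the dictionary-based cells, and the choice to resume at "last seen timestamp + 1" are all assumptions made for illustration, and the real processor may treat boundary timestamps differently.

```python
import time


class PollState:
    """Hypothetical stand-in for the state kept between polls."""

    def __init__(self, initial_time_range="None", now_ms=None):
        # "Current Time" starts from now; "None" starts from the beginning.
        if initial_time_range == "Current Time":
            if now_ms is None:
                now_ms = int(time.time() * 1000)
            self.min_timestamp = now_ms
        else:
            self.min_timestamp = 0


def poll(cells, state):
    """Return cells at or after state.min_timestamp, then advance the state."""
    new_cells = [c for c in cells if c["timestamp"] >= state.min_timestamp]
    if new_cells:
        # Assumption: the next poll resumes just past the newest cell seen.
        state.min_timestamp = max(c["timestamp"] for c in new_cells) + 1
    return new_cells


# Made-up table contents: each cell carries a row key and a timestamp.
table = [{"row": "r1", "timestamp": 100}, {"row": "r2", "timestamp": 200}]
state = PollState()
first = poll(table, state)       # first scan sees every existing cell
table.append({"row": "r3", "timestamp": 300})
second = poll(table, state)      # later scan sees only the newly added cell
```

In this sketch a second poll with no new data returns an empty list, which mirrors the idea that records already pulled are not pulled again.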

Tags:

hbase, get, ingest

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Name | Default Value | Allowable Values | Description
HBase Client Service | | Controller Service API: HBaseClientService; Implementation: HBase_1_1_2_ClientService | Specifies the Controller Service to use for accessing HBase.
Distributed Cache Service | | Controller Service API: DistributedMapCacheClient; Implementation: DistributedMapCacheClientService | Specifies the Controller Service used to maintain state about what has been pulled from HBase, so that if a new node begins pulling data, it does not duplicate work that has already been done.
Table Name | | | The name of the HBase table to pull data from.
Columns | | | A comma-separated list of "<colFamily>:<colQualifier>" pairs to return when scanning. To return all columns for a given family, leave off the qualifier, as in "<colFamily1>,<colFamily2>".
Filter Expression | | | An HBase filter expression that will be applied to the scan. This property cannot be used together with the Columns property.
Initial Time Range | None | None, Current Time | The time range to use on the first scan of a table. None pulls the entire table on the first scan; Current Time pulls only entries from that point forward.
Character Set | UTF-8 | | Specifies which character set is used to encode the data in HBase.
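As an illustration of the Columns format, the following sketch splits a property value into family and qualifier pairs. The parse_columns function is hypothetical and is not the processor's own parsing code; it assumes that an entry without a colon names a whole column family, and that the first colon separates the family from the qualifier.

```python
def parse_columns(spec):
    """Split a Columns value into (family, qualifier) tuples.

    A qualifier of None means "all columns in this family".
    """
    columns = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or surrounding whitespace
        family, sep, qualifier = entry.partition(":")
        columns.append((family, qualifier if sep else None))
    return columns


# Example value mixing explicit columns with a whole family (made-up names).
parsed = parse_columns("cf1:name, cf1:age,cf2")
```

Here `parsed` holds two explicit columns from family cf1 followed by the whole family cf2.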

Relationships:

Name Description
success All FlowFiles are routed to this relationship

Reads Attributes:

None specified.

Writes Attributes:

Name Description
hbase.table The name of the HBase table that the data was pulled from
mime.type Set to application/json to indicate that output is JSON
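A downstream consumer could combine these attributes with the JSON content roughly as follows. This is a sketch only: the read_record function, the plain dictionary of attributes, and the sample table and row values are assumptions for illustration, and the URI is rebuilt here from the documented hbase://<table name>/<row key> format rather than read from a provenance event.

```python
import json


def read_record(attributes, content):
    """Parse one output record and rebuild its hbase:// URI."""
    if attributes.get("mime.type") != "application/json":
        raise ValueError("expected application/json content")
    record = json.loads(content)
    cells = {}
    for name, value in record["cells"].items():
        # Cell names have the form "<family>:<qualifier>".
        family, _, qualifier = name.partition(":")
        cells[(family, qualifier)] = value
    uri = "hbase://{}/{}".format(attributes["hbase.table"], record["row"])
    return uri, cells


# Made-up attributes and content in the documented shape.
attrs = {"hbase.table": "users", "mime.type": "application/json"}
body = '{"row": "user-42", "cells": {"info:name": "Ada", "info:city": "London"}}'
uri, cells = read_record(attrs, body)
```

Checking mime.type first lets the consumer fail fast on content that did not come from this processor.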