Description and usage of CalculateRecordStats processor:

A processor that can count the number of items in a record set, as well as provide counts based on user-defined criteria on subsets of the record set.

Tags:

record, stats, metrics

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the Expression Language.

Name: Record Reader
Allowable Values: Controller Service API: RecordReaderFactory. Implementations: AvroReader, SyslogReader, ScriptedReader, XMLReader, GrokReader, Syslog5424Reader, CSVReader, JsonTreeReader, JsonPathReader
Description: A record reader to use for reading the records.

Name: record-stats-limit
Default Value: 10
Description: Limit the number of individual stats that are returned for each record path to the top N results. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
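As a rough illustration of what a top-N limit means, the sketch below keeps only the N most frequent values from a tally. It is not the processor's implementation: the function name `top_n_counts` is hypothetical, and the tie-breaking behavior (values with equal counts keep first-seen order, as `collections.Counter.most_common` does) is an assumption made for the example.

```python
from collections import Counter


def top_n_counts(values, limit=10):
    """Return the `limit` most frequent values with their counts.

    Hypothetical helper for illustration only; the default of 10
    mirrors the documented default of record-stats-limit.
    """
    return Counter(values).most_common(limit)


# Sample values as they might be extracted by one record path.
sports = ["Soccer", "Football", "Soccer", "Tennis", "Soccer", "Football"]
top_two = top_n_counts(sports, limit=2)
print(top_two)
```

With a limit of 2, the least frequent value ("Tennis") is dropped from the breakdown, while the overall counts are unaffected by the limit in this sketch.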


Relationships:

Name: success
Description: If a flowfile is successfully processed, it goes here.

Name: failure
Description: If a flowfile fails to be processed, it goes here.

Reads Attributes:

None specified.

Writes Attributes:

Name: record.count
Description: A count of the records in the record set in the flowfile.

Name: recordStats.<User Defined Property Name>.count
Description: A count of the records that contain a value for the user defined property.

Name: recordStats.<User Defined Property Name>.<value>.count
Description: Each value discovered for the user defined property will have its own count attribute. The number of top N value counts added is set by the limit configuration.
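To make the naming patterns in the Writes Attributes listing concrete, here is a minimal sketch that expands them for one user-defined property. The function `stats_attributes` and its parameters are hypothetical, and the values are rendered as strings because flowfile attributes are strings; the sketch only shows how the documented names are composed, not how the processor computes the counts.

```python
def stats_attributes(record_count, property_name, matched_count, value_counts):
    """Compose attribute names following the documented patterns.

    record_count  -- total records in the record set
    property_name -- the user-defined property name
    matched_count -- records that contain a value for the property
    value_counts  -- mapping of discovered value -> count (already
                     limited to the top N, by assumption)
    """
    base = f"recordStats.{property_name}"
    attrs = {
        "record.count": str(record_count),
        f"{base}.count": str(matched_count),
    }
    for value, count in value_counts.items():
        attrs[f"{base}.{value}.count"] = str(count)
    return attrs


attrs = stats_attributes(5, "sport", 5, {"Soccer": 3, "Football": 2})
for name, value in attrs.items():
    print(f"{name} = {value}")
```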

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

Summary:

This processor takes in a record set and computes both the overall record count and counts defined by dynamic properties, each of which maps a property name to a record path. Record path counts are provided at two levels:

  • The overall count of all records that successfully evaluated a record path.
  • A breakdown of counts of unique values that matched the record path operation.

Consider the following record structure:

    {
      "sport": "Soccer",
      "name": "John Smith"
    }

A valid mapping here would be sport => /sport, that is, a dynamic property named sport whose value is the record path /sport.
For a record set of five such JSON records, three with the sport Soccer and two with Football, the processor would set the following attributes:

  • record_count: 5
  • sport: 5
  • sport.Soccer: 3
  • sport.Football: 2
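The counting described in this summary can be sketched in a few lines. This is an illustrative approximation, not the processor's code: `calculate_stats` is a hypothetical function, records are plain dictionaries, only top-level paths such as `/sport` are handled (real record paths are more expressive), and the second person's name is made up for the sample data. The attribute names follow the list above.

```python
from collections import Counter


def calculate_stats(records, mappings):
    """Count records overall and per mapped field value.

    mappings maps a property name to a simple path like "/sport".
    """
    attrs = {"record_count": len(records)}
    for name, path in mappings.items():
        field = path.lstrip("/")  # assumption: top-level field paths only
        values = [r[field] for r in records if r.get(field) is not None]
        attrs[name] = len(values)  # records where the path yielded a value
        for value, count in Counter(values).items():
            attrs[f"{name}.{value}"] = count  # per-value breakdown
    return attrs


# Five records: three Soccer, two Football (sample data for illustration).
records = (
    [{"sport": "Soccer", "name": "John Smith"}] * 3
    + [{"sport": "Football", "name": "Jane Doe"}] * 2
)
stats = calculate_stats(records, {"sport": "/sport"})
for key, value in stats.items():
    print(f"{key}: {value}")
```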