Description:
Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile as the 'matching.term' attribute.
Tags:
aho-corasick, scan, content, byte sequence, search, find, dictionary
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
Dictionary File | | | The filename of the terms dictionary |
Dictionary Encoding | text | text, binary | Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', dictionary terms are denoted by a 4-byte integer indicating the term length followed by the term itself |
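The two dictionary encodings described for the Dictionary Encoding property can be illustrated with a minimal Python sketch. It is not the processor's own code. The function names are hypothetical, big-endian byte order for the 4-byte length is an assumption (the description above does not state it; big-endian is what Java's DataInputStream.readInt reads), and skipping blank lines in a text dictionary is also an assumption.

```python
import struct


def encode_binary_dictionary(terms):
    """Serialize terms as: 4-byte length, then the raw term bytes, repeated."""
    out = bytearray()
    for term in terms:
        # ">i" is a big-endian signed 32-bit integer (assumed byte order).
        out += struct.pack(">i", len(term))
        out += term
    return bytes(out)


def decode_binary_dictionary(data):
    """Parse the layout written above back into a list of byte-string terms."""
    terms = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        terms.append(data[offset:offset + length])
        offset += length
    return terms


def parse_text_dictionary(data):
    """Text encoding: one UTF-8 term per line (blank lines skipped by assumption)."""
    return [line for line in data.decode("utf-8").splitlines() if line]
```

For example, encode_binary_dictionary([b"abc"]) yields the seven bytes 00 00 00 03 61 62 63. The binary form is useful when terms are arbitrary byte sequences that may themselves contain new-line characters or are not valid UTF-8.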
Relationships:
Name | Description |
matched | FlowFiles that match at least one term in the dictionary are routed to this relationship |
unmatched | FlowFiles that do not match any term in the dictionary are routed to this relationship |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
matching.term | The term that caused the Processor to route the FlowFile to the 'matched' relationship; if the FlowFile is routed to the 'unmatched' relationship, this attribute is not added |
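The routing behavior described by the relationships and the 'matching.term' attribute can be sketched as follows. This is an illustration only: scan_content is a hypothetical name, it uses a plain substring check per term rather than the Aho-Corasick search named in the tags, and which term gets reported when several match (here, the first one in dictionary order) is an assumption that the documentation does not specify.

```python
def scan_content(content, terms):
    """Return (relationship, attributes) for one FlowFile's content bytes."""
    for term in terms:
        # Naive substring search; a stand-in for Aho-Corasick, not equivalent in speed.
        if term and term in content:
            # errors="replace" keeps the sketch from failing on non-UTF-8 terms.
            value = term.decode("utf-8", errors="replace")
            return "matched", {"matching.term": value}
    # No term found: the attribute is not added.
    return "unmatched", {}
```

With content b"hello world" and terms [b"xyz", b"world"], the sketch returns the 'matched' relationship with 'matching.term' set to "world"; with terms [b"xyz"] alone it returns 'unmatched' and no attributes.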