Extract the content metadata from FlowFiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files, including audio, video, and print media formats. NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library's website at http://tika.apache.org/.
media, file, format, metadata, audio, video, image, document, pdf
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Name | Default Value | Allowable Values | Description |
|---|---|---|---|
| Max Number of Attributes | 100 | | Specifies the maximum number of attributes to add to the FlowFile. There is no guarantee of the order in which the tags are processed. By default it will process all of them. |
| Max Attribute Length | 100 | | Specifies the maximum length of a single attribute value. When a metadata item has multiple values, they are merged until this length is reached, and then ", ..." is appended to indicate that additional values were dropped. If a single value is longer than this, it is truncated and "(truncated)" is appended to indicate that truncation occurred. |
| Metadata Key Filter | | | A regular expression identifying which metadata keys received from the parser should be added to the FlowFile attributes. If left blank, all parsed metadata keys are added to the FlowFile attributes. |
| Metadata Key Prefix | | | Text to prefix to metadata keys as they are added to the FlowFile attributes. The processor does not add a separator automatically, so it is recommended to end the prefix with a separator character such as '.' or '-'. Supports Expression Language: true |
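The merge-and-truncate rule described for Max Attribute Length can be sketched as follows. This is a minimal Python illustration, not the processor's implementation: the function name `format_attribute_value` is hypothetical, and the sketch assumes that merged values are joined with ", " and that the ", ..." and "(truncated)" markers are appended after the limit is checked, so they do not count toward it.

```python
from __future__ import annotations


def format_attribute_value(values: list[str], max_length: int = 100) -> str:
    """Collapse one metadata item's values into a single attribute value.

    Hypothetical sketch: assumes values are joined with ", " and that the
    ", ..." / "(truncated)" markers do not count toward max_length.
    """
    merged = ""
    for index, value in enumerate(values):
        if index == 0:
            # First value: truncate and mark it if it alone exceeds the limit.
            if len(value) > max_length:
                merged = value[:max_length] + "(truncated)"
            else:
                merged = value
            continue
        candidate = merged + ", " + value
        if len(candidate) > max_length:
            # The next value does not fit: stop merging and flag the drop.
            return merged + ", ..."
        merged = candidate
    return merged
```

For example, `format_attribute_value(["ab", "cd", "ef"], max_length=6)` returns `"ab, cd, ..."`, while a single 12-character value with `max_length=10` comes back as its first 10 characters followed by `(truncated)`.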
| Name | Description |
|---|---|
| success | Any FlowFile from which media metadata is successfully extracted is routed to success. |
| failure | Any FlowFile from which media metadata cannot be extracted is routed to failure. |
| Name | Description |
|---|---|
| (Metadata Key Prefix)(attribute) | The extracted content metadata is written to an attribute named "(Metadata Key Prefix)(attribute)", or "(attribute)" if "Metadata Key Prefix" is not set. |
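How Metadata Key Filter, Metadata Key Prefix, and Max Number of Attributes might combine to produce attribute names can be sketched as below. This is a hypothetical Python helper (`build_attributes`), not NiFi code. It assumes the filter must match the whole key (as Java's `Matcher.matches()` would) and that the attribute limit keeps whichever keys come first in iteration order, consistent with the statement that tag processing order is not guaranteed. Values pass through unchanged; length limiting would be a separate step.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional


def build_attributes(
    metadata: Mapping[str, str],
    key_filter: Optional[str] = None,
    prefix: str = "",
    max_attributes: Optional[int] = None,
) -> dict[str, str]:
    """Select parsed metadata keys and name them as FlowFile attributes (sketch)."""
    # A blank filter means every parsed key is kept.
    pattern = re.compile(key_filter) if key_filter else None
    attributes: dict[str, str] = {}
    for key, value in metadata.items():
        # Stop once the limit is reached; which keys survive depends on
        # iteration order.
        if max_attributes is not None and len(attributes) >= max_attributes:
            break
        # Assumption: the regular expression must match the entire key.
        if pattern is not None and not pattern.fullmatch(key):
            continue
        # The prefix is concatenated as-is; no separator is inserted.
        attributes[prefix + key] = value
    return attributes
```

For instance, given the illustrative keys `Content-Type`, `xmpDM:artist`, and `xmpDM:album`, calling it with `key_filter=r"xmpDM:.*"` and `prefix="media."` yields `media.xmpDM:artist` and `media.xmpDM:album`, and `Content-Type` is filtered out.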
This component does not store state.
This component is not restricted.