Description:

Extract the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files including audio, video, and print media formats. NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library’s website at http://tika.apache.org/.

Tags:

media, file, format, metadata, audio, video, image, document, pdf

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name Default Value Allowable Values Description
Max Number of Attributes 100 Specify the max number of attributes to add to the flowfile. There is no guarantee in what order the tags will be processed. By default it will process all of them.
Max Attribute Length. 100 Specifies the maximum length of a single attribute value. When a metadata item has multiple values, they will be merged until this length is reached and then ", ..." will be added as an indicator that additional values were dropped. If a single value is longer than this, it will be truncated and "(truncated)" appended to indicate that truncation occurred.
Metadata Key Filter A regular expression identifying which metadata keys received from the parser should be added to the flowfile attributes. If left blank, all metadata keys parsed will be added to the flowfile attributes.
Metadata Key Prefix Text to be prefixed to metadata keys as they are added to the flowfile attributes. It is recommended to end with a separator character like '.' or '-', as this is not automatically added by the processor. Supports Expression Language: true
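
The merging and truncation rules described for Max Attribute Length can be illustrated with a minimal Python sketch. This is not the processor's actual code (the processor itself is implemented in Java). The function name merge_values is hypothetical, and the exact boundary handling (whether the separator counts toward the limit, and where a long value is cut) is an assumption made for the example.

```python
def merge_values(values, max_len):
    """Join the values of one metadata item into a single bounded attribute value.

    Hypothetical helper: the boundary rules below are assumptions,
    not the processor's documented behavior.
    """
    merged = ""
    for value in values:
        # Candidate result if this value were appended to what we have so far.
        candidate = value if not merged else merged + ", " + value
        if len(candidate) > max_len:
            if not merged:
                # A single value that is too long: cut it and mark the truncation.
                return value[:max_len] + "(truncated)"
            # Further values do not fit: signal that some were dropped.
            return merged + ", ..."
        merged = candidate
    return merged


if __name__ == "__main__":
    print(merge_values(["alpha", "beta", "gamma"], 11))
    print(merge_values(["abcdefghij"], 4))
```

Under these assumptions, the first call yields "alpha, beta, ..." and the second yields "abcd(truncated)". Note that the appended marker itself makes the final string longer than the configured limit in this sketch.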

Relationships:

Name Description
success Any FlowFile that successfully has media metadata extracted will be routed to success
failure Any FlowFile that fails to have media metadata extracted will be routed to failure

Reads Attributes:

None specified.

Writes Attributes:

Name Description
(Metadata Key Prefix)(attribute) The extracted content metadata will be inserted with the attribute name "(Metadata Key Prefix)(attribute)", or "(attribute)" if "Metadata Key Prefix" is not provided.
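
To make the attribute naming concrete, the following Python sketch shows how Metadata Key Filter, Metadata Key Prefix, and Max Number of Attributes might combine to produce the written attribute names. It is an illustration only: the function build_attributes and the sample metadata keys are hypothetical, the sketch assumes the filter must match the whole key, and Python's regular expression syntax is not identical to the Java syntax the processor uses.

```python
import re


def build_attributes(metadata, key_filter=None, prefix="", max_attributes=None):
    """Select metadata keys and derive flowfile attribute names from them.

    Hypothetical helper. Assumptions: the filter must match the entire key,
    and the attribute cap is applied after filtering.
    """
    pattern = re.compile(key_filter) if key_filter else None
    attributes = {}
    for key, value in metadata.items():
        # Stop once the configured number of attributes has been reached.
        if max_attributes is not None and len(attributes) >= max_attributes:
            break
        # Skip keys rejected by the filter; a blank filter accepts every key.
        if pattern is not None and not pattern.fullmatch(key):
            continue
        # The prefix is prepended as-is; no separator is added automatically.
        attributes[prefix + key] = value
    return attributes


if __name__ == "__main__":
    # Sample keys chosen for illustration only.
    sample = {
        "Content-Type": "image/jpeg",
        "tiff:ImageWidth": "640",
        "tiff:ImageLength": "480",
    }
    print(build_attributes(sample, key_filter=r"tiff:.*", prefix="media."))
```

With the prefix "media." and the filter tiff:.*, the sample yields the attributes media.tiff:ImageWidth and media.tiff:ImageLength. With no prefix and no filter, the keys are written unchanged.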

State management:

This component does not store state.

Restricted:

This component is not restricted.