Description:
Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, because a Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate.
Tags:
merge, content, correlation, tar, zip, stream, concatenation, archive, flowfile-stream, flowfile-stream-v3
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.
Name | Default Value | Allowable Values | Description |
Merge Strategy | Bin-Packing Algorithm | * Bin-Packing Algorithm </br> * Defragment | Specifies the algorithm used to merge content. The 'Bin-Packing Algorithm' generates a FlowFile populated by arbitrarily chosen FlowFiles. The 'Defragment' algorithm combines fragments that are associated by attributes back into a single cohesive FlowFile. If using Defragment, all FlowFiles must have the attributes fragment.identifier, fragment.count, and fragment.index or, for backward compatibility, segment.identifier, segment.count, and segment.index. All FlowFiles with the same value for fragment.identifier are grouped together. All FlowFiles in a group must have the same value for fragment.count and a unique value for fragment.index between 0 and the value of fragment.count. |
Merge Format | Binary Concatenation | * TAR </br> * ZIP </br> * FlowFile Stream, v3 </br> * FlowFile Stream, v2 </br> * FlowFile Tar, v1 </br> * Binary Concatenation </br> * Avro | Determines the format that will be used to merge the content. |
Attribute Strategy | Keep Only Common Attributes | * Keep Only Common Attributes</br> * Keep All Unique Attributes | Determines which FlowFile attributes should be added to the bundle. If 'Keep All Unique Attributes' is selected, any attribute on any FlowFile that gets bundled will be kept unless its value conflicts with the value from another FlowFile. If 'Keep Only Common Attributes' is selected, only the attributes that exist on all FlowFiles in the bundle, with the same value, will be preserved. |
Correlation Attribute Name | | | If specified, like FlowFiles will be binned together, where 'like FlowFiles' means FlowFiles that have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue. Supports Expression Language: true |
Minimum Number of Entries | 1 | | The minimum number of files to include in a bundle |
Maximum Number of Entries | | | The maximum number of files to include in a bundle. If not specified, there is no maximum. |
Minimum Group Size | 0 B | | The minimum size for the bundle |
Maximum Group Size | | | The maximum size for the bundle. If not specified, there is no maximum. |
Max Bin Age | | | The maximum age of a Bin that will trigger a Bin to be complete. Expected format is <duration> <time unit>, where <duration> is a positive integer and <time unit> is one of seconds, minutes, hours |
Maximum number of Bins | 100 | | Specifies the maximum number of bins that can be held in memory at any one time |
Delimiter Strategy | Filename | * Filename </br> * Text </br> | Determines if Header, Footer, and Demarcator should point to files containing the respective content, or if the values of the properties should be used as the content. |
Header | | | Filename specifying the header to use. If not specified, no header is supplied. This property is valid only when using the Binary Concatenation Merge Format; otherwise, it is ignored. Supports Expression Language: true |
Footer | | | Filename specifying the footer to use. If not specified, no footer is supplied. This property is valid only when using the Binary Concatenation Merge Format; otherwise, it is ignored. Supports Expression Language: true |
Demarcator | | | Filename specifying the demarcator to use. If not specified, no demarcator is supplied. This property is valid only when using the Binary Concatenation Merge Format; otherwise, it is ignored. Supports Expression Language: true |
Compression Level | 1 | * 0 </br> * 1 </br> * 2 </br> * 3 </br> * 4 </br> * 5 </br> * 6 </br> * 7 </br> * 8 </br> * 9 | Specifies the compression level to use when using the Zip Merge Format; if not using the Zip Merge Format, this value is ignored |
Keep Path | false | * true</br> * false | If using the Zip or Tar Merge Format, specifies whether or not the FlowFiles' paths should be included in their entry names; if using any other Merge Format, this value is ignored |
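The two Attribute Strategy values described in the table can be illustrated with a small sketch. This is not the Processor's implementation: it models each FlowFile's attributes as a plain Python dict, and the function names `keep_only_common` and `keep_all_unique` are hypothetical names chosen for this example.

```python
from typing import Dict, List


def keep_only_common(bundle: List[Dict[str, str]]) -> Dict[str, str]:
    """Keep attributes that exist on every FlowFile with the same value."""
    if not bundle:
        return {}
    first, rest = bundle[0], bundle[1:]
    return {
        name: value
        for name, value in first.items()
        if all(name in attrs and attrs[name] == value for attrs in rest)
    }


def keep_all_unique(bundle: List[Dict[str, str]]) -> Dict[str, str]:
    """Keep every attribute unless two FlowFiles disagree on its value."""
    merged: Dict[str, str] = {}
    conflicting = set()
    for attrs in bundle:
        for name, value in attrs.items():
            if name in conflicting:
                continue  # already dropped because of an earlier conflict
            if name in merged and merged[name] != value:
                conflicting.add(name)
                del merged[name]
            else:
                merged[name] = value
    return merged


# Hypothetical attribute sets for a bundle of two FlowFiles.
bundle = [
    {"source": "sensor-a", "region": "eu", "batch": "7"},
    {"source": "sensor-b", "region": "eu"},
]
print(keep_only_common(bundle))  # {'region': 'eu'}
print(keep_all_unique(bundle))   # {'region': 'eu', 'batch': '7'}
```

In this example `source` is dropped by both strategies because its values conflict, while `batch` survives only under 'Keep All Unique Attributes' because it is absent from the second FlowFile.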
Relationships:
Name | Description |
merged | The FlowFile containing the merged content |
original | The FlowFiles that were used to create the bundle |
failure | If the bundle cannot be created, all FlowFiles that would have been used to create the bundle will be transferred to failure |
Reads Attributes:
Name | Description |
fragment.identifier | Applicable only if the <Merge Strategy> property is set to Defragment. All FlowFiles with the same value for this attribute will be bundled together. |
fragment.index | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute indicates the order in which the fragments should be assembled. This attribute must be present on all FlowFiles when using the Defragment Merge Strategy and must be a unique (i.e., unique across all FlowFiles that have the same value for the "fragment.identifier" attribute) integer between 0 and the value of the fragment.count attribute. If two or more FlowFiles have the same value for the "fragment.identifier" attribute and the same value for the "fragment.index" attribute, the behavior of this Processor is undefined. |
fragment.count | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute must be present on all FlowFiles with the same value for the fragment.identifier attribute. All FlowFiles in the same bundle must have the same value for this attribute. The value of this attribute indicates how many FlowFiles should be expected in the given bundle. |
segment.original.filename | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute must be present on all FlowFiles with the same value for the fragment.identifier attribute. All FlowFiles in the same bundle must have the same value for this attribute. The value of this attribute will be used for the filename of the completed merged FlowFile. |
tar.permissions | Applicable only if the <Merge Format> property is set to TAR. The value of this attribute must be 3 characters; each character must be in the range 0 to 7 (inclusive) and indicates the file permissions that should be used for the FlowFile's TAR entry. If this attribute is missing or has an invalid value, the default value of 644 will be used |
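The attribute rules above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Processor's code: a FlowFile is modeled as an (attributes, content) tuple, the fragments are joined by plain concatenation, the "between 0 and fragment.count" range is treated as inclusive at both ends, and the names `assemble_fragments` and `tar_mode` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a FlowFile: (attributes, content).
FlowFile = Tuple[Dict[str, str], bytes]


def assemble_fragments(flowfiles: List[FlowFile]) -> Dict[str, bytes]:
    """Group by fragment.identifier and join complete groups in index order."""
    groups: Dict[str, List[FlowFile]] = {}
    for attrs, content in flowfiles:
        groups.setdefault(attrs["fragment.identifier"], []).append((attrs, content))

    merged: Dict[str, bytes] = {}
    for identifier, members in groups.items():
        counts = {attrs["fragment.count"] for attrs, _ in members}
        if len(counts) != 1:
            raise ValueError(f"{identifier}: fragment.count values differ")
        expected = int(counts.pop())
        if len(members) != expected:
            continue  # not all fragments have arrived yet
        indexes = [int(attrs["fragment.index"]) for attrs, _ in members]
        if len(set(indexes)) != len(indexes):
            raise ValueError(f"{identifier}: duplicate fragment.index")
        if any(i < 0 or i > expected for i in indexes):
            raise ValueError(f"{identifier}: fragment.index out of range")
        ordered = sorted(members, key=lambda m: int(m[0]["fragment.index"]))
        merged[identifier] = b"".join(content for _, content in ordered)
    return merged


def tar_mode(attrs: Dict[str, str], default: str = "644") -> int:
    """Parse tar.permissions as three octal digits, falling back to 644."""
    value = attrs.get("tar.permissions", default)
    if len(value) != 3 or any(c not in "01234567" for c in value):
        value = default
    return int(value, 8)


fragments = [
    ({"fragment.identifier": "a", "fragment.index": "2", "fragment.count": "2"}, b"world"),
    ({"fragment.identifier": "a", "fragment.index": "1", "fragment.count": "2"}, b"hello "),
    ({"fragment.identifier": "b", "fragment.index": "1", "fragment.count": "3"}, b"partial"),
]
print(assemble_fragments(fragments))           # {'a': b'hello world'}
print(oct(tar_mode({"tar.permissions": "755"})))  # 0o755
print(oct(tar_mode({"tar.permissions": "9x1"})))  # 0o644
```

Group "b" is left out of the result because only one of its three expected fragments is present. The sketch raises an error on duplicate indexes purely for illustration; the documented behavior in that case is undefined.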
Writes Attributes:
Name | Description |
filename | When more than 1 file is merged, the filename comes from the segment.original.filename attribute. If that attribute does not exist in the source FlowFiles, the filename is set to the number of nanoseconds matching system time. A filename extension may then be applied: if Merge Format is TAR, .tar is appended; if Merge Format is ZIP, .zip is appended; if Merge Format is FlowFile Stream, .pkg is appended |
merge.count | The number of FlowFiles that were merged into this bundle |
merge.bin.age | The age of the bin, in milliseconds, when it was merged and output. Effectively this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output |
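The filename rule described for the filename attribute can be sketched as follows. This is an illustration only, with several assumptions: attributes are modeled as dicts, segment.original.filename is read from the first FlowFile in the bundle, `time.time_ns()` stands in for "the number of nanoseconds matching system time", and any Merge Format whose name starts with "FlowFile Stream" receives the .pkg extension. The name `merged_filename` is hypothetical.

```python
import time
from typing import Dict, List

# Extensions stated in the table; other formats get no extension in this sketch.
EXTENSIONS = {"TAR": ".tar", "ZIP": ".zip"}


def merged_filename(bundle: List[Dict[str, str]], merge_format: str) -> str:
    """Derive the merged FlowFile's filename from the bundle's attributes."""
    base = None
    if bundle:
        # Assumption: take the attribute from the first FlowFile in the bundle.
        base = bundle[0].get("segment.original.filename")
    if base is None:
        # Assumption: wall-clock nanoseconds as the fallback name.
        base = str(time.time_ns())
    if merge_format.startswith("FlowFile Stream"):
        return base + ".pkg"
    return base + EXTENSIONS.get(merge_format, "")


print(merged_filename([{"segment.original.filename": "report.csv"}], "TAR"))
# report.csv.tar
print(merged_filename([{"segment.original.filename": "report.csv"}], "FlowFile Stream, v3"))
# report.csv.pkg
```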
See Also:
SegmentContent