Splits a text file into multiple smaller text files on line boundaries, each having up to a configured number of lines.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
|Name||Default Value||Allowable Values||Description|
|Line Split Count|| || ||The number of lines that will be added to each split file (excluding the header, if the Header Line Count property is greater than 0).|
|Header Line Count||0|| ||The number of lines that should be considered part of the header; the header lines will be duplicated to all split files.|
|Remove Trailing Newlines||true|| ||Whether to remove newlines at the end of each split file. This should be false if you intend to merge the split files later. If this is set to 'true' and a FlowFile is generated that contains only 'empty lines' (i.e., consists only of \r and \n characters), the FlowFile will not be emitted. Note, however, that if the Header Line Count is greater than 0, the resultant FlowFile will never be empty because it will contain the header lines, so a FlowFile may be emitted that contains only the header lines.|
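The interaction between Remove Trailing Newlines and the header lines can be sketched in Python. This is a minimal illustration of the behavior described in the table, not the processor's actual code; the function name `finalize_split` and its parameters are hypothetical.

```python
from typing import Optional


def finalize_split(header: str, body: str,
                   remove_trailing_newlines: bool = True) -> Optional[str]:
    """Return the content of one split, or None if it would not be emitted.

    `header` holds the duplicated header lines ("" when Header Line Count is 0)
    and `body` holds the lines copied from the original file.
    """
    content = header + body
    if remove_trailing_newlines:
        # Drop every trailing carriage return and line feed.
        content = content.rstrip("\r\n")
        if content == "":
            # Only empty lines remained, so nothing is emitted.
            return None
    return content


print(finalize_split("", "\n\r\n"))          # None: split had only empty lines
print(finalize_split("id,name\n", "\n\n"))   # id,name: header keeps it non-empty
```

With a non-empty header the result can never be empty, which is why a split containing only the header lines may still be emitted.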
|original||The original input file will be routed to this destination when it has been successfully split into 1 or more files|
|failure||If a file cannot be split for some reason, the original file will be routed to this destination and nothing will be routed elsewhere|
|splits||The split files will be routed to this destination when an input file is successfully split into 1 or more split files|
|text.line.count||The number of lines of text from the original FlowFile that were copied to this FlowFile|
|fragment.identifier||All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute|
|fragment.index||A sequential number, incremented by one for each split, that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile|
|fragment.count||The number of split FlowFiles generated from the parent FlowFile|
|segment.original.filename||The filename of the parent FlowFile|
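To make the relationship between these attributes concrete, the following Python sketch builds the attribute map for each split of one parent file. It is illustrative only: the helper name `fragment_attributes` is hypothetical, and starting `fragment.index` at 1 is an assumption rather than something stated in the table.

```python
import uuid
from typing import Dict, List


def fragment_attributes(parent_filename: str,
                        line_counts: List[int]) -> List[Dict[str, str]]:
    """Build one attribute map per split.

    `line_counts` lists how many lines of the original file went into each split.
    """
    # One random UUID shared by every split of the same parent.
    fragment_id = str(uuid.uuid4())
    total = len(line_counts)
    return [
        {
            "text.line.count": str(count),
            "fragment.identifier": fragment_id,
            "fragment.index": str(index),  # assumption: numbering starts at 1
            "fragment.count": str(total),
            "segment.original.filename": parent_filename,
        }
        for index, count in enumerate(line_counts, start=1)
    ]


for attrs in fragment_attributes("data.csv", [2, 2, 1]):
    print(attrs["fragment.index"], "of", attrs["fragment.count"],
          "lines:", attrs["text.line.count"])
```

Because every split carries the same `fragment.identifier` together with its own index and the total count, a downstream step has enough information to group the splits and restore their order.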
How to configure?
Step 1: Drag and drop the SplitText processor to canvas.
Step 2: Double-click the processor to open its configuration dialog.
Step 3: Review the usage of each property and update its value as needed.
Properties and usage
Line Split Count: Enter the number of lines to add to each split, excluding the header. If Maximum Fragment Size is set, the line count is not considered.
Maximum Fragment Size: Enter the maximum size of each split file, including the header.
Header Line Count: Enter the number of lines to treat as the header, which is included in each split.
Header Line Marker Characters: User-defined character(s) that identify the header line(s) of the file. Header Line Count must be 0 for this property to take effect.
Remove Trailing Newlines: Specify whether to remove newlines at the end of each split file.
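As a rough illustration of Header Line Marker Characters, the Python sketch below separates header lines from body lines by a marker prefix. It assumes that the header consists of the leading run of lines that start with the marker and that the first line without the marker ends the header; that rule and the function name `split_header_by_marker` are assumptions made for this example, not taken from the property description.

```python
from typing import List, Tuple


def split_header_by_marker(lines: List[str],
                           marker: str) -> Tuple[List[str], List[str]]:
    """Return (header_lines, body_lines) for a file given as a list of lines."""
    header = []
    for line in lines:
        # Assumption: the first line that lacks the marker ends the header.
        if not line.startswith(marker):
            break
        header.append(line)
    return header, lines[len(header):]


header, body = split_header_by_marker(
    ["#source: sensors\n", "#units: C\n", "21.5\n", "22.0\n"], "#")
print(header)  # ['#source: sensors\n', '#units: C\n']
print(body)    # ['21.5\n', '22.0\n']
```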
For example, to split a file line by line and include the first line as a header in every split, configure the processor as follows:
Line Split Count: 1
Header Line Count: 1
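The effect of this configuration can be sketched in Python. This is a simplified model of the splitting behavior described above, not the processor's implementation; the function `split_text` and the sample data are hypothetical, and Maximum Fragment Size and Remove Trailing Newlines are deliberately left out.

```python
from typing import List


def split_text(text: str, line_split_count: int,
               header_line_count: int = 0) -> List[str]:
    """Split `text` on line boundaries, repeating the header in every split."""
    if line_split_count < 1:
        raise ValueError("line_split_count must be at least 1")
    lines = text.splitlines(keepends=True)
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    splits = []
    for start in range(0, len(body), line_split_count):
        chunk = body[start:start + line_split_count]
        # Each split is the header followed by up to line_split_count lines.
        splits.append("".join(header + chunk))
    return splits


sample = "id,name\n1,alice\n2,bob\n"
for part in split_text(sample, line_split_count=1, header_line_count=1):
    print(repr(part))
# 'id,name\n1,alice\n'
# 'id,name\n2,bob\n'
```

Each output contains the header line plus exactly one data line, matching Line Split Count: 1 and Header Line Count: 1.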