Description:

Splits a text file into multiple smaller text files on line boundaries, each having up to a configured number of lines.

Tags:

split, text

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Name Default Value Allowable Values Description
Line Split Count The number of lines that will be added to each split file (excluding the header, if the Header Line Count property is greater than 0).
Header Line Count 0 The number of lines that should be considered part of the header; the header lines will be duplicated to all split files
Remove Trailing Newlines true * true
* false
Whether to remove newlines at the end of each split file. This should be false if you intend to merge the split files later. If this is set to 'true' and a FlowFile is generated that contains only 'empty lines' (i.e., consists only of \r and \n characters), the FlowFile will not be emitted. Note, however, that if the Header Line Count is greater than 0, the resultant FlowFile will never be empty because it will contain the header lines, so a FlowFile may be emitted that contains only the header lines.
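
The Remove Trailing Newlines rule can be illustrated with a small Python sketch. This is not the processor's implementation; the function name finish_split and its parameters are hypothetical, and the sketch only models the behavior described in the table: trailing \r and \n characters are stripped, a split that becomes empty is not emitted, and header lines keep a split from ever being empty.

```python
def finish_split(header_lines, body_lines, remove_trailing_newlines=True):
    """Return the text of one split, or None if the split would not be emitted.

    Hypothetical helper: models the documented rule, not NiFi's actual code.
    Each line is expected to keep its own line ending.
    """
    text = "".join(list(header_lines) + list(body_lines))
    if remove_trailing_newlines:
        # Strip only carriage returns and line feeds from the end.
        text = text.rstrip("\r\n")
        if text == "":
            # The split contained nothing but empty lines, so it is dropped.
            return None
    return text


# A split made only of empty lines is not emitted.
print(finish_split([], ["\n", "\r\n"]))
# With a header, the split is still emitted and holds only the header.
print(repr(finish_split(["id,name\n"], ["\n"])))
```

With the property set to false, the same input is returned unchanged, which is why that setting is the safer choice when the splits will be merged again later.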

Relationships:

Name Description
original The original input file will be routed to this destination when it has been successfully split into 1 or more files
failure If a file cannot be split for some reason, the original file will be routed to this destination and nothing will be routed elsewhere
splits The split files will be routed to this destination when an input file is successfully split into 1 or more split files

Reads Attributes:

None specified.

Writes Attributes:

Name Description
text.line.count The number of lines of text from the original FlowFile that were copied to this FlowFile
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
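
To show how these attributes relate to one another, the following Python sketch builds the attribute set for each split of a single parent file. The function fragment_attributes is hypothetical, and two details are assumptions rather than facts stated above: that fragment.index starts at 1, and that attribute values are stored as strings.

```python
import uuid


def fragment_attributes(parent_filename, split_line_counts):
    """Build one attribute dict per split (illustrative only).

    split_line_counts holds the number of original lines copied into each split.
    """
    # Every split of the same parent shares one randomly generated UUID.
    fragment_id = str(uuid.uuid4())
    total = len(split_line_counts)
    return [
        {
            "text.line.count": str(count),
            "fragment.identifier": fragment_id,
            "fragment.index": str(index),  # assumed to start at 1
            "fragment.count": str(total),
            "segment.original.filename": parent_filename,
        }
        for index, count in enumerate(split_line_counts, start=1)
    ]


for attrs in fragment_attributes("data.csv", [2, 2, 1]):
    print(attrs["fragment.index"], "of", attrs["fragment.count"], attrs["text.line.count"])
```

A downstream step that reassembles the file can group splits by fragment.identifier, order them by fragment.index, and use fragment.count to check that none are missing.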

How to configure?

Step 1: Drag and drop the SplitText processor onto the canvas.

Step 2: Double-click the processor to open its configuration dialog.

Step 3: Review the usage of each property described below and set its value as needed.

Properties and usage

Line Split Count: Enter the number of lines to add to each split, excluding the header. If Maximum Fragment Size is set, the line count is not considered.

Maximum Fragment Size: Enter the maximum size of each split file, including the header.

Header Line Count: Enter the number of lines to treat as the header. The header is included in every split.

Header Line Marker Characters: Enter one or more user-defined characters that identify the header line(s) of the file. Header Line Count must be 0 for this property to take effect.

Remove Trailing Newlines: Specify whether newlines at the end of each split file should be removed.
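
As an illustration of Header Line Marker Characters, the Python sketch below separates a header from the body using a marker. It rests on an assumption not spelled out above: that the header consists of the leading lines that begin with the marker, and that the first line without it ends the header. The function name split_header_by_marker is hypothetical.

```python
def split_header_by_marker(lines, marker):
    """Return (header_lines, body_lines) for a list of lines.

    Assumption: only leading lines that start with the marker count as header.
    """
    header = []
    for line in lines:
        if line.startswith(marker):
            header.append(line)
        else:
            # The first non-marker line ends the header.
            break
    return header, lines[len(header):]


lines = ["#id,name\n", "#generated nightly\n", "1,alice\n", "#not a header\n"]
header, body = split_header_by_marker(lines, "#")
print(header)
print(body)
```

In this sketch a marker that appears again later in the file does not extend the header; that line stays in the body.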

For example, to split a file line by line and repeat the first line as the header of every split, configure the processor as follows:

Line Split Count: 1

Header Line Count: 1
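
The effect of this configuration can be sketched in Python. The function split_text below is a hypothetical stand-in for the processor, not its implementation; it handles only Line Split Count and Header Line Count and leaves trailing newlines in place (as if Remove Trailing Newlines were false).

```python
def split_text(text, line_split_count, header_line_count=0):
    """Split text on line boundaries, repeating the header in every split.

    Illustrative only: ignores Maximum Fragment Size and header markers.
    """
    lines = text.splitlines(keepends=True)
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    splits = []
    # Take line_split_count body lines at a time and prepend the header.
    for start in range(0, len(body), line_split_count):
        chunk = body[start:start + line_split_count]
        splits.append("".join(header + chunk))
    return splits


sample = "id,name\n1,alice\n2,bob\n"
for part in split_text(sample, line_split_count=1, header_line_count=1):
    print(repr(part))
# prints 'id,name\n1,alice\n' and then 'id,name\n2,bob\n'
```

A three-line input with one header line therefore yields two splits, each carrying the header plus a single data line.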

See Also:

MergeContent