Description and usage of ConvertExcelToCSV Processor:

Consumes a Microsoft Excel document and converts each worksheet to csv. Each sheet from the incoming Excel document will generate a new Flowfile that will be output from this processor. Each output Flowfile’s contents will be formatted as a csv file where the each row from the excel sheet is output as a newline in the csv file. This processor is currently only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel documents and not older .xls (HSSF ‘97(-2007) file format) documents. This processor also expects well formatted CSV content and will not escape cell’s containing invalid content such as newlines or additional commas.

Tags:

excel, csv, poi

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name Default Value Allowable Values Description
Sheets to Extract Comma separated list of Excel document sheet names that should be extracted from the excel document. If this property is left blank then all of the sheets will be extracted from the Excel document. The list of names is case in-sensitive. Any sheets not specified in this value will be ignored.</br> Supports Expression Language: true

Relationships:

Name Description
success Excel data converted to csv
failure Failed to parse the Excel document
original Original Excel document received by this processor

Reads Attributes:

None specified.

Writes Attributes:

Name Description
sheet name The name of the Excel sheet that this particular row of data came from in the Excel document
num rows The number of rows in this Excel Sheet
source filename The name of the Excel document file that this data originated from
convertexceltocsvprocessor.error Error message that was encountered on a per Excel sheet basis. This attribute is only populated if an error was occurred while processing the particular sheet. Having the error present at the sheet level will allow for the end user to better understand what syntax errors in their excel doc on a larger scale caused the error.

How to configure?

Step 1: Drag and drop the ConvertExcelToCSVProcessor processor to the canvas.

Step 2: Double click the processor to configure, the configuration dialog will be opened as follows,

properties

Step 3: Check the usage of each property and update those values.

Properties and usage:

Sheets to Extract: It is used to specify the sheet name of the Excel file.

Number of Rows to Skip: It is used to specify the number of rows to skip when extracting the Excel sheet.

Columns to Skip: It is used to specify the comma-separated list of column numbers to skip when extracting the Excel sheet.

Format Cell Values: If you need to format the cell values of the excel sheet, then choose true otherwise it will be printed as raw values in CSV.

CSV Format: Choose a format from the drop-down options to convert as a CSV file.

properties

Value Separator: It is used to specify the character to separate the values or fields in a CSV file.

Include Header Line: It is used to specify whether to include column names of the Excel sheet in a CSV file or not.

Quote Character: It is used to specify any special character like ”, @,’, etc. that is used to quote the cell values or fields of the CSV file.

properties

Escape Character: The character is used to escape characters that would otherwise have a specific meaning to the CSV Parser. For example, if the value of the cell is 1:2. If it is converted to CSV, it will be converted as 4.3055555555555562E-2. So to escape the conversion, you need to specify escape character like 1:2, and it prints the same value.

Comment Marker: Any line that begins with the specified character will be ignored.

Null String: Replaces the empty cell in CSV with the specified value.

Trim Fields: It is used to specify whether the white space should be removed from the beginning and end of fields or not.

Quote Mode: It is used to specify modes of the quote to the cell values or fields in a CSV file.

properties

Record Separator: It is used to specify the character that is used to split records into new rows or keep the records in a tabbed space.

Include Trailing Delimiter: If true, a trailing delimiter will be added to each CSV Record that is written. If false, the trailing delimiter will be omitted.

State management:

This component does not store state.

Restricted:

This component is not restricted.