Description:
Fetches files from an FTP Server and creates FlowFiles from them
Tags:
FTP, get, retrieve, files, fetch, remote, ingest, source, input
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.
Name | Default Value | Allowable Values | Description |
Hostname | | | The fully qualified hostname or IP address of the remote system. Supports Expression Language: true |
Port | 21 | | The port that the remote system is listening on for file transfers |
Username | | | Username |
Password | | | Password for the user account. Sensitive Property: true |
Connection Mode | Passive | Active, Passive | The FTP Connection Mode |
Transfer Mode | Binary | Binary, ASCII | The FTP Transfer Mode |
Remote Path | | | The path on the remote system from which to pull or push files. Supports Expression Language: true |
File Filter Regex | | | Provides a Java Regular Expression for filtering Filenames; if a filter is supplied, only files whose names match that Regular Expression will be fetched |
Path Filter Regex | | | When Search Recursively is true, then only subdirectories whose path matches the given Regular Expression will be scanned |
Polling Interval | 60 sec | | Determines how long to wait between fetching the listing for new files |
Search Recursively | false | true, false | If true, will pull files from arbitrarily nested subdirectories; otherwise, will not traverse subdirectories |
Ignore Dotted Files | true | true, false | If true, files whose names begin with a dot (".") will be ignored |
Delete Original | true | true, false | Determines whether or not the file is deleted from the remote system after it has been successfully transferred |
Connection Timeout | 30 sec | | Amount of time to wait before timing out while creating a connection |
Data Timeout | 30 sec | | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
Max Selects | 100 | | The maximum number of files to pull in a single connection |
Remote Poll Batch Size | 5000 | | The value specifies how many file paths to find in a given directory on the remote system when doing a file listing. This value in general should not need to be modified, but when polling against a remote system with a tremendous number of files this value can be critical. Setting this value too high can result in very poor performance, and setting it too low can cause the flow to be slower than normal. |
Use Natural Ordering | false | true, false | If true, will pull files in the order in which they are naturally listed; otherwise, the order in which the files will be pulled is not defined |
Proxy Type | DIRECT | DIRECT, HTTP, SOCKS | Proxy type used for file transfers |
Proxy Host | | | The fully qualified hostname or IP address of the proxy server |
Proxy Port | | | The port of the proxy server |
Http Proxy Username | | | Http Proxy Username |
Http Proxy Password | | | Http Proxy Password. Sensitive Property: true |
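To show how the listing-related properties narrow down which files are fetched, here is a minimal Python sketch. It is not NiFi's Java implementation. It models Ignore Dotted Files, File Filter Regex, Search Recursively, Path Filter Regex, and Max Selects against a hypothetical in-memory directory tree. It assumes that File Filter Regex must match the whole filename and that Path Filter Regex must match the whole subdirectory path. NiFi's exact matching rules may differ, and Python's `re` syntax also differs from Java regular expressions in some edge cases. The function `select_files` and the `Tree` model are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical in-memory model of a remote directory: each name maps either to
# a file size (int) or to a nested directory (dict).
Tree = Dict[str, object]


def select_files(
    tree: Tree,
    remote_path: str,
    file_filter_regex: Optional[str] = None,
    path_filter_regex: Optional[str] = None,
    search_recursively: bool = False,
    ignore_dotted_files: bool = True,
    max_selects: int = 100,
) -> List[str]:
    """Return up to max_selects file paths that pass the listing filters."""
    file_re = re.compile(file_filter_regex) if file_filter_regex else None
    path_re = re.compile(path_filter_regex) if path_filter_regex else None
    selected: List[str] = []

    def walk(node: Tree, directory: str) -> None:
        # sorted() only makes this sketch deterministic; it is not NiFi's ordering.
        for name in sorted(node):
            if len(selected) >= max_selects:
                return  # Max Selects: stop once enough files are selected
            if ignore_dotted_files and name.startswith("."):
                continue  # Ignore Dotted Files (applied to files and directories here)
            child = node[name]
            full_path = f"{directory}/{name}"
            if isinstance(child, dict):
                # Search Recursively + Path Filter Regex (assumed whole-path match)
                if search_recursively and (path_re is None or path_re.fullmatch(full_path)):
                    walk(child, full_path)
            elif file_re is None or file_re.fullmatch(name):
                # File Filter Regex (assumed whole-filename match)
                selected.append(full_path)

    walk(tree, remote_path.rstrip("/"))
    return selected
```

Consider a tree under `/tmp` that contains `a.csv`, `b.txt`, `.hidden.csv`, and `sub/c.csv`. Calling `select_files(tree, "/tmp", file_filter_regex=r".*\.csv", search_recursively=True)` returns `/tmp/a.csv` and `/tmp/sub/c.csv`. The dotted file is skipped because Ignore Dotted Files defaults to true.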
Relationships:
Name | Description |
success | All FlowFiles that are received are routed to success |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
filename | The filename is set to the name of the file on the remote server |
path | The path is set to the path of the file's directory on the remote server. For example, if the <Remote Path> property is set to /tmp, files picked up from /tmp will have the path attribute set to /tmp. If the <Search Recursively> property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to /tmp/abc/1/2/3 |
file.lastModifiedTime | The date and time that the source file was last modified |
file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
file.owner | The numeric owner id of the source file |
file.group | The numeric group id of the source file |
file.permissions | The read/write/execute permissions of the source file |
absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
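The following minimal Python sketch illustrates how the `filename`, `path`, and `absolute.path` attributes relate to one another, using the `/tmp/abc/1/2/3` example from the table. It rests on two assumptions that the table does not confirm. First, `absolute.path` is treated as the absolute form of the file's directory, by analogy with `path`. Second, a relative `path` is assumed to be resolved against the FTP session's working directory. The function `core_attributes` and its `working_dir` parameter are hypothetical.

```python
import posixpath
from typing import Dict


def core_attributes(remote_file: str, working_dir: str = "/") -> Dict[str, str]:
    """Derive filename, path, and absolute.path for a file picked up from the server."""
    directory, name = posixpath.split(remote_file)
    # Assumption: a relative path is resolved against the session's working
    # directory (working_dir is a hypothetical parameter, e.g. the login directory).
    absolute = posixpath.normpath(posixpath.join(working_dir, directory))
    return {
        "filename": name,           # name of the file on the remote server
        "path": directory,          # the file's directory, possibly relative
        "absolute.path": absolute,  # the same directory as an absolute path
    }
```

For a file picked up recursively from `/tmp/abc/1/2/3/report.csv`, the sketch sets `path` and `absolute.path` to `/tmp/abc/1/2/3` and `filename` to `report.csv`. For a relative pickup such as `inbox/report.csv` with a working directory of `/home/ftpuser`, `path` stays `inbox` and `absolute.path` becomes `/home/ftpuser/inbox`.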
How to configure
Step 1: Drag and drop the GetFTP processor onto the canvas.
Step 2: Double-click the processor to open its configuration dialog.
Step 3: Review the usage of each property and update the values as needed.
Properties and usage
Hostname: Used to specify the hostname or IP address of the FTP (File Transfer Protocol) server.
Port: Used to specify the port number of the FTP server. The default is 21.
Username: Used to specify the username for the FTP account.
Password: Used to specify password for the user account.
Connection Mode: Used to specify the connection mode for the FTP server. The following are the types of connection mode:
- Active
- Passive
Transfer Mode: Used to specify the FTP transfer mode. The available modes are:
- Binary
- ASCII
Remote Path: Used to specify the path on the remote system from which to pull the files.
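File Filter Regex: Used to specify a Java regular expression for filtering file names. If set, only files whose names match the expression are fetched.
Path Filter Regex: Used to specify a regular expression for filtering subdirectories. When Search Recursively is true, only subdirectories whose path matches the expression are scanned.
Polling Interval: Used to specify how long to wait between fetching the listing for new files. The default is 60 sec.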
Search Recursively: Used to specify whether to pull files from nested subdirectories.
- true
- false
Ignore Dotted Files: Used to ignore files whose names start with a dot (.).
Delete Original: Used to specify whether a file is deleted from the remote system after it has been successfully transferred.
Connection Timeout: Used to specify how long to wait before timing out while creating a connection.
Data Timeout: Used to specify how long a file transfer between the local and remote systems may go without any data being transferred before it times out.
Max Selects: Used to specify the maximum number of files to pull in a single connection.
Remote Poll Batch Size: Used to specify how many file paths to find in a given directory on the remote system when doing a file listing.
Use Natural Ordering: Used to specify whether to pull files in the order in which they are naturally listed. If false, the order in which files are pulled is not defined.
- true
- false
Proxy Type: Used to specify the proxy type used for file transfers. The following are the proxy types:
- DIRECT
- HTTP
- SOCKS
Proxy Host: Used to specify the fully qualified hostname or IP address of the proxy server.
Proxy Port: Used to specify the port number of the proxy server.
Http Proxy Username: Used to specify the HTTP proxy username.
Http Proxy Password: Used to specify the HTTP proxy password.
Internal Buffer Size: Used to set the internal buffer size for buffered data streams.
Use UTF-8 Encoding: Used to specify whether the client uses UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding.
- true
- false
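To show what Connection Mode, Transfer Mode, and Delete Original correspond to at the FTP-client level, here is a minimal Python sketch using the standard-library `ftplib` module. This is not how GetFTP is implemented, since NiFi is written in Java. The function `fetch_remote_files` and its parameters are hypothetical. The sketch assumes an already connected and logged-in session and an explicit list of file names.

```python
import ftplib
from typing import Dict, List


def fetch_remote_files(
    ftp: ftplib.FTP,
    remote_path: str,
    names: List[str],
    passive: bool = True,
    binary: bool = True,
    delete_original: bool = True,
) -> Dict[str, bytes]:
    """Download each named file from remote_path over a logged-in FTP session."""
    ftp.set_pasv(passive)  # Connection Mode: Passive (True) or Active (False)
    ftp.cwd(remote_path)
    contents: Dict[str, bytes] = {}
    for name in names:
        chunks: List[bytes] = []
        if binary:
            # Transfer Mode: Binary (TYPE I) delivers the bytes unchanged.
            ftp.retrbinary(f"RETR {name}", chunks.append)
        else:
            # Transfer Mode: ASCII (TYPE A); retrlines strips line endings, so add "\n" back.
            ftp.retrlines(f"RETR {name}", lambda line: chunks.append((line + "\n").encode(ftp.encoding)))
        contents[name] = b"".join(chunks)
        if delete_original:
            # Delete Original: only reached if the transfer above did not raise.
            ftp.delete(name)
    return contents
```

To use the sketch, first create a session with `ftp = ftplib.FTP()`, then call `ftp.connect(host, 21, timeout=30)` and `ftp.login(user, password)`. In `ftplib`, that single `timeout` covers both connection setup and blocking data operations. This loosely mirrors the separate Connection Timeout and Data Timeout properties.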
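For example, suppose you want to retrieve the file sample.csv from the temp directory of the FTP server at ftp://172.217.3.14 and store it in an SQL table. In GetFTP, set Hostname to 172.217.3.14 and Remote Path to the temp directory, and optionally set File Filter Regex to sample\.csv so that only that file is fetched. GetFTP only creates FlowFiles from the retrieved files; writing the data to the SQL table requires additional downstream processors.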
See Also:
PutFTP