Description and usage of GetFile:
Creates FlowFiles from files in a directory. NiFi will ignore files it doesn’t have at least read permissions for.
Tags:
local, files, filesystem, ingest, ingress, get, source, input
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
Input Directory | | | The input directory from which to pull files. Supports Expression Language: true |
File Filter | [^\.].* | | Only files whose names match the given regular expression will be picked up |
Path Filter | | | When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned |
Batch Size | 10 | | The maximum number of files to pull in each iteration |
Keep Source File | false | | If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If the original is not kept, NiFi needs write permissions on the directory it is pulling from; otherwise it will ignore the file. |
Recurse Subdirectories | true | | Indicates whether or not to pull files from subdirectories |
Polling Interval | 0 sec | | Indicates how long to wait before performing a directory listing |
Ignore Hidden Files | true | | Indicates whether or not hidden files should be ignored |
Minimum File Age | 0 sec | | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored |
Maximum File Age | | | The maximum age that a file must be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored |
Minimum File Size | 0 B | | The minimum size that a file must be in order to be pulled |
Maximum File Size | | | The maximum size that a file can be in order to be pulled |
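The filter properties above combine to decide whether a given file is picked up. The following Python sketch shows one plausible reading of that logic. It is not NiFi's implementation: the function name is_picked_up and its parameter names are hypothetical, and matching the File Filter against the whole file name is an assumption.

```python
import re


def is_picked_up(name, size_bytes, age_sec, hidden=False, *,
                 file_filter=r"[^\.].*",      # documented default
                 ignore_hidden_files=True,    # documented default
                 min_age_sec=0.0,             # documented default (0 sec)
                 max_age_sec=None,            # no default in the table
                 min_size_bytes=0,            # documented default (0 B)
                 max_size_bytes=None):        # no default in the table
    """Return True if a file passes every filter property (illustrative only)."""
    if ignore_hidden_files and hidden:
        return False
    # Assumption: the regular expression must match the entire file name.
    if re.fullmatch(file_filter, name) is None:
        return False
    if age_sec < min_age_sec:
        return False
    if max_age_sec is not None and age_sec > max_age_sec:
        return False
    if size_bytes < min_size_bytes:
        return False
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return False
    return True


# A 2 KB file last modified 30 seconds ago, with Minimum File Age set to 10 sec.
print(is_picked_up("report.csv", 2048, 30.0, min_age_sec=10))  # True
```

With the default File Filter, a name that begins with a dot (for example ".report.csv") is rejected by the regular expression even before the Ignore Hidden Files check applies.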
Relationships:
Name | Description |
success | All files are routed to success |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
filename | The filename is set to the name of the file on disk |
path | The path is set to the relative path of the file's directory on disk. For example, if the <Input Directory> property is set to /tmp, files picked up from /tmp will have the path attribute set to ./. If the <Recurse Subdirectories> property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to abc/1/2/3 |
file.creationTime | The date and time that the file was created. May not work on all file systems |
file.lastModifiedTime | The date and time that the file was last modified. May not work on all file systems |
file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
file.owner | The owner of the file. May not work on all file systems |
file.group | The group owner of the file. May not work on all file systems |
file.permissions | The read/write/execute permissions of the file. May not work on all file systems |
absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
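The path attribute example above can be reproduced with a short Python sketch. The function name path_attribute is hypothetical, and the code follows the values quoted in the table (./ and abc/1/2/3) rather than NiFi's source.

```python
from pathlib import PurePosixPath


def path_attribute(input_directory: str, file_directory: str) -> str:
    """Return a file's directory relative to the Input Directory (illustrative)."""
    relative = PurePosixPath(file_directory).relative_to(PurePosixPath(input_directory))
    # A file found directly in the input directory gets "./", as in the table.
    if relative == PurePosixPath("."):
        return "./"
    return str(relative)


print(path_attribute("/tmp", "/tmp"))            # ./
print(path_attribute("/tmp", "/tmp/abc/1/2/3"))  # abc/1/2/3
```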
See Also:
PutFile, FetchFile
How to configure the GetFile processor?
Step 1: Drag and drop the GetFile processor onto the canvas.
Step 2: Double-click the processor to open its configuration dialog.
Step 3: Review the usage of each property and update its value as needed.
Properties and usage:
Input Directory: The input directory from which to pull files.
File Filter: Only files whose names match the given regular expression will be picked up.
Keep Source File: Set this property to “true” to keep the files in the input directory after they have been picked up. With the default value “false”, the source file is deleted once it has been copied.
Ignore Hidden Files: Indicates whether hidden files should be ignored.
Minimum File Size: The minimum size that a file must be to be pulled.
Maximum File Size: The maximum size that a file can be to be pulled.
Configure GetFile processor File Filter with filename:
To read files by name, enter the file name in the File Filter property for the specified input directory, as shown in the screenshot below. GetFile will read every file with that name, whatever its format.
Configure GetFile processor File Filter without filename and extension:
Using a regular expression in the File Filter, we can read files from the specified input directory without naming a file or an extension, as shown in the screenshot below. GetFile will read files of every format in the input directory.
Configure GetFile processor File Filter as filename and extension:
To read one specific file, enter both its filename and its extension in the File Filter for the specified input directory, as shown in the screenshot below. GetFile will read only that file’s data from the input directory.
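The three File Filter configurations above can be tried out with a short Python sketch. The file names and the first and third patterns are hypothetical examples (the screenshots are not reproduced here); the second pattern is the documented default. Matching against the whole file name is an assumption, and Python's re module stands in for the regular expression engine NiFi uses.

```python
import re

# Hypothetical contents of the input directory.
names = ["orders.csv", "orders.json", "customers.csv", ".hidden.csv"]


def matching(file_filter, names):
    """Return the names that the given File Filter would accept (illustrative)."""
    return [n for n in names if re.fullmatch(file_filter, n)]


# Filename only: every file called "orders", whatever its extension.
print(matching(r"orders\..*", names))   # ['orders.csv', 'orders.json']

# No filename or extension: the default filter accepts any name not starting with a dot.
print(matching(r"[^\.].*", names))      # ['orders.csv', 'orders.json', 'customers.csv']

# Filename and extension: exactly one file.
print(matching(r"orders\.csv", names))  # ['orders.csv']
```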
Sample Workflow:
To start, this simple dataflow example just moves any file placed in an ‘input’ directory to an ‘output’ directory.
This sample workflow uses the GetFile processor to read data from the specified path and generate a FlowFile, and the PutFile processor to write that data to a specified location, as implemented in the Data Integration platform.
List of processors used in this sample:
Processor | Comments |
GetFile | Reads content from the given file and generates a FlowFile. |
PutFile | Writes the FlowFile to a specified location. |
Workflow screenshot
Step 1: Configure GetFile processor
Drag and drop the GetFile processor onto the canvas. Configure the Input Directory and other required properties in the configuration dialog as shown in the following screenshots.
Step 2: Configure PutFile processor
Drag and drop the PutFile processor onto the canvas. PutFile writes the contents of a FlowFile to the local file system and acts as the downstream processor here. Configure the required properties in the configuration dialog as shown in the following screenshot, then connect GetFile to PutFile using the ‘success’ relationship.
Step 3: Starting workflow
Once all processors are configured, start the workflow. You can now see data flowing to the PutFile processor.
Note: The GetFile and PutFile processors should now be configured so that if you add a file to the ‘input’ directory, it will be moved to the ‘output’ directory.
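The net effect of this workflow, with Keep Source File left at its default of false, can be approximated outside NiFi with a short Python sketch. This is only an emulation for checking expectations, not how NiFi works internally: the function name move_batch is hypothetical, hidden files are approximated as names starting with a dot, and the batch_size default simply mirrors the documented Batch Size of 10.

```python
import shutil
from pathlib import Path


def move_batch(input_dir: str, output_dir: str, batch_size: int = 10) -> list:
    """Move up to batch_size visible files from input_dir to output_dir."""
    src, dst = Path(input_dir), Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        if len(moved) >= batch_size:
            break
        # Skip subdirectories and dot-files (a stand-in for "hidden").
        if entry.is_file() and not entry.name.startswith("."):
            shutil.move(str(entry), str(dst / entry.name))
            moved.append(entry.name)
    return moved
```

Calling move_batch("input", "output") repeatedly plays the role of successive polling iterations: each call moves at most one batch and returns the names it moved.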