Description and usage of GetFile:

Creates FlowFiles from files in a directory. NiFi will ignore files it doesn’t have at least read permissions for.

Tags:

local, files, filesystem, ingest, ingress, get, source, input

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the Expression Language.

| Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- |
| Input Directory | | | The input directory from which to pull files. Supports Expression Language: true |
| File Filter | [^\.].* | | Only files whose names match the given regular expression will be picked up |
| Path Filter | | | When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned |
| Batch Size | 10 | | The maximum number of files to pull in each iteration |
| Keep Source File | false | true, false | If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If the original is not kept, NiFi needs write permissions on the directory it is pulling from; otherwise it will ignore the file. |
| Recurse Subdirectories | true | true, false | Indicates whether or not to pull files from subdirectories |
| Polling Interval | 0 sec | | Indicates how long to wait before performing a directory listing |
| Ignore Hidden Files | true | true, false | Indicates whether or not hidden files should be ignored |
| Minimum File Age | 0 sec | | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored |
| Maximum File Age | | | The maximum age that a file can be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored |
| Minimum File Size | 0 B | | The minimum size that a file must be in order to be pulled |
| Maximum File Size | | | The maximum size that a file can be in order to be pulled |
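The way these filter properties combine can be illustrated with a short Python sketch. This is not NiFi's implementation: `FilterSettings` and `would_pick_up` are hypothetical names, hidden files are approximated by a leading dot, and the sketch assumes the File Filter expression must match the whole file name.

```python
import os
import re
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterSettings:
    """Hypothetical container mirroring a few GetFile properties."""
    file_filter: str = r"[^\.].*"          # File Filter default
    ignore_hidden_files: bool = True       # Ignore Hidden Files default
    min_age_sec: float = 0.0               # Minimum File Age default (0 sec)
    max_age_sec: Optional[float] = None    # Maximum File Age (unset)
    min_size_bytes: int = 0                # Minimum File Size default (0 B)
    max_size_bytes: Optional[int] = None   # Maximum File Size (unset)


def would_pick_up(path: str, settings: FilterSettings,
                  now: Optional[float] = None) -> bool:
    """Return True if the file at `path` passes every configured check."""
    name = os.path.basename(path)
    info = os.stat(path)
    now = time.time() if now is None else now
    age = now - info.st_mtime  # age is measured from the last modification time

    # Assumption: "hidden" is approximated by a leading dot (Unix convention).
    if settings.ignore_hidden_files and name.startswith("."):
        return False
    # Assumption: the whole file name must match the regular expression.
    if re.fullmatch(settings.file_filter, name) is None:
        return False
    if age < settings.min_age_sec:
        return False
    if settings.max_age_sec is not None and age > settings.max_age_sec:
        return False
    if info.st_size < settings.min_size_bytes:
        return False
    if settings.max_size_bytes is not None and info.st_size > settings.max_size_bytes:
        return False
    # Files without read permission are ignored.
    return os.access(path, os.R_OK)
```

For example, `would_pick_up("/tmp/input/data.txt", FilterSettings(min_size_bytes=1024))` would reject any file smaller than 1 KB; the path here is only a placeholder.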

Relationships:

| Name | Description |
| --- | --- |
| success | All files are routed to success |

Reads Attributes:

None specified.

Writes Attributes:

| Name | Description |
| --- | --- |
| filename | The filename is set to the name of the file on disk |
| path | The path is set to the relative path of the file's directory on disk. For example, if the <Input Directory> property is set to /tmp, files picked up from /tmp will have the path attribute set to ./. If the <Recurse Subdirectories> property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to abc/1/2/3 |
| file.creationTime | The date and time that the file was created. May not work on all file systems |
| file.lastModifiedTime | The date and time that the file was last modified. May not work on all file systems |
| file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
| file.owner | The owner of the file. May not work on all file systems |
| file.group | The group owner of the file. May not work on all file systems |
| file.permissions | The read/write/execute permissions of the file. May not work on all file systems |
| absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
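The filename and path examples above can be reproduced with a small Python sketch. The function name `path_attributes` is hypothetical and the code only mirrors the documented examples; it is not how NiFi computes the attributes internally.

```python
from pathlib import PurePosixPath


def path_attributes(input_directory: str, file_path: str) -> dict:
    """Derive the filename and path attributes for a picked-up file."""
    base = PurePosixPath(input_directory)
    picked = PurePosixPath(file_path)
    # Directory of the file, expressed relative to the input directory.
    relative_dir = picked.parent.relative_to(base)
    # A file directly inside the input directory gets "./", as documented.
    path_value = "./" if relative_dir == PurePosixPath(".") else str(relative_dir)
    return {"filename": picked.name, "path": path_value}


print(path_attributes("/tmp", "/tmp/data.txt"))
# {'filename': 'data.txt', 'path': './'}
print(path_attributes("/tmp", "/tmp/abc/1/2/3/data.txt"))
# {'filename': 'data.txt', 'path': 'abc/1/2/3'}
```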

See Also:

PutFile, FetchFile

How to configure the GetFile processor?

Step 1: Drag and drop the GetFile processor onto the canvas.

Step 2: Double-click the processor to configure it. The configuration dialog opens as follows:

Configuration dialog

Step 3: Check the usage of each property and update those values.

Properties and usage:

Input Directory: The input directory from which to pull files.

File Filter: Only files whose names match the given regular expression will be picked up.

Keep Source File: Set this property to “true” to keep the files in the input directory after they have been picked up. With the default value “false”, the source file is deleted once it has been copied to the Content Repository.

Ignore Hidden Files: Indicates whether hidden files should be ignored.

Minimum File Size: The minimum size that a file must be to be pulled.

Maximum File Size: The maximum size that a file can be to be pulled.

Configure the GetFile processor File Filter with a filename:

To read files by name, enter the file name in the File Filter property, as shown in the screenshot below. The processor reads the files in the specified input directory that have this name, whatever their format.

Configuration with filename

Configure the GetFile processor File Filter without a filename and extension:

Using a regular expression in the File Filter, files can be read without specifying a file name or extension, as shown in the screenshot below. The processor reads files of every format in the specified input directory.

Configuration without filename and extension
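As an illustration, the default File Filter value `[^\.].*` from the properties list accepts any name that does not start with a dot. The Python sketch below checks a few made-up file names against it; `matches_filter` is a hypothetical helper, and the sketch assumes the expression must match the entire file name.

```python
import re

DEFAULT_FILE_FILTER = r"[^\.].*"  # default value of the File Filter property


def matches_filter(pattern: str, filename: str) -> bool:
    """True if the entire file name matches the regular expression."""
    return re.fullmatch(pattern, filename) is not None


# Made-up file names, used only for illustration.
names = ["orders.csv", "notes.txt", "README", ".hidden"]
accepted = [n for n in names if matches_filter(DEFAULT_FILE_FILTER, n)]
print(accepted)  # ['orders.csv', 'notes.txt', 'README']
```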

Configure the GetFile processor File Filter with a filename and extension:

To read one specific file, enter both the file name and its extension in the File Filter, as shown in the screenshot below. The processor reads only that file's data from the specified input directory.

Configuration with filename and extension
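The difference between filtering on a name alone and on a name plus extension can be sketched in Python. The file name `orders` and both patterns are hypothetical (the values in the screenshots are not reproduced here), and the sketch again assumes the expression must match the whole file name.

```python
import re

# Hypothetical patterns for an assumed file called "orders".
NAME_ANY_EXTENSION = r"orders\..*"   # "orders" with any extension
NAME_AND_EXTENSION = r"orders\.csv"  # exactly "orders.csv"

# Made-up directory listing.
names = ["orders.csv", "orders.json", "orders.csv.bak", "invoices.csv"]

any_ext = [n for n in names if re.fullmatch(NAME_ANY_EXTENSION, n)]
exact = [n for n in names if re.fullmatch(NAME_AND_EXTENSION, n)]

print(any_ext)  # ['orders.csv', 'orders.json', 'orders.csv.bak']
print(exact)    # ['orders.csv']
```

Escaping the dot (`\.`) matters: an unescaped `.` matches any character, so `orders.csv` would also accept a name such as `ordersXcsv`.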

Sample Workflow:

To start with, this simple dataflow example moves any file placed in an ‘input’ directory to an ‘output’ directory.

This sample workflow uses the GetFile processor to read data from the specified path and generate FlowFiles, and the PutFile processor to write that data to a specified location, all within the Data Integration Platform.

List of processors used in this sample:

| Processor | Comments |
| --- | --- |
| GetFile | Reads content from the given file and generates a FlowFile. |
| PutFile | Writes the FlowFile to a specified location. |

Workflow screenshot

Overall workflow

Step 1: Configure GetFile processor

Drag and drop the GetFile processor onto the canvas area. Configure the Input Directory and other required properties in the configuration dialog as shown in the following screenshots.

Drag and drop GetFile processor

Configuration of GetFile processor

Step 2: Configure PutFile processor

Drag and drop the PutFile processor onto the canvas area. PutFile writes the contents of a FlowFile to the local file system and acts as the downstream processor. Configure the required properties in the configuration dialog as shown in the following screenshot. Also connect GetFile to PutFile with the ‘success’ relationship.

Configuration of PutFile processor

Step 3: Starting workflow

Once all processors are configured, start the workflow. You can now see the data flowing to the PutFile processor.

Data in PutFile processor

Note: The GetFile and PutFile processors should now be configured so that if you add a file to the ‘input’ directory, it will be moved to the ‘output’ directory.
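To check the expected end result outside of the platform, the same input-to-output move can be approximated in plain Python. This is only a rough stand-in for the GetFile and PutFile pair with Keep Source File set to false: `move_files` is a hypothetical helper, hidden files are approximated by a leading dot, and subdirectories are not recursed.

```python
import shutil
from pathlib import Path


def move_files(input_dir: str, output_dir: str, batch_size: int = 10) -> list:
    """Move up to `batch_size` regular files from input_dir to output_dir."""
    source = Path(input_dir)
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(source.iterdir()):
        if len(moved) >= batch_size:
            break
        # Skip subdirectories and hidden files (leading dot, an assumption).
        if not entry.is_file() or entry.name.startswith("."):
            continue
        # The source file disappears from the input directory, as it does
        # when Keep Source File is false.
        shutil.move(str(entry), str(target / entry.name))
        moved.append(entry.name)
    return moved
```

Calling `move_files("input", "output")` returns the names of the files that were moved; the directory names are placeholders for whatever the two processors are configured with.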