Description and usage of PutGridFS processor:
Writes a file to a GridFS bucket.
Tags:
mongo, gridfs, put, file, store
Properties:
In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
Client Service | | Controller Service API: MongoDBClientService; Implementation: MongoDBControllerService | The MongoDB client service to use for database connections. |
Mongo Database Name | | | The name of the database to use. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Bucket Name | | | The GridFS bucket where the files will be stored. If left blank, the MongoDB client driver's default bucket name, 'fs', is used. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
File Name | | | The name of the file in the bucket that is the target of this processor. GridFS file names do not include path information because GridFS does not sort files into folders within a bucket. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
File Properties Prefix | | | Attributes that have this prefix will be added to the file stored in GridFS as metadata. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Enforce Uniqueness | none | None, Both, Name, Hash | When enabled, this option enforces uniqueness on the bucket by creating a MongoDB index that matches the selection. Ideally, configure it once when the bucket is first created, because building the index on an existing bucket with a lot of data can take a long time. |
Hash Attribute | hash.value | | If uniqueness enforcement is enabled and the file hash is part of the constraint, this must be set to an attribute that exists on all incoming flowfiles. Supports Expression Language: true (will be evaluated using variable registry only) |
Chunk Size | 256 KB | | Controls the maximum size of each chunk of a file uploaded into GridFS. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
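A GridFS bucket is stored as two collections, `<bucket>.files` for file documents and `<bucket>.chunks` for the file data, so the Bucket Name property determines which collections receive the data. The sketch below models only that naming convention and the 'fs' fallback described for Bucket Name; the function name `gridfs_collections` is hypothetical and is not part of NiFi or the MongoDB driver.

```python
def gridfs_collections(bucket_name=None):
    """Return the (files, chunks) collection names for a GridFS bucket.

    Hypothetical helper: a blank or missing bucket name falls back to "fs",
    the default bucket name used by the MongoDB client drivers.
    """
    bucket = bucket_name or "fs"
    return f"{bucket}.files", f"{bucket}.chunks"


print(gridfs_collections(""))           # ('fs.files', 'fs.chunks')
print(gridfs_collections("documents"))  # ('documents.files', 'documents.chunks')
```

Leaving Bucket Name blank therefore writes into `fs.files` and `fs.chunks`, the same collections other MongoDB tools read by default.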
Relationships:
Name | Description |
success | When the operation succeeds, the flowfile is sent to this relationship. |
duplicate | Flowfiles that fail the duplicate check are sent to this relationship. |
failure | When there is a failure processing the flowfile, it goes to this relationship. |
Reads Attributes:
None specified.
Writes Attributes:
None specified.
State management:
This component does not store state.
Restricted:
This component is not restricted.
Input requirement:
This component requires an incoming relationship.
System Resource Considerations:
None specified.
Summary:
This processor puts a file, with one or more user-defined metadata values, into the configured GridFS bucket. It lets the user set the size of each file chunk during ingestion, and it can enforce file uniqueness at the application level by checking the file name or hash value against existing data before writing, rather than relying only on a database index.
GridFS File Attributes
PutGridFS can add flowfile attributes that start with a configured prefix (set by the File Properties Prefix property) to the GridFS file document as metadata. This metadata can be very useful later, when working with GridFS, for describing a file.
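The attribute selection can be pictured as a filter over the flowfile's attribute map. This is a minimal sketch, not the processor's code: the function name `extract_metadata` and the example attribute names are hypothetical, and stripping the prefix (plus a following `.`) from the stored key is an assumption about how the metadata keys are named.

```python
def extract_metadata(attributes, prefix):
    """Collect flowfile attributes whose names start with `prefix`.

    Hypothetical helper. Assumption: the prefix, and a '.' right after it,
    is removed from each key stored in the metadata document.
    """
    if not prefix:
        return {}  # no prefix configured: no attributes become metadata
    metadata = {}
    for key, value in attributes.items():
        if key.startswith(prefix):
            short_key = key[len(prefix):].lstrip(".")
            metadata[short_key] = value
    return metadata


attrs = {
    "filename": "report.pdf",
    "gridfs.department": "finance",
    "gridfs.owner": "alice",
}
print(extract_metadata(attrs, "gridfs"))
# {'department': 'finance', 'owner': 'alice'}
```

Attributes without the prefix, such as `filename` here, are left out of the metadata document.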
Chunk Size
GridFS splits a file into chunks, each stored in its own MongoDB document, as the file is ingested into the database. The Chunk Size property sets the maximum size of each chunk. Leave this property at its default value unless there is a specific business reason to increase or decrease it.
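To make the effect of Chunk Size concrete, the sketch below splits a byte string into chunk records the way GridFS lays out data: every chunk except possibly the last is exactly the chunk size, and `n` is the zero-based chunk index. It is an illustration only; real chunk documents also carry a `files_id` reference to the file document, which is omitted here, and reading "256 KB" as 256 × 1024 bytes is an assumption.

```python
import math

KB = 1024  # assumption: "256 KB" means 256 * 1024 bytes
DEFAULT_CHUNK_SIZE = 256 * KB


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Split file contents into GridFS-style chunk records.

    Every chunk except possibly the last holds exactly chunk_size bytes;
    "n" is the zero-based chunk index. An empty file yields no chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


payload = bytes(600 * KB)  # a 600 KB file of zero bytes
chunks = split_into_chunks(payload)
print(len(chunks))                       # 3
print([len(c["data"]) for c in chunks])  # [262144, 262144, 90112]
print(math.ceil(len(payload) / DEFAULT_CHUNK_SIZE))  # 3
```

The number of chunk documents is the file size divided by the chunk size, rounded up, so a smaller chunk size means more documents per file.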
Uniqueness Enforcement
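There are four operating modes, matching the allowable values of the Enforce Uniqueness property:
- None: no enforcement at the application level.
- Name: enforce by unique file name.
- Hash: enforce by unique hash value.
- Both: use both the hash value and the file name.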
By default, the hash value is taken from the hash.value attribute (set by the Hash Attribute property), which can be generated by configuring a HashContent processor upstream of PutGridFS. The hash, name, and combined modes query the existing data to check whether a file matching the criteria already exists before attempting to write the flowfile contents. Flowfiles that fail this check are sent to the duplicate relationship.
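The pre-write check can be modeled with an in-memory list standing in for the bucket's files collection. This is a hedged sketch, not the processor's implementation: the lowercase mode strings, the function names, and the `hash` field name are hypothetical (`filename` is the standard GridFS files-collection field), and treating Both as "both fields must match" is an assumption.

```python
from typing import Dict, List, Optional

HASH_MODES = ("hash", "both")  # hypothetical mode names that need the hash


def build_query(mode: str, filename: str, file_hash: Optional[str]) -> Dict[str, str]:
    """Build the filter used to look for an existing matching file.

    "hash" is a hypothetical field name for wherever the hash is stored.
    """
    if mode == "name":
        return {"filename": filename}
    if mode == "hash":
        return {"hash": file_hash}
    if mode == "both":  # assumption: both fields must match
        return {"filename": filename, "hash": file_hash}
    raise ValueError(f"no query for mode {mode!r}")


def check_uniqueness(mode: str, filename: str, attributes: Dict[str, str],
                     existing_files: List[Dict[str, str]],
                     hash_attribute: str = "hash.value") -> str:
    """Return "duplicate" if a matching file exists, otherwise "write"."""
    if mode == "none":
        return "write"  # no application-level check
    file_hash = attributes.get(hash_attribute)
    if mode in HASH_MODES and file_hash is None:
        # The Hash Attribute must exist on every incoming flowfile.
        raise ValueError(f"missing attribute {hash_attribute!r}")
    query = build_query(mode, filename, file_hash)
    for doc in existing_files:
        if all(doc.get(field) == value for field, value in query.items()):
            return "duplicate"
    return "write"


existing = [{"filename": "report.pdf", "hash": "abc123"}]
incoming = {"hash.value": "def456"}

print(check_uniqueness("name", "report.pdf", incoming, existing))  # duplicate
print(check_uniqueness("hash", "report.pdf", incoming, existing))  # write
print(check_uniqueness("both", "report.pdf", incoming, existing))  # write
print(check_uniqueness("none", "report.pdf", incoming, existing))  # write
```

Because the check is a query that runs before the write, two flowfiles with the same name or hash arriving at nearly the same time could both pass it; the unique index described for Enforce Uniqueness is what guards against that at the database level.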