Description and usage of FetchGCSObject:

Fetches a file from a Google Cloud Storage bucket. Designed to be used in tandem with ListGCSBucket.

Tags:

google cloud, google, storage, gcs, fetch

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered “sensitive”, meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

Name | Default Value | Allowable Values | Description
GCP Credentials Provider Service | | Controller Service API: GCPCredentialsService; Implementations: GCPCredentialsControllerService | The Controller Service used to obtain Google Cloud Platform credentials.
Project ID | | | Google Cloud Project ID.
Number of retries | 6 | | How many retry attempts should be made before routing to the failure relationship.
Bucket | ${gcs.bucket} | | Bucket of the object. Supports Expression Language: true
Key | ${filename} | | Name of the object. Supports Expression Language: true
Object Generation | | | The generation of the object to download. If not set, the latest generation is downloaded. Supports Expression Language: true
Server Side Encryption Key | | | An AES-256 key (encoded in base64) with which the object was encrypted. Sensitive Property: true. Supports Expression Language: true
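The Bucket and Key defaults are Expression Language references to FlowFile attributes. Each incoming FlowFile, for example one produced by an upstream ListGCSBucket, therefore tells the processor which object to fetch. The minimal Python sketch below illustrates only that substitution step. It replaces plain `${attribute}` references from a dictionary of attributes and treats an empty Object Generation as "latest". It is not NiFi's Expression Language engine, which also supports functions. `resolve`, `build_request`, and `FetchRequest` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Matches simple ${attribute.name} references only; NiFi's real Expression
# Language also supports functions, which this sketch does not handle.
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(expression: str, attributes: Mapping[str, str]) -> str:
    """Replace each ${name} with the attribute value (empty string if absent)."""
    return _REF.sub(lambda m: attributes.get(m.group(1), ""), expression)


@dataclass
class FetchRequest:  # hypothetical container, not a NiFi or GCS type
    bucket: str
    key: str
    generation: Optional[int]  # None means "download the latest generation"


def build_request(attributes: Mapping[str, str],
                  bucket_expr: str = "${gcs.bucket}",
                  key_expr: str = "${filename}",
                  generation_expr: str = "") -> FetchRequest:
    """Resolve the property expressions against one FlowFile's attributes."""
    bucket = resolve(bucket_expr, attributes)
    key = resolve(key_expr, attributes)
    generation = resolve(generation_expr, attributes).strip()
    return FetchRequest(bucket, key, int(generation) if generation else None)


if __name__ == "__main__":
    # Hypothetical attributes, similar to what an upstream listing might emit.
    flowfile_attrs = {"gcs.bucket": "example-bucket", "filename": "data/report.csv"}
    print(build_request(flowfile_attrs))
    print(build_request(flowfile_attrs, generation_expr="1700000000000000"))
```

With the hypothetical attributes in the `__main__` section, the first call requests the latest generation of `data/report.csv` in `example-bucket`. The second call pins generation `1700000000000000`.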

Relationships:

Name Description
success FlowFiles are routed to this relationship after a successful Google Cloud Storage operation.
failure FlowFiles are routed to this relationship if the Google Cloud Storage operation fails.
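The Number of retries property and the two relationships work together. A FlowFile goes to success once the fetch succeeds, and to failure once the retry attempts are exhausted. The sketch below models that routing in plain Python, with exponential backoff between attempts. It assumes one initial attempt plus up to `retries` further attempts. This may not match how the processor or the underlying Google Cloud client library counts attempts. `fetch_with_retries` is a hypothetical helper, and the backoff policy is an illustrative choice.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    operation: Callable[[], T],
    retries: int = 6,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, object]:
    """Return ("success", result) or ("failure", last_exception).

    Assumes one initial attempt plus up to `retries` retries, waiting
    base_delay * 2**attempt seconds between consecutive attempts.
    """
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(retries + 1):
        try:
            return "success", operation()
        except Exception as exc:  # a real client would retry only transient errors
            last_error = exc
            if attempt < retries:  # no pause after the final attempt
                sleep(base_delay * (2 ** attempt))
    return "failure", last_error
```

Injecting `sleep` keeps the helper testable without real delays.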

Reads Attributes:

None specified.

Writes Attributes:

Name Description
filename The name of the file, parsed if possible from the Content-Disposition response header.
gcs.bucket Bucket of the object.
gcs.key Name of the object.
gcs.size Size of the object.
gcs.cache.control Data cache control of the object.
gcs.component.count The number of components which make up the object.
gcs.content.disposition The data content disposition of the object.
gcs.content.encoding The content encoding of the object.
gcs.content.language The content language of the object.
mime.type The MIME/Content-Type of the object.
gcs.crc32c The CRC32C checksum of the object's data, encoded in base64 in big-endian order.
gcs.create.time The creation time of the object (milliseconds).
gcs.update.time The last modification time of the object (milliseconds).
gcs.encryption.algorithm The algorithm used to encrypt the object.
gcs.encryption.sha256 The SHA256 hash of the key used to encrypt the object.
gcs.etag The HTTP 1.1 Entity tag for the object.
gcs.generated.id The service-generated ID for the object.
gcs.generation The data generation of the object.
gcs.md5 The MD5 hash of the object's data encoded in base64.
gcs.media.link The media download link to the object.
gcs.metageneration The meta generation of the object.
gcs.owner The owner (uploader) of the object.
gcs.owner.type The ACL entity type of the uploader of the object.
gcs.uri The URI of the object as a string.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

See Also:

ListGCSBucket, PutGCSObject, DeleteGCSObject

How to configure FetchGCSObject?

Step 1: Drag and drop the FetchGCSObject processor onto the canvas.

Step 2: Double-click the processor to open its configuration dialog, shown below.

Configuration dialog

Step 3: Review the usage of each property below and set its value.

Properties and usage:

GCP Credentials Provider Service: The Controller Service used to obtain Google Cloud Platform credentials.

Project ID: ID of the Google Cloud Project.

Number of retries: How many retry attempts should be made before routing the FlowFile to the failure relationship.

Bucket: Bucket of the object.

Key: Name of the object.

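Object Generation: The generation of the object to download. If not set, the latest generation is downloaded.

Server Side Encryption Key: The AES-256 key (encoded in base64) with which the object was encrypted. This is a sensitive property.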
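The Server Side Encryption Key property expects the customer-supplied AES-256 key, base64-encoded, that was used when the object was written. Below is a minimal Python sketch, with hypothetical helper names, that generates such a key and checks a candidate key against the gcs.encryption.sha256 attribute. It assumes that attribute carries the base64-encoded SHA-256 of the raw key bytes, which is the form Cloud Storage uses for customer-supplied encryption keys.

```python
import base64
import hashlib
import secrets
from typing import Mapping


def generate_csek() -> str:
    """Return a random 256-bit AES key encoded in base64 (44 characters)."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def key_sha256(b64_key: str) -> str:
    """Base64-encoded SHA-256 of the raw (decoded) key bytes."""
    raw = base64.b64decode(b64_key, validate=True)  # rejects non-base64 input
    if len(raw) != 32:
        raise ValueError(f"expected a 32-byte AES-256 key, got {len(raw)} bytes")
    return base64.b64encode(hashlib.sha256(raw).digest()).decode("ascii")


def key_matches(b64_key: str, attributes: Mapping[str, str]) -> bool:
    """Check a candidate key against the gcs.encryption.sha256 attribute."""
    return key_sha256(b64_key) == attributes.get("gcs.encryption.sha256")
```

Keep a separate, secure copy of any such key. Cloud Storage cannot decrypt an object whose customer-supplied key has been lost.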

Configure GCPCredentials Controller Service:

Step 1: To give the processor access to Google Cloud Platform, configure the GCPCredentials controller service as follows.

GCP credentials configuration

Step 2: Open your GCPCredentials controller service, configure the following properties, and then enable the controller service.

Use Application Default Credentials: Set to ‘false’.

Use Compute Engine Credentials: Set to ‘false’.

Service Account JSON File: The path to the service account JSON key file that contains the private key.

To generate a private key in JSON format:

  1. Open the list of credentials in the Google Cloud Platform Console.
  2. Click Create credentials.
  3. Select Service account key.
  4. Click the drop-down box below Service account, then click New service account.
  5. Enter a name for the service account in the Name field.
  6. Use the default Service account ID or generate a different one.
  7. Select the Key type: JSON or P12. Choose JSON for use with the GCPCredentials controller service.
  8. Click Create.
  9. A “Service account created” window is displayed, and the private key for the key type you selected is downloaded automatically. If you selected a P12 key, the private key’s password is displayed.
  10. Click Close.

Step 3: Save the downloaded JSON key file in a local directory, and specify its path in the Service Account JSON File property of the GCPCredentials controller service, as shown in the screenshot below.

Service account file configuration
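Before pointing the Service Account JSON File property at the downloaded key, it can help to sanity-check the file. The Python sketch below uses a hypothetical `load_key` helper. It accepts either a file path or the raw JSON text, as it would be pasted into the Service Account JSON property. It then checks for fields that console-generated service account keys contain. It does not contact Google Cloud, so it cannot tell whether the key is still valid or has the required permissions.

```python
import json
import sys
from pathlib import Path

# Fields found in service account keys downloaded as JSON from the console.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def load_key(source: str) -> dict:
    """Parse a service account key given as a file path or as JSON text."""
    if source.lstrip().startswith("{"):
        text = source  # raw JSON, as pasted into the Service Account JSON property
    else:
        text = Path(source).read_text(encoding="utf-8")  # path to the key file
    key = json.loads(text)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    if "BEGIN PRIVATE KEY" not in key["private_key"]:
        raise ValueError("private_key does not look like a PEM-encoded private key")
    return key


if __name__ == "__main__" and len(sys.argv) > 1:
    info = load_key(sys.argv[1])
    print(info["client_email"], info["project_id"])
```

Suppose the block is saved as `check_key.py` (a hypothetical name). Running `python check_key.py /path/to/key.json` then prints the service account email and project ID, or raises an error that names the problem.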

Step 4: Alternatively, paste the contents of the JSON key file into the Service Account JSON property of the GCPCredentials controller service, as shown in the screenshot below.

Service account JSON configuration

Step 5: After configuring the controller service, enable it and start the processor.

Enable the controller service