Moving data to the cloud is one of the cornerstones of any cloud migration. Apache NiFi is an open source tool that lets you move and process data through a graphical user interface (GUI). In this blog post, we will examine a simple way to move data to the cloud using NiFi, complete with practical steps. Calculated Systems offers a cloud-first version of NiFi that you can use to follow along.
Cloud Object Storage
There are many ways to store data on the cloud, but the easiest are the object stores. All three major cloud providers have them:
Amazon – Simple Storage Service (S3)
Azure – Blob Storage
Google – Google Cloud Storage (GCS)
These are an ideal starting point for files because you can typically land the files without too much forethought or capacity planning. Additionally, these object stores are extremely robust, featuring multiple levels of durability and availability.
For the purposes of this tutorial, we will start with the most common object store: Amazon Simple Storage Service (Amazon S3).
Amazon S3 Terminology
Before we get started moving data, let’s establish some basic terminology:
Identity and Access Management (IAM) – The AWS service for defining who and what can interact with your AWS resources.
Access Keys – These are your credentials for using AWS programmatically. They are not a typical username/password; they are generated through Identity and Access Management.
Bucket – A named container for files (objects). Bucket names must be globally unique. Buckets can be made publicly accessible and are often used to host static objects.
Folder – Much like an operating system folder, these exist within a bucket to enable organization. Under the hood, S3 treats a folder as a prefix on the object's name (its key).
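To make the bucket and folder terms concrete, here is a minimal Python sketch of how a folder and a file name combine into a single object key, and how an s3:// address splits back into a bucket and a key. It is not part of NiFi, it makes no calls to AWS, and the function names, bucket name, and file names are all hypothetical and chosen only for illustration.

```python
from typing import Tuple
from urllib.parse import urlparse


def build_object_key(folder: str, filename: str) -> str:
    """Join a folder path and a file name into one S3 object key."""
    # S3 has no real directories: a "folder" is only a prefix on the key.
    folder = folder.strip("/")
    return f"{folder}/{filename}" if folder else filename


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key address into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket sits where a hostname would; the rest is the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical names, used only for illustration.
key = build_object_key("landing/2024", "orders.csv")
uri = f"s3://example-nifi-landing-bucket/{key}"
bucket, parsed_key = split_s3_uri(uri)
print(bucket)
print(parsed_key)
```

The practical consequence is that you do not need to create a folder before writing to it: writing an object whose key contains a slash is enough for the folder to appear in the S3 console.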
Creating an Access Key
For NiFi to have permission to write to S3, we must set it up with an access key pair. There are many ways to do this, but the best practice is to create a new