Data Engineering Oversimplified Part 2 (S3)

Create an S3 Bucket using AWS Web Console

  • Go to the AWS Web Console and open S3. Create a new bucket named ittv-github. We can also create folders for the landing and raw zones of our data.
  • The landing zone will be used to ingest data from external sources, and we will store data in it as JSON. Data in the landing zone is transient: the zone acts as a scratch pad, so we can delete data older than 30 days or as per the SLAs.
  • The raw zone will be used to store data from sources following our data lake standards. In our case, we will use Parquet as the target file format and partition all the data daily. In most data lakes, data in the raw zone is retained for 7 to 10 years.
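The two zone conventions above can be sketched in a few lines. This is a minimal sketch, assuming hypothetical folder prefixes landing/ and raw/ inside ittv-github, and a hypothetical Hive-style partition name ingest_date=YYYY-MM-DD; neither is specified in the original steps. The first function builds the JSON shape S3 expects for a lifecycle rule that deletes landing-zone objects after 30 days; the second builds a daily-partitioned raw-zone key.

```python
import json
from datetime import date

# Hypothetical folder (prefix) names for the two zones in the ittv-github bucket.
LANDING_PREFIX = "landing/"
RAW_PREFIX = "raw/"


def landing_zone_lifecycle(days: int = 30) -> dict:
    """S3 lifecycle configuration that expires landing-zone objects after `days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-landing-zone",
                "Filter": {"Prefix": LANDING_PREFIX},  # only objects under landing/
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def raw_zone_key(dataset: str, day: date, part: int = 0) -> str:
    """Daily-partitioned raw-zone key (hypothetical layout):
    raw/<dataset>/ingest_date=YYYY-MM-DD/part-NNNNN.parquet"""
    return f"{RAW_PREFIX}{dataset}/ingest_date={day.isoformat()}/part-{part:05d}.parquet"


if __name__ == "__main__":
    # Print the lifecycle JSON; redirect it to lifecycle.json to use with the AWS CLI.
    print(json.dumps(landing_zone_lifecycle(), indent=2))
```

For example, raw_zone_key("repos", date(2021, 7, 1)) yields raw/repos/ingest_date=2021-07-01/part-00000.parquet, where "repos" is a made-up dataset name. The printed JSON could then be applied with aws s3api put-bucket-lifecycle-configuration --bucket ittv-github --lifecycle-configuration file://lifecycle.json.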

Overview of Roles

[Instructions and Code] Create and Attach Custom Policy

  • In IAM, go to the group ITVGitHubGroup and attach the custom policy ITVGitHubS3FullPolicy.
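The original does not show the contents of ITVGitHubS3FullPolicy, so the following is only a plausible sketch, assuming the policy should grant full S3 access scoped to the ittv-github bucket. It builds a standard IAM policy document: the bucket ARN covers bucket-level actions, the /* ARN covers object-level actions, and a separate statement allows listing all buckets so the bucket is visible in the console.

```python
import json

BUCKET = "ittv-github"


def s3_full_access_policy(bucket: str) -> dict:
    """IAM policy document allowing all S3 actions on one bucket and its objects
    (an assumed shape for ITVGitHubS3FullPolicy, not the author's exact policy)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "FullAccessToBucket",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions, e.g. ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions, e.g. GetObject
                ],
            },
            {
                # Needed to see the bucket list in the console or via `aws s3 ls`.
                "Sid": "ListAllBuckets",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_full_access_policy(BUCKET), indent=2))
```

Saved as policy.json, the document could be created and attached from the CLI with aws iam create-policy --policy-name ITVGitHubS3FullPolicy --policy-document file://policy.json, followed by aws iam attach-group-policy --group-name ITVGitHubGroup --policy-arn arn:aws:iam::<account-id>:policy/ITVGitHubS3FullPolicy.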

S3 Bucket: Points to Note

Versioning

AWS S3 Cross-Region Replication for fault tolerance

  • In rare, extreme cases, S3 might become inaccessible in a specific Region because of unforeseen events affecting the data centers in that AWS Region or in an Availability Zone (AZ) within it.
  • By enabling Cross-Region Replication, we can keep copies of the objects in the bucket (all of them, or a filtered subset) in a bucket in another Region.
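A replication rule is expressed as a JSON configuration on the source bucket. This is a minimal sketch, assuming a hypothetical destination bucket (ittv-github-replica) in another Region and a hypothetical IAM role that S3 assumes to copy objects. The empty prefix filter replicates every object, and DeleteMarkerReplication must be stated whenever a Filter is used. Replication also requires versioning to be enabled on both the source and destination buckets.

```python
import json


def replication_config(role_arn: str, dest_bucket: str,
                       storage_class: str = "STANDARD") -> dict:
    """S3 replication configuration that copies all objects to dest_bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects (hypothetical)
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: replicate every object
                # Required whenever a Filter is specified.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "StorageClass": storage_class,  # e.g. "GLACIER" for cheap backups
                },
            }
        ],
    }


if __name__ == "__main__":
    cfg = replication_config(
        "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder ARN
        "ittv-github-replica",                                  # hypothetical bucket
    )
    print(json.dumps(cfg, indent=2))
```

Versioning could be enabled with aws s3api put-bucket-versioning --bucket ittv-github --versioning-configuration Status=Enabled (and likewise on the destination), after which the saved JSON could be applied with aws s3api put-bucket-replication --bucket ittv-github --replication-configuration file://replication.json.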

Storage Class (Change if needed)

  • Glacier is a low-cost storage class within S3, suited to data that is rarely accessed.
  • We can use Glacier to hold older object versions or replicas kept for backup.
  • Here are the most common ways to set the storage class to Glacier:
      • Edit the object or folder and change its storage class to Glacier.
      • Configure a lifecycle management rule that moves older versions to Glacier.
      • Specify Glacier as the destination storage class when defining the Cross-Region Replication rule.
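The lifecycle option above can be sketched as a rule that transitions noncurrent (older) versions to Glacier. This is a minimal sketch, assuming versioning is enabled on the bucket and assuming a 30-day threshold chosen only for illustration; the empty prefix applies the rule to the whole bucket.

```python
import json


def noncurrent_to_glacier(days: int = 30) -> dict:
    """Lifecycle configuration moving noncurrent object versions to Glacier
    `days` days after they stop being the current version."""
    return {
        "Rules": [
            {
                "ID": "archive-old-versions",
                "Filter": {"Prefix": ""},  # whole bucket
                "Status": "Enabled",
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(noncurrent_to_glacier(), indent=2))
```

Note that put-bucket-lifecycle-configuration replaces the bucket's entire lifecycle configuration, so if a landing-zone expiration rule is also wanted, both rules belong in the same "Rules" list. For a single object, copying it onto itself with aws s3 cp s3://ittv-github/<key> s3://ittv-github/<key> --storage-class GLACIER changes its storage class.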

Using AWS CLI for S3

To conclude, perform the tasks below to get comfortable using the CLI to manage objects in S3.

  • List the folders in the rettail bucket created earlier. Listing recursively is recommended so that you can review all the objects. Answer: aws s3 ls s3://rettail --recursive
  • Delete the folders under the retaildb main folder in the bucket created earlier. Without --recursive, aws s3 rm removes only the single object whose key matches the path, so the flag is needed to delete everything under the folder. Answer: aws s3 rm s3://rettail/dgretail/ --recursive
  • Go to the AWS Web Console and confirm that the folders and objects in the retaildb folder within the rettail bucket have been deleted.
  • To copy a file from your local computer to S3, use aws s3 cp with the local file path as the source and an s3:// URI as the destination.
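The tasks above can be scripted. This is a minimal sketch, assuming the AWS CLI is installed and configured if you choose to execute the commands; by default it only prints them (a dry run), which is a deliberate safety choice given that rm --recursive deletes everything under a prefix. The local file name orders.json and its destination key are hypothetical.

```python
import shlex
import subprocess


def aws_s3(action: str, *paths: str, recursive: bool = False) -> list[str]:
    """Build an `aws s3 <action>` argument list (action: ls, cp, rm, ...)."""
    args = ["aws", "s3", action, *paths]
    if recursive:
        args.append("--recursive")
    return args


def run(args: list[str]) -> None:
    """Execute the command; requires a configured AWS CLI."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    tasks = [
        aws_s3("ls", "s3://rettail", recursive=True),
        aws_s3("rm", "s3://rettail/dgretail/", recursive=True),
        aws_s3("cp", "orders.json", "s3://rettail/orders.json"),  # hypothetical file
    ]
    for args in tasks:
        print(shlex.join(args))  # dry run: call run(args) to actually execute
```

Building argument lists rather than shell strings avoids quoting problems when paths contain spaces, and shlex.join renders them back as copy-pasteable commands.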

abhinaya rajaram
