Introduction

S3 (Simple Storage Service) is a highly scalable, object-based cloud storage service used for storing and retrieving data.

There are different ways of performing actions on S3 buckets, either on individual files (e.g. upload, download, read, delete) or on the bucket itself (e.g. create a bucket).

This guide covers the Python route (recommended), using the boto3 library.

If you're connecting to EUMETSAT buckets, take note of (1).

Make sure you have these ready:

  1. Python 3
  2. Project ID of the bucket you want to mount
  3. Access key and secret access key for the bucket
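Hardcoding keys in scripts makes them easy to leak. As an alternative, a minimal sketch that reads them from environment variables instead (the variable names S3_ACCESS_KEY and S3_SECRET_KEY are just examples; use whatever your setup defines):

```python
import os

# Hypothetical environment variable names - adjust to your own setup
access_key = os.environ.get('S3_ACCESS_KEY', '')
secret_access_key = os.environ.get('S3_SECRET_KEY', '')

if not (access_key and secret_access_key):
    print('Warning: S3 credentials are not set in the environment')
```

This way the script itself can be committed to version control without exposing any secrets.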

Take a look at the code segments below, which show how to access the bucket, list its objects, and upload/download files from it.

Start by declaring some initial values so boto3 knows where your bucket is located. Feel free to copy-paste this segment and fill in your own values.

Code Block
languagepy
import os
import io
import boto3


# Initializing some values
project_id = '123'  # Fill this in
bucketname = 'MyFancyBucket123'  # Fill this in
access_key = '123asdf'  # Fill this in
secret_access_key = '123asdf111'  # Fill this in
endpoint = 'https://my-s3-endpoint.com'  # Fill this in

(1) If you aren't connecting to EUMETSAT buckets, or to a bucket someone shared with you, you can probably skip this part. If the bucket lives in another project (e.g. someone is sharing their bucket with you), you need to refer to it as project_id:bucketname. For boto3 to accept that naming convention, we need a small trick: its built-in bucket-name validation has to be disabled.

Code Block
languagepy
from botocore.session import Session
from botocore.handlers import validate_bucket_name

#This is a neat trick that allows us to specify our bucket name in terms of ProjectID:bucketname
bucketname = project_id + ':' + bucketname
botocore_session = Session()
botocore_session.unregister('before-parameter-build.s3', validate_bucket_name)
boto3.setup_default_session(botocore_session = botocore_session)

Now let's initialize the S3 client with our access keys and endpoint:

Code Block
languagepy
# Initialize the S3 client
s3 = boto3.client('s3', endpoint_url=endpoint,
                  aws_access_key_id=access_key,
                  aws_secret_access_key=secret_access_key)

As a first step, and to confirm we have successfully connected, let's list the objects inside our bucket (list_objects returns at most 1,000 objects).

Code Block
languagepy
# List the objects in our bucket (up to 1,000)
response = s3.list_objects(Bucket=bucketname)
for item in response.get('Contents', []):
    print(item['Key'])

If you want to list more than 1,000 objects in a bucket, use a paginator:

Code Block
languagepy
# List objects with a paginator (not limited to 1,000 objects)
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=bucketname)

# Store the names of our objects in a list
objects = []
for page in pages:
    for obj in page.get('Contents', []):
        objects.append(obj['Key'])

print('Number of objects:', len(objects))

Each obj is a dictionary that looks like this:

Code Block
languagepy
{'Key': 'MyFile.txt', 'LastModified': datetime.datetime(2021, 11, 11, 0, 39, 23, 320000, tzinfo=tzlocal()), 'ETag': '"2e22f62675cea3445f7e24818a4f6ba0d6-1"', 'Size': 1013, 'StorageClass': 'STANDARD'}
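Since each entry carries Key and Size, the collected listing can be post-processed directly. A small sketch with made-up entries that filters by a "folder" prefix and totals the sizes (note that paginate() also accepts a server-side Prefix argument, e.g. paginator.paginate(Bucket=bucketname, Prefix='folder1/'), if you only want one prefix listed in the first place):

```python
# Made-up entries, shaped like the listing output shown above
sample_objects = [
    {'Key': 'folder1/a.txt', 'Size': 1013, 'StorageClass': 'STANDARD'},
    {'Key': 'folder1/b.txt', 'Size': 2048, 'StorageClass': 'STANDARD'},
    {'Key': 'folder2/c.txt', 'Size': 4096, 'StorageClass': 'STANDARD'},
]

# Keep only the objects under a given prefix and total their sizes
prefix = 'folder1/'
matching = [obj for obj in sample_objects if obj['Key'].startswith(prefix)]
total_size = sum(obj['Size'] for obj in matching)
print('Objects under', prefix, ':', len(matching), '- total', total_size, 'bytes')
```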

Now let's read a file from the bucket into memory, so we can work with it inside Python without ever saving it to our local computer:

Code Block
languagepy
# Read a file into memory and decode it as a string
filename = 'folder1/folder2/myfile.txt'  # Fill this in
obj = s3.get_object(Bucket=bucketname, Key=filename)
myObject = obj['Body'].read().decode('utf-8')
print(myObject)
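Once the object is decoded into a string, it can be parsed like any local file. For example, if it held CSV data, the io import from the top of the script lets the standard csv module read it directly (the sample text below stands in for the string downloaded from the bucket):

```python
import csv
import io

# Sample text standing in for the string read from the bucket
myObject = 'name,value\nalpha,1\nbeta,2\n'

# Wrap the string in a file-like object and parse it as CSV
rows = list(csv.DictReader(io.StringIO(myObject)))
print('Parsed rows:', rows)
```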

If you'd rather download the file than read it into memory, here's how:

Code Block
languagepy
# Downloading a file from the bucket
with open('myfile.txt', 'wb') as f:  # Local filename - fill this in
    s3.download_fileobj(bucketname, 'myfile.txt', f)  # Object key - fill this in

Similarly, you can upload files to the bucket (given that you have write access to it):

Code Block
languagepy
# Uploading a file to the bucket (make sure you have write access)
s3.upload_file('myfile.txt', bucketname, 'myfile.txt')  # Local file, bucket, object key - fill these in
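boto3 can also upload straight from memory via upload_fileobj, which accepts any file-like object, so data never has to touch the local disk. A sketch of preparing such a buffer (the object key 'myfile-from-memory.txt' is just an example, and the final call is shown commented because it needs a live client and bucket to run):

```python
import io

# Build an in-memory file-like object from raw bytes
data = b'hello from memory'
buffer = io.BytesIO(data)

# With a live client, this would push the buffer to the bucket:
# s3.upload_fileobj(buffer, bucketname, 'myfile-from-memory.txt')
print('Buffer holds', buffer.getbuffer().nbytes, 'bytes')
```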

If you're interested in more, take a look at this article, which gives a more detailed view of boto3's functionality (it focuses on Amazon Web Services specifically, but the Python code applies equally here):

https://dashbird.io/blog/boto3-aws-python/

Check out a full code example at the official boto3 website: 

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html



