You may want to make use of Object Storage in your infrastructure. Both ECMWF and EUMETSAT offer an S3-compatible service that can be enabled for your tenant, so that you can store data in, and retrieve data from, buckets in the cloud.

At the moment, access to this service is not activated by default for every tenant. If you wish to use it, please raise an issue through the Support Portal requesting access.

You may also follow this guide to use any other S3 storage service, such as AWS, from your instances at the European Weather Cloud. Just adapt the host and credential information accordingly.

Managing your Object Storage with S3cmd

S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol. 

Many other advanced tools (e.g. https://rclone.org/) exist, as do APIs for many languages, but this article aims only to demonstrate the basics.

Install the tool

The easiest way is to install it through the system package manager. On CentOS:

sudo yum install s3cmd

Or for Ubuntu:

sudo apt install s3cmd

Alternatively, you may get the latest version from PyPI.

Configure s3cmd

You will need to configure s3cmd before you can use it. The tool reads its configuration from ~/.s3cfg.

  1. Create the configuration file if it does not exist:

    touch ~/.s3cfg
  2. Edit the file and set up at least the following parameters.

    ECMWF:

    • ECMWF CCI1 endpoint: host_base = object-store.os-api.cci1.ecmwf.int 
    • ECMWF CCI2 endpoint: host_base = object-store.os-api.cci2.ecmwf.int 

    EUMETSAT:

    • EUMETSAT endpoint: host_base = s3.waw3-1.cloudferro.com

  3. Fill in the <youraccesskey> and <yoursecretkey> that will be given to you by the provider:

    host_base = <EUM or ECMWF endpoint>
    host_bucket = 
    access_key = <youraccesskey>
    secret_key = <yoursecretkey>
    use_https = True
    
    # Needed for EUMETSAT, as the provider currently supplies a "CloudFerro" SSL certificate. Skip if ECMWF.
    check_ssl_certificate = False
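
The configuration steps above could also be scripted. The following is a minimal sketch, not an official tool: write_s3cfg is a hypothetical helper name, and the file path, endpoint and keys are passed in by the caller. It writes only the parameters listed above, refuses to overwrite an existing file, and creates the file readable by its owner only, since it contains the secret key.

```shell
#!/bin/sh
# Sketch: write a minimal s3cmd configuration file.
# write_s3cfg is a hypothetical helper, not part of s3cmd.
# Usage: write_s3cfg <file> <host_base> <access_key> <secret_key>
write_s3cfg() {
    cfg_file=$1
    host=$2
    access=$3
    secret=$4

    # Do not clobber a configuration that already exists.
    if [ -e "$cfg_file" ]; then
        echo "refusing to overwrite $cfg_file" >&2
        return 1
    fi

    # The subshell keeps the restrictive umask local to this write,
    # so the file is created with mode 600.
    (
        umask 077
        printf '%s\n' \
            "host_base = $host" \
            "host_bucket =" \
            "access_key = $access" \
            "secret_key = $secret" \
            "use_https = True" > "$cfg_file"
    )
}

# Example call (placeholders must be replaced with real values):
# write_s3cfg "$HOME/.s3cfg" "<EUM or ECMWF endpoint>" "<youraccesskey>" "<yoursecretkey>"
```

For EUMETSAT you would still need to add the check_ssl_certificate line by hand, as described above.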

Basic tasks

If you type s3cmd -h you will see the different options of the command, but here are the basics:

List buckets

s3cmd ls

Create a bucket

s3cmd mb s3://yourbucket

List bucket contents

s3cmd ls s3://yourbucket

Get data from bucket

s3cmd get s3://yourbucket/file.txt

Put data into bucket

s3cmd put file.txt s3://yourbucket/

Remove data from bucket

s3cmd rm s3://yourbucket/file.txt

Remove empty bucket

s3cmd rb s3://yourbucket/

Configure automatic expiry of data

s3cmd expire --expiry-days=14 s3://yourbucket/

Information about a bucket

s3cmd info s3://yourbucket

Remove automatic expiry policy

s3cmd dellifecycle s3://yourbucket/

Mounting your bucket with S3FS via FUSE

You may also mount your bucket so that the files in it appear as if they were on a local disk. In general, S3 cannot offer the same performance or semantics as a local file system, but mounting can be useful for legacy applications that mainly need to read data and expect the files to be in a conventional file system.

S3FS installation

First of all, make sure you have S3FS installed in your VM. On CentOS:

sudo yum install epel-release
sudo yum install s3fs-fuse

On Ubuntu:

sudo apt install s3fs

Configure S3FS

You need to store your credentials in a file so that S3FS can authenticate with the service. Replace <youraccesskey> and <yoursecretkey> with your actual credentials.

echo "<youraccesskey>:<yoursecretkey>" | sudo tee /root/.passwd-s3fs > /dev/null
sudo chmod 600 /root/.passwd-s3fs
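
If you prefer the credentials file never to be readable by other users, even briefly, it can be created with a restrictive umask from the start. The sketch below is one possible way to do that. write_s3fs_passwd is a hypothetical helper name, and the optional fourth argument uses the per-bucket line format described in the comments at the end of this page.

```shell
#!/bin/sh
# Sketch: append one credential line to an s3fs password file.
# write_s3fs_passwd is a hypothetical helper, not part of s3fs.
# Usage: write_s3fs_passwd <file> <access_key> <secret_key> [bucket]
write_s3fs_passwd() {
    passwd_file=$1
    access=$2
    secret=$3
    bucket=${4:-}

    # The subshell keeps the restrictive umask local, so a newly
    # created file starts out with mode 600.
    (
        umask 077
        if [ -n "$bucket" ]; then
            # Bucket-specific line: <bucket>:<access key>:<secret key>
            printf '%s:%s:%s\n' "$bucket" "$access" "$secret"
        else
            # Default line: <access key>:<secret key>
            printf '%s:%s\n' "$access" "$secret"
        fi >> "$passwd_file"
    ) && chmod 600 "$passwd_file"   # also tighten a file that already existed
}

# Example call (placeholders must be replaced with real values):
# write_s3fs_passwd /root/.passwd-s3fs "<youraccesskey>" "<yoursecretkey>"
```

Writing to /root/.passwd-s3fs requires running the function in a root shell.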

Setting up an automatic mount

Assuming you want to mount your bucket in /mnt/yourbucket, here is what you need to do:

sudo mkdir /mnt/yourbucket
echo "s3fs#yourbucket /mnt/yourbucket fuse _netdev,allow_other,nodev,nosuid,uid=$(id -u),gid=$(id -g),use_path_request_style,url=<s3_endpoint> 0 0" | sudo tee -a /etc/fstab
sudo mount -a

Again, you must replace <s3_endpoint> with the relevant endpoint at ECMWF or EUMETSAT, and you may customise the other mount options if you wish. At this point your bucket should be mounted and ready to use.
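
Because a malformed /etc/fstab entry can be awkward to debug, it may help to build the line separately and inspect it before appending it. The sketch below prints an entry with the same options as the command above. s3fs_fstab_line is a hypothetical helper name, and the example assumes that the endpoint is given as a full URL including the https:// scheme.

```shell
#!/bin/sh
# Sketch: print an /etc/fstab entry for an s3fs mount.
# s3fs_fstab_line is a hypothetical helper, not part of s3fs.
# Usage: s3fs_fstab_line <bucket> <mountpoint> <endpoint_url> [uid] [gid]
s3fs_fstab_line() {
    bucket=$1
    mountpoint=$2
    url=$3
    # Default to the calling user's IDs, as in the command above.
    uid=${4:-$(id -u)}
    gid=${5:-$(id -g)}

    printf 's3fs#%s %s fuse _netdev,allow_other,nodev,nosuid,uid=%s,gid=%s,use_path_request_style,url=%s 0 0\n' \
        "$bucket" "$mountpoint" "$uid" "$gid" "$url"
}

# Example: review the line first, then append it and mount.
# s3fs_fstab_line yourbucket /mnt/yourbucket "https://<s3_endpoint>"
# s3fs_fstab_line yourbucket /mnt/yourbucket "https://<s3_endpoint>" | sudo tee -a /etc/fstab
# sudo mount -a
# mountpoint -q /mnt/yourbucket && echo "bucket is mounted"
```

The final check uses the mountpoint utility from util-linux, which exits with status 0 only if the directory is currently a mount point.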



5 Comments

  1. If you want to give anonymous read permission to your data:

    s3cmd setacl s3://yourbucket --acl-public

    Note that the following will give read access to the content of the bucket recursively, but not to the bucket itself (you need to run both to give access):

    s3cmd setacl s3://yourbucket --acl-public --recursive
  2. You can also place the credential file passwd-s3fs globally under /etc. I prefer that over the /root/.passwd-s3fs location, because the contents of that file have global impact, especially when you are working with s3fs automounts.

    It is also possible to put credentials for multiple buckets in the passwd-s3fs file. To do so, add one line per bucket and prefix the <access-key>:<secret-key> part with <bucket-name>:, as in this example:

    /etc/passwd-s3fs
    <yourbucketname>:<youraccesskey>:<yoursecretkey>
    <mybucketname>:<myaccesskey>:<mysecretkey>

    Note that the file should still be secured with:

    sudo chmod 600 /etc/passwd-s3fs

    References: http://manpages.ubuntu.com/manpages/xenial/man1/s3fs.1.html, https://github.com/s3fs-fuse/s3fs-fuse/wiki/Fuse-Over-Amazon

  3. Hi Mike Grant

    Quote from above: "July 2021: EUMETSAT's S3 provision has an incomplete SSL certificate; you can either set use_https to False ..."
    Doing that I get redirect errors, such as

    > s3cmd ls
    
    ERROR: S3 error: 302 (Moved Temporarily)

    Also, s3cmd told me to specify use_https = Yes or use_https = No instead of use_https = True or use_https = False. However, that might be because I use a version installed with pip. Sometimes parameters are different between the pip version and a linux repo version.

    When I set use_https = Yes all seems normal

    > s3cmd ls
    2021-05-31 12:22  s3://...
    1. As of today (19 May 2022), to make EUMETSAT's S3 provision work you have to set BOTH of the following options in the configuration file:

      use_https=True

      check_ssl_certificate = False

      then the mounting is OK.

  4. Note that uploading with s3cmd may produce regular errors, always first failing and then (hopefully) succeeding:

    # uploading data *_boo.tar to S3: s3://my-nice-bucket
    
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 1 of 8, 500MB] [1 of 1]
    65536 of 524288000 0% in 1s 46.11 KB/s failed
    ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
    WARNING: Upload failed: /20200701_boo.tar?partNumber=1&uploadId=2~C-Sw6FN_lKQakEnDX8X5HvGLXi5GfGr ([Errno 32] Broken pipe)
    WARNING: Waiting 3 sec...
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 1 of 8, 500MB] [1 of 1]
    524288000 of 524288000 100% in 380s 1344.32 KB/s done
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 2 of 8, 500MB] [1 of 1]
    65536 of 524288000 0% in 1s 44.55 KB/s failed
    ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
    WARNING: Upload failed: /20200701_boo.tar?partNumber=2&uploadId=2~C-Sw6FN_lKQakEnDX8X5HvGLXi5GfGr ([Errno 32] Broken pipe)
    WARNING: Waiting 3 sec...
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 2 of 8, 500MB] [1 of 1]
    524288000 of 524288000 100% in 324s 1575.85 KB/s done
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 3 of 8, 500MB] [1 of 1]
    65536 of 524288000 0% in 1s 44.18 KB/s failed
    ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
    WARNING: Upload failed: /20200701_boo.tar?partNumber=3&uploadId=2~C-Sw6FN_lKQakEnDX8X5HvGLXi5GfGr ([Errno 32] Broken pipe)
    WARNING: Waiting 3 sec...
    upload: '20200701_boo.tar' -> 's3://my-nice-bucket/20200701_boo.tar' [part 3 of 8, 500MB] [1 of 1]
    524288000 of 524288000 100% in 418s 1223.19 KB/s done

    ... and so on

    This is described here: https://github.com/s3tools/s3cmd/issues/1114. The latest version of the master branch has a fix for this, but there is no release containing it yet, so package sources such as PyPI still ship the buggy version.