Experimental service
This is an experimental service currently available to ECMWF staff only.
Please be aware that the service described here is subject to change based on real usage of the system.
Terms and conditions
Background
The ECMWF Data Store Service (ECMWF-DSS) refers to the family of Data Stores, built on a common architecture, that serve data for the Copernicus Climate Change Service (C3S, via the CDS), the Copernicus Atmosphere Monitoring Service (CAMS, via the ADS) and the Copernicus Emergency Management Service (CEMS, via the EWDS).
Based on user and developer experience of the Climate Data Store (CDS) Toolbox, the DSS offers a JupyterHub service as an online computing environment and earthkit as the supported post-processing and visualisation software. JupyterHub sessions will be available to all DSS users, subject to resource availability. They provide fast access to the data held on the various Data Stores and allow users to post-process and visualise this data. The sessions are small and are not designed for very large computations (see the compute resources listed in the environments table below). For larger computation tasks, users should consider other JupyterHub resources, for example WEkEO.
Time limited singleton sessions
All JupyterHub sessions running on this service are time limited. When the time is up, the instance will be killed automatically along with any active processing that may be taking place.
You can only have one session running at a time. If you left one running, JupyterHub will connect you straight back to it.
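Because a session is killed when its time is up, long-running work benefits from saving intermediate results before the limit is reached. The following is a minimal sketch of one way to track the remaining time. It assumes the 5-hour duration listed in the environments table and that you record the session start time yourself; it does not query JupyterHub. The names `SESSION_LIMIT`, `time_remaining` and `should_checkpoint` are illustrative and are not part of the service.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 5-hour duration from the environments table.
SESSION_LIMIT = timedelta(hours=5)


def time_remaining(session_start, now=None, limit=SESSION_LIMIT):
    """Return the time left before the session is expected to end (never negative)."""
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = session_start + limit - now
    return max(remaining, timedelta(0))


def should_checkpoint(session_start, now=None, margin=timedelta(minutes=15)):
    """Return True when no more than `margin` of session time is left."""
    return time_remaining(session_start, now) <= margin
```

In a long loop you could call `should_checkpoint(start)` every few iterations, where `start` is a timezone-aware timestamp you noted when the session began, and write your partial results to disk when it returns `True`.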
How to access
The DSS JupyterHub will be available from the ECMWF JupyterHub launcher page, linked from the Data Store web-sites. Access requires ECMWF login credentials (the same as those required by the Data Store web-site to download data), including two-factor authentication.
Once logged in, users choose an environment for their Jupyter session from a dropdown menu, with several additional options depending on which environment is selected. Please note that by launching an ECMWF-DSS JupyterHub session you are agreeing to the Terms of Use for the ECMWF Data Store Service JupyterHub.
ECMWF sessions
This is the general ECMWF JupyterHub launcher, so you may have access to more options than the Data Store option described here.
Environments available to DSS users
DSS users will be able to spawn sessions with the environment summarised in the table below. This can be selected from the "Select an Environment" dropdown selector on the JupyterHub Launcher. Please note that additional environment options may be added as the service evolves to meet the needs of users.
Name | Use case | RAM | CPU | Duration |
---|---|---|---|---|
ECMWF Data Store Service | Some small data processing, e.g. data averaging of small files | 4 | 2 | 5 hours |
Session priorities
Small sessions will be prioritised to ensure fair usage of the platform. These priorities are to be monitored closely and will evolve as the project develops.
Pre-installed software
Python environment
The default Python environment is created using conda-forge with Python version 3.11.10 and the following environment.yml file:
You can install additional packages from the (open-source) conda-forge channel (`conda install PACKAGE-NAME`), or from PyPI (`pip install PACKAGE-NAME`). These packages will be installed in your local storage and will be available the next time you create a session.
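Since installed packages persist between sessions, a notebook can check whether a package is already present before installing it. The sketch below shows one way to do this from Python using only the standard library. The helper names `is_installed` and `ensure_package` are illustrative, and the sketch assumes that `pip` is available to the running interpreter.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def ensure_package(module_name, pip_name=None):
    """Install a package with pip only if its module is not already importable.

    `pip_name` covers cases where the PyPI name differs from the import name.
    Returns True if an installation was performed, False otherwise.
    """
    if is_installed(module_name):
        return False  # already available, nothing to do
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    return True
```

Calling pip through `sys.executable` installs into the same interpreter that is running the notebook, which avoids installing into a different environment by accident.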
User storage
The DSS will offer two forms of storage for use in the JupyterHub. Please be aware that both of these options, and the way that they have been configured, are subject to change as the project develops.
Storage type | Size | Longevity |
---|---|---|
Private storage | 1 GB | Permanent, if used at least once every 31 days |
Scratch storage | 100 GB | Temporary, lifetime depends on overall usage |
Private storage
Each user will have a "home" storage allocation (see table above for size). If you do not use the JupyterHub service for a period of 31 days the private storage will be removed. This storage is only accessible to you.
We strongly advise that you use git repositories to back up any files stored in the private storage such that you can recreate any work should your private storage be removed. JupyterHub provides a git plugin which makes it simple to clone your repository.
Scratch storage
Each user will have an allocated quota on the temporary scratch disk (see table above for size). If you exceed the maximum quota, a clean-up script will remove your largest files. Any attempt to circumvent this behaviour is considered malicious and will lead to your access to JupyterHub being revoked.
The scratch disk is a shared resource and is cleaned regularly. When the combined usage of all users exceeds the maximum quota, the least recently modified files will be removed. Files stored here should therefore not be considered permanent: they will exist for your current session and may still be there when you return. The lifetime of these files depends on the general usage of the service, and at this stage it is not possible to provide an expected lifetime.
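To stay within the scratch quota, it can help to check how much space a directory uses and which files are largest, since those are the ones the clean-up script would remove first. The following is a minimal sketch using only the standard library. The function name `disk_usage` is illustrative, and the location of the scratch directory is not stated here, so you need to pass in the path yourself.

```python
import os
from pathlib import Path


def disk_usage(root):
    """Return (total_bytes, files) for everything under `root`.

    `files` is a list of (size_in_bytes, path) tuples, largest first.
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                files.append((path.stat().st_size, path))
            except OSError:
                continue  # file disappeared or cannot be read; skip it
    files.sort(key=lambda item: item[0], reverse=True)
    total = sum(size for size, _path in files)
    return total, files
```

Comparing the returned total against the quota in the table above gives an early warning before the clean-up script acts on your files.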
Shared resources
There is a shared resources directory, in a folder titled notebooks-library, available from your home directory. This read-only directory contains resources provided by Copernicus and ECMWF. To use these notebooks, you can save them to your home directory and edit them as you wish. The notebooks library is managed as a git repository which is cloned each time a session is spawned. For reference, the repository is here: https://github.com/ecmwf-projects/dss-jupyterhub-notebooks/
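As the library is read-only, a notebook has to be copied elsewhere before it can be edited. The sketch below shows one way to do this in Python. The function name `copy_for_editing` is illustrative, and both directory paths are passed in by the caller because the exact mount locations are an assumption rather than something documented here.

```python
import shutil
from pathlib import Path


def copy_for_editing(notebook, library_dir, target_dir):
    """Copy a notebook from the read-only library into an editable directory.

    Refuses to overwrite an existing file so that earlier edits are not lost.
    """
    source = Path(library_dir) / notebook
    destination = Path(target_dir) / Path(notebook).name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; not overwriting it")
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies only the contents, so the new file does not inherit
    # read-only permission bits from the library copy.
    shutil.copyfile(source, destination)
    return destination
```

For example, `copy_for_editing("example.ipynb", Path.home() / "notebooks-library", Path.home() / "my-notebooks")` would place an editable copy in a `my-notebooks` folder, where both the notebook name and the target folder are hypothetical.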
External network access
SSH connections are disabled
The Jupyter sessions do not allow SSH connectivity for security reasons. Therefore, you must use the HTTPS address for any git repositories that you want to clone.
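If you only have the SSH form of a repository address to hand, it can usually be rewritten as the HTTPS form. The following is a minimal sketch of that conversion. The function name `to_https` is illustrative; it assumes the common `git@host:owner/repo.git` and `ssh://git@host/owner/repo.git` layouts, and it does not handle SSH URLs with custom ports.

```python
import re

# Matches "git@host:path" and "ssh://git@host/path", with or without ".git".
_SSH_URL = re.compile(
    r"^(?:ssh://)?git@(?P<host>[^:/]+)[:/](?P<path>.+?)(?:\.git)?/?$"
)


def to_https(url):
    """Convert an SSH-style git URL to its HTTPS equivalent.

    HTTPS URLs are returned unchanged; unrecognised formats raise ValueError.
    """
    if url.startswith("https://"):
        return url
    match = _SSH_URL.match(url)
    if match is None:
        raise ValueError(f"Unrecognised git URL: {url!r}")
    return f"https://{match['host']}/{match['path']}.git"
```

The resulting address can then be passed to `git clone` or pasted into the JupyterHub git plugin.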
Right to suspend service
This service is provided according to the Terms of Use for the ECMWF Data Store Service JupyterHub. We reserve the right to suspend the service for users if we detect that the terms and conditions have been infringed. Suspension may be triggered automatically, and access may only be reinstated once we have investigated the specific use case.