This page describes the JupyterHub service provided by the ECMWF-Data Store Service (DSS), including how to access the service and what resources and software are available. For documentation regarding navigating your way around a JupyterLab session, please refer to the JupyterLab documentation pages.
Experimental service
IMPORTANT: Please be aware that the Service outlined herein is experimental and is provided on an "AS IS" basis with no warranties of any kind. By using this experimental Service, you acknowledge it may contain bugs, experience downtime, or produce unexpected results, and you assume all risks associated with its use.
We reserve the right to modify or discontinue this Service, or modify the Terms of Use of this service, at any time.
Background
Based on user and developer experience of the Climate Data Store (CDS) Toolbox, the ECMWF-DSS offers a JupyterHub service as an online computing environment, with earthkit as the supported post-processing and visualisation software. JupyterLab sessions will be available to all DSS users (subject to resource availability). They provide fast access to data available on the various Data Stores and allow users to post-process and visualise this data. The sessions are small and not designed for very large computations (see the compute resource provisions table below). For larger computational tasks, users should consider other JupyterHub resources, for example WEkEO.
The JupyterLab sessions provided by ECMWF-DSS include an interactive on-boarding tutorial. For further documentation regarding navigating your way around a JupyterLab session, please refer to the JupyterLab documentation pages.
How to access
The DSS JupyterHub will be available from the ECMWF JupyterHub launcher page. Access requires ECMWF login credentials (the same as those required by the Data Store website to download data), including two-factor authentication. Two-factor authentication can be set up by updating your ECMWF account credentials.
Once logged in, users are given a choice of environment to use for their Jupyter session from a dropdown menu at the top of the page. The DSS offers a single environment to users: "ECMWF Data Store Service".
To start your session, click the "Start" button. If enough resources are available, you will be assigned a Jupyter session, which will run for the length of time stated in the resource provisions table below.
Just above the "Start" button there is a "Version" dropdown, which refers to the version of the image used to create your session (i.e. the versions of the various software packages installed).
To ensure that the software is kept up-to-date, the image will be updated several times a year. Only the "Default" image is officially supported.
The "Rollback" version provides the previous version of the image, to assist in cases where a software update has broken existing workflows.
The "Rollback" image is only available for a limited time and we encourage users to update their workflows to use the "Default" image as soon as possible.
Figure 1: Starting a DSS session from the ECMWF JupyterHub launcher page
Time limited singleton sessions
All JupyterHub sessions running on this service are time limited. When the time is up, the instance will be killed automatically along with any active processing that may be taking place.
You can only have one session running at a time. If you have left one running, JupyterHub will connect you straight back into it.
If the service is busy and there are no resources available, you will be informed and will need to try again later. In the early stages of the service we will closely monitor usage and modify options to ensure that we provide a fair service to all our users.
ECMWF sessions
This is the general ECMWF JupyterHub launcher, therefore it is possible that you have access to more than the Data Store option described here.
Environments available to DSS users
DSS users will be able to spawn sessions with the environment summarised in the table below. This can be selected from the "Select an Environment" dropdown selector on the JupyterHub Launcher. Please note that additional environment options may be added to this list as the service evolves to meet the needs of users.
Name | Use case | RAM | CPUs | Duration |
---|---|---|---|---|
ECMWF Data Store Service | Some small data processing, e.g. data averaging of small files | 4 GB | 2 | 5 hours |
For reference, one month of one variable from the ERA5 hourly data on single levels is roughly 1.5 GB. Larger volumes of data can be processed block-wise, e.g. using dask chunks in xarray.
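To illustrate why block-wise processing matters, the arithmetic below estimates how large that month of data becomes once loaded into memory. It is a sketch under stated assumptions, not an official sizing: it assumes the 0.25° global grid (1440 × 721 points), a 31-day month of hourly steps, and that the ~1.5 GB figure corresponds to 16-bit packed values. Decoded to 32-bit or 64-bit floats, the same month occupies roughly 3.1 GB or 6.2 GB, close to or above the 4 GB of session RAM. The helper names are hypothetical.

```python
"""Rough memory arithmetic for one month of an hourly ERA5 single-level variable.

Assumptions (illustrative only): 0.25 degree global grid of 1440 x 721 points,
31 days x 24 hourly steps, values packed into 2 bytes on disk.
"""

GRID_POINTS = 1440 * 721      # assumed 0.25 degree global grid
STEPS_PER_MONTH = 31 * 24     # hourly steps in a 31-day month
GB = 1e9                      # decimal gigabytes


def array_size_gb(n_steps: int, bytes_per_value: int) -> float:
    """Size in GB of n_steps global fields stored with bytes_per_value each."""
    return n_steps * GRID_POINTS * bytes_per_value / GB


def max_steps_per_chunk(memory_budget_gb: float, bytes_per_value: int) -> int:
    """Largest number of time steps whose fields fit inside memory_budget_gb."""
    bytes_per_step = GRID_POINTS * bytes_per_value
    return int(memory_budget_gb * GB // bytes_per_step)


if __name__ == "__main__":
    print(f"packed (2 bytes):  {array_size_gb(STEPS_PER_MONTH, 2):.2f} GB")
    print(f"float32 (4 bytes): {array_size_gb(STEPS_PER_MONTH, 4):.2f} GB")
    print(f"float64 (8 bytes): {array_size_gb(STEPS_PER_MONTH, 8):.2f} GB")
    # Keep each block well under the session RAM, e.g. a 1 GB budget per chunk.
    print("float64 steps per 1 GB chunk:", max_steps_per_chunk(1.0, 8))
```

With these assumptions a 1 GB budget holds 120 float64 time steps, a number that could be passed to xarray as a chunk size, e.g. `xr.open_dataset(path, chunks={"valid_time": 120})`. The time dimension name (for example `time` or `valid_time`) depends on the file, so check it before chunking.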
Pre-installed software
Python environments
Default (earthkit) environment
The default Python environment is created from conda-forge with Python 3.12.8 and an environment.yml file. This environment includes a number of ECMWF Python packages, including the latest stable release of earthkit. It is the Python environment used when launching the earthkit Notebooks and Consoles from the quick start menu.
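Because the image is updated several times a year, it can help to record which package versions a notebook was run with, especially before switching between the "Default" and "Rollback" images. A minimal sketch using only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and the distribution names in the list are examples to adjust to the packages you rely on.

```python
"""Report the Python version and installed versions of selected packages."""
import sys
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    # Example distribution names; replace with the ones your workflow uses.
    for name, version in installed_versions(["earthkit", "xarray", "dask"]).items():
        print(f"{name}: {version or 'not installed'}")
```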
CDO environment
The CDO software is available in a separate Python environment. This can be used when selecting a Notebook or Console from the launcher tab or, if using a bash terminal, the CDO environment can be activated with the following:
```bash
conda activate cdo
```
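To confirm from Python which environment a notebook or console is using, and whether a `cdo` executable is reachable, the standard library is enough. A minimal sketch: `cdo_status` is a hypothetical helper. `CONDA_DEFAULT_ENV` is set by `conda activate`, but a Jupyter kernel started without activation may not set it, so the PATH lookup is the more direct check.

```python
"""Check, from Python, whether the CDO environment appears to be in use."""
import os
import shutil


def cdo_status():
    """Return (active conda environment name, path of a `cdo` executable)."""
    env_name = os.environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`, may be None
    cdo_path = shutil.which("cdo")                   # None if cdo is not on PATH
    return env_name, cdo_path


if __name__ == "__main__":
    env_name, cdo_path = cdo_status()
    print("active conda environment:", env_name)
    print("cdo executable:", cdo_path or "not found on PATH")
```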
User installation
You can install additional packages from the (open-source) conda-forge channel (`conda install PACKAGE-NAME`) or from PyPI (`pip install PACKAGE-NAME`). These packages will be installed in your local storage and will be available the next time you create a session.
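When installing from inside a notebook rather than a terminal, a common pitfall is that the `pip` found on the PATH belongs to a different environment from the notebook's kernel. A minimal sketch that runs pip through the kernel's own interpreter (IPython's `%pip install` magic serves the same purpose); `pip_install_command` and `pip_install` are hypothetical helper names, and `PACKAGE-NAME` is a placeholder.

```python
"""Install packages into the environment of the running kernel."""
import subprocess
import sys
from typing import List


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command bound to this interpreter (sys.executable)."""
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Run pip for this interpreter; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(*packages))
```

For example, `pip_install("PACKAGE-NAME")` installs into the same environment that the notebook imports from.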
Software support
Software licencing
As specified in article 5.5 of the Terms of Use for the ECMWF Data Store Service JupyterHub, it is the user's responsibility to ensure they have all the necessary rights to use any of the services, applications (including software), data and products used on the DSS via the DSS JupyterHub.
The software and configuration provided in the initial environment uses open-source channels only (i.e. conda-forge) and we encourage users to use, and contribute to, open source software distributions.
User storage
The DSS will offer two forms of storage for use in the JupyterHub. Please be aware that both of these options, and the way that they have been configured, are subject to change as the project develops.
Storage type | Size | Longevity |
---|---|---|
Private storage | 1 GB | Permanent, if used at least once every 31 days |
Scratch storage | 100 GB | Temporary; lifetime depends on overall usage |
Private storage
Each user will have a "home" storage allocation (see table above for size). If you do not use the JupyterHub service for a period of 31 days the private storage will be removed. This storage is only accessible to you.
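Since private storage is limited to 1 GB, it can be useful to check how much of it you are using. A minimal standard-library sketch, assuming the quota is measured in decimal gigabytes; `directory_size` and `QUOTA_BYTES` are hypothetical names. It counts every regular file under the folder you pass, including, for example, the cloned notebooks-library folder, which may or may not count towards the quota.

```python
"""Measure how much of the private (home) storage quota is in use."""
import os

QUOTA_BYTES = 1 * 10**9  # 1 GB from the storage table; decimal GB assumed


def directory_size(path):
    """Total size in bytes of regular files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinked dirs
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

For example, `directory_size(os.path.expanduser("~")) / QUOTA_BYTES` gives the fraction of the quota in use.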
The DSS does not provide any back-up for the data stored, so we strongly advise that you use git repositories to back up any files kept in your private storage. These can be used to recreate your work should your private storage be removed. JupyterHub provides a git plugin which makes it simple to clone your repository.
Scratch storage
Each user will have an allocated quota on the temporary scratch disk (see table above for size). If you exceed the maximum quota, a clean up script will irreversibly remove your oldest files (by modified time). Any attempt to circumvent this behaviour is considered malicious and will lead to your access to JupyterHub being revoked.
The scratch disk is a shared resource and is cleaned regularly. When the combined usage of all users exceeds the maximum quota, the least recently modified files will be removed. Files stored here should therefore not be considered permanent: they are intended for your current session and may or may not be there when you return. Their lifetime depends on the general usage of the service, and at this stage it is not possible to give an expected lifetime for such files.
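The oldest-first clean-up rule described above can be modelled in a few lines to see which of your files would be removed first. This is an illustrative model of the stated policy, not the actual DSS clean-up script: `ScratchFile`, `scan` and `files_to_evict` are hypothetical names, and the scratch directory path is not specified here.

```python
"""Illustrative model of an oldest-first scratch clean-up (not the DSS script)."""
import os
import stat
from typing import List, NamedTuple


class ScratchFile(NamedTuple):
    path: str
    size_bytes: int
    mtime: float  # last modification time, seconds since the epoch


def scan(root: str) -> List[ScratchFile]:
    """Collect size and modification time of every regular file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full, follow_symlinks=False)
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                found.append(ScratchFile(full, st.st_size, st.st_mtime))
    return found


def files_to_evict(files: List[ScratchFile], quota_bytes: int) -> List[str]:
    """Paths an oldest-first clean-up would remove to bring usage within quota_bytes."""
    usage = sum(item.size_bytes for item in files)
    evicted = []
    for item in sorted(files, key=lambda entry: entry.mtime):  # least recently modified first
        if usage <= quota_bytes:
            break
        evicted.append(item.path)
        usage -= item.size_bytes
    return evicted
```

For example, with files of 10 GB (oldest), 60 GB and 50 GB (newest) against a 100 GB quota, the model removes the 10 GB file and then the 60 GB file, because removal order follows modification time rather than size.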
Shared resources
A shared resources directory is available in your home directory, in the folder notebooks-library. This read-only directory contains resources provided by Copernicus and ECMWF. If you want to use these notebooks, you can save copies elsewhere in your home directory and edit them as you wish. The notebooks library is managed as a git repository, which is cloned each time a session is spawned; for reference, the repository is here: https://github.com/ecmwf-projects/dss-jupyterhub-notebooks/
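Copying a notebook out of the read-only library can also be scripted. A minimal sketch; `copy_to_workspace` and the destination folder are hypothetical. Note that `shutil.copy2` copies permission bits, so a copy of a read-only file is itself read-only until write permission is added, which the sketch does explicitly. By default it refuses to overwrite an existing copy, so local edits are not replaced by accident.

```python
"""Copy a notebook out of the read-only library so it can be edited."""
import shutil
import stat
from pathlib import Path


def copy_to_workspace(notebook: Path, workspace: Path, overwrite: bool = False) -> Path:
    """Copy notebook into workspace and return the path of the copy.

    An existing copy is kept unless overwrite=True.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / notebook.name
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(notebook, target)  # copy2 also copies the read-only permission bits
    target.chmod(target.stat().st_mode | stat.S_IWUSR)  # make the copy writable by you
    return target
```

For example, `copy_to_workspace(Path.home() / "notebooks-library" / "NOTEBOOK.ipynb", Path.home() / "my-notebooks")`, where both the notebook name and `my-notebooks` are placeholders.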
External network access
SSH connections are disabled
The Jupyter sessions do not allow SSH connectivity for security reasons. Therefore, you must use the HTTPS address for any git repositories that you want to clone.
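If you have a repository address in SSH form (for example `git@github.com:owner/repo.git`), it can be rewritten to the HTTPS form before cloning. A minimal sketch; `to_https` is a hypothetical helper, and it assumes the host serves the same repository path over HTTPS, which holds for GitHub and GitLab-style hosts but should be checked elsewhere. Any SSH port number is dropped.

```python
"""Rewrite SSH-style git remote URLs to their HTTPS equivalent."""
import re

# "ssh://[user@]host[:port]/owner/repo(.git)"
_SSH_URL = re.compile(r"^ssh://(?:[\w.-]+@)?(?P<host>[\w.-]+)(?::\d+)?/(?P<path>.+)$")
# scp-like "[user@]host:owner/repo(.git)"
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>[^/].*)$")


def to_https(url: str) -> str:
    """Return an HTTPS clone URL; URLs that are already HTTPS are returned unchanged."""
    if url.startswith("https://"):
        return url
    for pattern in (_SSH_URL, _SCP_LIKE):  # check the explicit ssh:// form first
        match = pattern.match(url)
        if match:
            return f"https://{match['host']}/{match['path']}"
    raise ValueError(f"unrecognised git URL: {url}")
```

The result can be passed to `git clone` or to the JupyterHub git plugin. Pushing over HTTPS usually needs a token rather than an account password; on GitHub, for example, a personal access token is used.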
Right to suspend service
This service is provided according to the Terms of Use for the ECMWF Data Store Service JupyterHub. We reserve the right to suspend the service to users if we detect that the terms and conditions have been infringed. Suspension may be triggered automatically, and access may only be reinstated once we have investigated the specific use case.