This page describes the JupyterHub service provided by the ECMWF Data Store Service (DSS), including how to access the service and what resources and software are available. For documentation regarding navigating your way around a JupyterLab session, please refer to the JupyterLab documentation pages.

Warning: Experimental service

Please be aware that the service described here is subject to change based on real usage of the system.

Note
Please note that by launching a JupyterHub session you are agreeing to the Terms of Use for the ECMWF Data Store Service JupyterHub.

Info: ECMWF-DSS

The ECMWF Data Store Service (ECMWF-DSS) refers to the family of common-architecture Data Stores that serve data for the Copernicus Climate Change Service (C3S; via the CDS), the Copernicus Atmosphere Monitoring Service (CAMS; via the ADS) and the Copernicus Emergency Management Service (CEMS; via the EWDS).

Background

Building on user and developer experience with the Climate Data Store (CDS) Toolbox, the ECMWF-DSS offers a JupyterHub service as an online computing environment, with earthkit as the supported post-processing and visualisation software. JupyterLab sessions will be available to all DSS users (subject to resource availability). They provide fast access to the data available on the various Data Stores and allow users to perform post-processing and visualisation of this data. The sessions are small and are not designed for very large computations (see the compute resource provisions table below). For larger computation tasks, users should consider other JupyterHub resources, for example WEkEO.

The JupyterLab sessions provided by ECMWF-DSS include an interactive on-boarding tutorial. For further documentation regarding navigating your way around a JupyterLab session, please refer to the JupyterLab documentation pages.

How to access

The DSS JupyterHub will be available from the ECMWF JupyterHub launcher page. Access requires ECMWF login credentials (the same as those required by the Data Store website to download data), including two-factor authentication. Two-factor authentication can be set up by updating your ECMWF account credentials.

Once logged in, users are given a choice of environment to use for their Jupyter session from a dropdown menu at the top of the page. The DSS offers a single environment to users: "ECMWF Data Store Service".

To start your session, click the "Start" button. If enough resources are available, you will be assigned a Jupyter session, which will run for the length of time stated in the resource provisions table below.

Just above the "Start" button there is a "Version" dropdown, which refers to the version of the image used to create your session (i.e. the versions of the various software packages installed).

To ensure that the software is kept up-to-date, the image will be updated several times a year. Only the "Default" image is officially supported.

The "Rollback" version provides the previous version of the image to assist in cases where a software update has resulted in breaking workflows.

The "Rollback" image is only available for a limited time and we encourage users to update their workflows to use the "Default" image as soon as possible.


Figure 1: Starting a DSS session from the ECMWF JupyterHub launcher page

Note: Time-limited singleton sessions

All JupyterHub sessions running on this service are time limited. When the time is up, the instance will be killed automatically along with any active processing that may be taking place.

You can only have one session running. If you left one running, JupyterHub will connect you straight back into it.

If the service is busy and there are no resources available, you will be informed and will need to try again later. In the early stages of the service we will closely monitor usage, and modify options to ensure that we provide a fair service to all our users.

Info: ECMWF sessions

This is the general ECMWF JupyterHub launcher, therefore it is possible that you have access to more than the Data Store option described here.

Environments available to DSS users

DSS users will be able to spawn sessions with the environment summarised in the table below. This can be selected from the "Select an Environment" dropdown selector on the JupyterHub Launcher. Please note that additional environment options may be added to this list as the service evolves to meet the needs of users.

| Name | Use case | RAM | CPUs | Duration |
|---|---|---|---|---|
| ECMWF Data Store Service | Some small data processing, e.g. data averaging of small files | 4 GB | 2 | 5 hours |

For reference, one month of a single variable from the ERA5 hourly data on single levels is roughly 1.5 GB. Larger volumes of data can be processed by working block-wise, e.g. using dask chunks in xarray.
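
As a rough back-of-the-envelope check of these figures, the sketch below estimates how much space a month of hourly data needs and how many time steps fit in one chunk. The grid size (721 x 1440 points, i.e. a 0.25 degree global grid), the 2-byte packed size on disk, the 4-byte float32 size in memory and the 512 MB chunk budget are all assumptions made for illustration, and the function names are hypothetical.

```python
# Rough size estimates for hourly gridded data (illustrative assumptions only).

N_LAT, N_LON = 721, 1440      # assumed 0.25 degree global grid
HOURS_IN_MONTH = 24 * 31      # a 31-day month of hourly fields


def array_bytes(n_time, bytes_per_value, n_lat=N_LAT, n_lon=N_LON):
    """Return the size in bytes of an (n_time, n_lat, n_lon) array."""
    return n_time * n_lat * n_lon * bytes_per_value


def time_steps_per_chunk(budget_bytes, bytes_per_value, n_lat=N_LAT, n_lon=N_LON):
    """Return how many full time steps fit in a chunk of budget_bytes."""
    return budget_bytes // (n_lat * n_lon * bytes_per_value)


on_disk = array_bytes(HOURS_IN_MONTH, 2)    # assumed 16-bit packed values
in_memory = array_bytes(HOURS_IN_MONTH, 4)  # assumed float32 once decoded
chunk_steps = time_steps_per_chunk(512 * 1000**2, 4)  # assumed 512 MB budget

print(f"on disk:   {on_disk / 1e9:.2f} GB")
print(f"in memory: {in_memory / 1e9:.2f} GB")
print(f"time steps per 512 MB chunk: {chunk_steps}")
```

Under these assumptions a fully decoded month is around 3 GB, which is close to the RAM available in a session. A chunk size of this order could then be passed through the `chunks` argument when opening the data with xarray, for example `xarray.open_dataset(path, chunks={"time": 120})`, where the dimension name (`time` here) depends on the file.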

Pre-installed software

Python environments

Default (earthkit) environment

The default Python environment is created from conda-forge with Python version 3.12.8 and the environment.yml file below. This environment includes a number of ECMWF Python packages, including the latest stable release of earthkit. This is the Python environment used when launching the earthkit Notebooks and Consoles from the quick start menu.

environment.yml (conda-forge):
name: base
channels:
  - conda-forge
dependencies:
  - python
  - pip
  - jupyterlab-git
  - ipywidgets
  - ipykernel
  - nodejs
  - git
  - yaml
  - pyyaml
  - beautifulsoup4
  - jupyter-server-proxy
  - numpy
  - pandas
  - xarray
  - numexpr
  - scipy
  - seaborn
  - dask
  - cartopy
  - shapely
  - plotly
  - netcdf4
  - cf-units
  - Markdown
  - toolz
  - tqdm
  - adjustText
  - aws-requests-auth
  - bokeh
  - voila
  - docstring_parser
  - filelock
  - metview-batch
  - metview-python
  - cdsapi
  - ecmwf-api-client
  - ecmwf-opendata
  - zarr
  - jupyterlab-tour
  - pip:
    - earthkit


CDO environment

The CDO software is available in a separate Python environment. This environment can be selected when opening a Notebook or Console from the launcher tab, or activated in a bash terminal with the following command:

Activate cdo (bash):
conda activate cdo
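
From a notebook running in the default environment, one possible way to call CDO without switching kernels is through `conda run`. The sketch below only builds and, optionally, runs such a command. The environment name `cdo` is taken from the activation command above, while the helper names, the choice of the `timmean` operator and the file names are illustrative assumptions, and this approach has not been confirmed against the service itself.

```python
import shutil
import subprocess


def cdo_command(operator, infile, outfile, env_name="cdo"):
    """Build a `conda run` command that calls CDO inside env_name."""
    return ["conda", "run", "-n", env_name, "cdo", operator, infile, outfile]


def run_cdo(operator, infile, outfile, env_name="cdo"):
    """Run CDO in env_name; raises CalledProcessError if CDO fails."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    cmd = cdo_command(operator, infile, outfile, env_name)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical file names; `timmean` computes the mean over all time steps.
print(" ".join(cdo_command("timmean", "input.nc", "time_mean.nc")))
```

Keeping the command construction separate from its execution makes it easy to inspect the exact command before running it.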

User installation

You can install additional packages from the (open-source) conda-forge channel (`conda install PACKAGE-NAME`), or from PyPI (`pip install PACKAGE-NAME`). These packages will be installed in your local storage and will be available the next time you create a session.
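
Before installing, it can be useful to check from within a notebook whether a package is already importable in the current environment. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and installing through `sys.executable -m pip` is one common approach rather than a documented requirement of the service.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the top-level module_name can be imported here."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package_name]


def ensure_installed(module_name, package_name=None):
    """Install package_name with pip only if module_name is missing."""
    if is_installed(module_name):
        return False  # nothing to do
    subprocess.run(pip_install_command(package_name or module_name), check=True)
    return True


print(is_installed("json"))  # a standard-library module, so this is True
```

The optional `package_name` argument covers the case where the name used for installation differs from the name used for import.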

Note: Software support
Given the complexities in environment management and software versioning, we only provide support for the default earthkit environment.

Note: Software licensing

As specified in article 5.5 of the Terms of Use for the ECMWF Data Store Service JupyterHub, it is the user's responsibility to ensure they have all the necessary rights to use any of the services, applications (including software), data and products used on DSS via the DSS JupyterHub.

The software and configuration provided in the initial environment use open-source channels only (i.e. conda-forge) and we encourage users to use, and contribute to, open-source software distributions.

User storage

The DSS will offer two forms of storage for use in the JupyterHub. Please be aware that both of these options, and the way that they have been configured, are subject to change as the project develops.

| Storage type | Size | Longevity |
|---|---|---|
| Private storage | 1 GB | Permanent, if used every 31 days |
| Scratch storage | 100 GB | Temporary, lifetime depends on overall usage |

Private storage

Each user will have a "home" storage allocation (see table above for size). If you do not use the JupyterHub service for a period of 31 days the private storage will be removed. This storage is only accessible to you.

The DSS service does not provide any back-up of the stored data. We therefore strongly advise that you use git repositories to back up any files kept in the private storage, so that you can recreate your work should your private storage be removed. JupyterHub provides a git plugin which makes it simple to clone your repository.

Scratch storage

Each user will have an allocated quota on the temporary scratch disk (see table above for size). If you exceed the maximum quota, a clean-up script will irreversibly remove your oldest files (by modified time). Any attempt to circumvent this behaviour is considered malicious and will lead to your access to JupyterHub being revoked.

The scratch disk is a shared resource and is cleaned regularly. When the combined usage of all users exceeds the maximum quota, the least recently modified files will be removed. Files stored here should therefore not be considered permanent: they should exist for your current session, but may or may not be there when you return. The lifetime of these files depends on the general usage of the service, and at this stage it is not possible to provide an expected lifetime.
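
To see which of your files would be first in line for removal, you can list them by modification time yourself. The sketch below mirrors the oldest-first behaviour described above using only the Python standard library. It is not the service's clean-up script, and the function names and the scratch path are placeholders.

```python
import os


def files_oldest_first(root):
    """Return (mtime, size, path) for every file under root, oldest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # e.g. a broken symlink or a file removed meanwhile
            entries.append((info.st_mtime, info.st_size, path))
    return sorted(entries)


def removal_candidates(root, quota_bytes):
    """Return the oldest files that would need to go to get under quota_bytes."""
    entries = files_oldest_first(root)
    total = sum(size for _mtime, size, _path in entries)
    candidates = []
    for _mtime, size, path in entries:
        if total <= quota_bytes:
            break
        candidates.append(path)
        total -= size
    return candidates


# Example call with a placeholder path and an assumed quota in bytes:
# removal_candidates("/path/to/your/scratch", 100 * 1000**3)
```

An empty list means the directory is within the given quota. Copying anything you want to keep out of scratch remains the safer option, since the shared clean-up can remove files regardless of your own usage.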

Shared resources

There is a shared resources directory available from your home directory, in the folder named notebooks-library. This read-only directory contains resources provided by Copernicus and ECMWF. To use these notebooks, you can save copies to your home directory and edit them as you wish. The notebooks library is managed as a git repository which is cloned each time a session is spawned. For reference, the repository is here: https://github.com/ecmwf-projects/dss-jupyterhub-notebooks/

External network access

SSH connections are disabled

The Jupyter sessions do not allow SSH connectivity for security reasons. Therefore, you must use the HTTPS address for any git repositories that you want to clone.
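
If you only have the SSH form of a repository address to hand, it can usually be rewritten to its HTTPS equivalent. The helper below is a minimal sketch covering the two common SSH spellings. The function name is hypothetical, and it assumes the hosting service exposes the same path over HTTPS, as GitHub does, for example.

```python
import re

# scp-like form, e.g. git@github.com:owner/repo.git
_SCP_STYLE = re.compile(r"^(?:[^@/\s]+@)?([^:/\s]+):(?!/)(.+)$")
# URL form, e.g. ssh://git@github.com/owner/repo.git
_SSH_URL = re.compile(r"^ssh://(?:[^@/\s]+@)?([^:/\s]+)(?::\d+)?/(.+)$")


def to_https(url):
    """Rewrite an SSH-style git address as an HTTPS one; pass others through."""
    if "://" in url and not url.startswith("ssh://"):
        return url  # already https://, http://, etc.
    match = _SSH_URL.match(url) or _SCP_STYLE.match(url)
    if match is None:
        return url  # unrecognised form, leave unchanged
    host, path = match.groups()
    return f"https://{host}/{path}"


print(to_https("git@github.com:ecmwf-projects/dss-jupyterhub-notebooks.git"))
```

The resulting address can be used with `git clone` in a terminal or pasted into the clone dialog of the git plugin.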

Right to suspend service

This service is provided according to the Terms of Use for the ECMWF Data Store Service JupyterHub. We reserve the right to suspend the service to users if we detect that the terms and conditions have been infringed. Suspension may be triggered automatically, and access may only be reinstated once we have investigated the specific use case.