This page describes the JupyterHub service provided by the ECMWF Data Store Service (DSS), hereinafter referred to as the DSS JupyterHub Service or the Service, including how to access the Service and what resources and software are available. For documentation on navigating a JupyterLab session, please refer to the JupyterLab documentation pages.

Experimental service

IMPORTANT: Please be aware that the Service outlined herein is experimental and is provided on an "AS IS" basis with no warranties of any kind. By using this experimental Service, you acknowledge it may contain bugs, experience downtime, or produce unexpected results, and you assume all risks associated with its use.

We reserve the right to modify or discontinue this Service, or modify the Terms of Use of this service, at any time.

Please note that by launching a JupyterHub session you are agreeing to the Terms of Use for the ECMWF Data Store Service JupyterHub.


Background

Building on user and developer experience with the Climate Data Store (CDS) Toolbox, the DSS offers the DSS JupyterHub Service as an online computing environment, with earthkit as the supported post-processing and visualisation software. JupyterLab sessions will be available to all DSS users, subject to resource availability. They provide fast access to the data available on the various Data Stores and allow users to post-process and visualise this data. The sessions are small and are not designed for large computations (see the compute resource provisions table below). For larger computation tasks, DSS users should consider other JupyterHub resources, for example WEkEO.

The JupyterLab sessions provided by DSS include an interactive on-boarding tutorial. For further documentation regarding navigating your way around a JupyterLab session, please refer to the JupyterLab documentation pages.

How to access

The DSS JupyterHub Service is available from the ECMWF JupyterHub launcher page. Access requires ECMWF login credentials (the same as those required by the Data Store website to download data), including two-factor authentication. Two-factor authentication can be set up by updating your ECMWF account credentials.

Once logged in, DSS users are given a choice of environment to use for their DSS JupyterLab session from a dropdown menu at the top of the page. The DSS offers a single environment to users: "ECMWF Data Store Service".

To start your session, click the "Start" button. If enough resources are available, you will be assigned a DSS JupyterLab session, which will run for the length of time stated in the resource provisions table below.

Just above the "Start" button there is a "Version" dropdown, which refers to the version of the image used to create your session (i.e. the versions of the various software packages installed).

To ensure that the software is kept up-to-date, the image will be updated several times a year. Only the "Default" image is supported.

The "Rollback" version provides the previous version of the image to assist in cases where a software update has resulted in breaking workflows.

The "Rollback" image is only available for a limited time and we encourage DSS users to update their workflows to use the "Default" image as soon as possible.


Figure 1: Starting a DSS session from the ECMWF JupyterHub launcher page

Time limited singleton sessions

All DSS JupyterLab sessions running on this service are time limited. When the time is up, the instance will be terminated automatically along with any active processing that may be taking place.

You can only have one session running at a time. If you left one running, DSS JupyterHub will connect you straight back to it.

If the Service is busy and there are no resources available, you will be informed as such and will need to try again later. To ensure fair usage of DSS JupyterHub Service and/or the respective data store, ECMWF reserves the right to prioritise smaller (and shorter) sessions before others. 

ECMWF sessions

This is the general ECMWF JupyterHub launcher, so you may have access to more options than the Data Store Service option described here.

Environments available to DSS users

DSS users will be able to spawn sessions with the environment summarised in the table below. This can be selected from the "Select an Environment" dropdown selector on the JupyterHub Launcher. Please note that additional environment options may be added to this list as the Service evolves to meet the needs of users.

| Name | Use case | RAM | CPUs | Duration |
| --- | --- | --- | --- | --- |
| ECMWF Data Store Service | Some small data processing, e.g. data averaging of small files | 4 GB | 2 | 5 hours |

For reference, one month of a single variable from the ERA5 hourly data on single levels is roughly 1.5 GB. Larger volumes of data can be handled with block-wise processing, e.g. using dask chunks in xarray.
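The size quoted above can be sanity-checked with some quick arithmetic. The sketch below assumes the regular 0.25° ERA5 grid (1440 × 721 points), a 31-day month of hourly fields, 16-bit packed values on disk and float32 values once decoded. None of these values are stated on this page, and `steps_per_chunk` is a hypothetical helper for picking a block length that fits a chosen memory budget.

```python
# Rough size estimate for one month of one ERA5 single-level variable.
# Assumptions (not stated on this page): 0.25 degree grid, 31-day month,
# 16-bit packed values on disk, float32 values once decoded in memory.

N_LON, N_LAT = 1440, 721      # assumed grid dimensions
HOURS = 31 * 24               # hourly fields in a 31-day month


def field_bytes(bytes_per_value: int) -> int:
    """Size of a single 2D field in bytes."""
    return N_LON * N_LAT * bytes_per_value


def month_bytes(bytes_per_value: int) -> int:
    """Size of a full month of hourly fields in bytes."""
    return field_bytes(bytes_per_value) * HOURS


def steps_per_chunk(budget_bytes: int, bytes_per_value: int = 4) -> int:
    """Number of hourly fields that fit in a memory budget (hypothetical helper)."""
    return max(1, budget_bytes // field_bytes(bytes_per_value))


if __name__ == "__main__":
    print(f"on disk (16-bit):    {month_bytes(2) / 1e9:.2f} GB")
    print(f"in memory (float32): {month_bytes(4) / 1e9:.2f} GB")
    print(f"time steps per 256 MB block: {steps_per_chunk(256 * 10**6)}")
```

Under these assumptions, the decoded month (about 3.1 GB) would nearly fill the memory of a session, which is why block-wise processing helps. The resulting block length could be passed to xarray through the `chunks` argument of `xarray.open_dataset`, for example `chunks={"time": 61}`; the name of the time dimension depends on the file.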

Pre-installed software

Python environments

Default (earthkit) environment

The default Python environment is created from conda-forge with Python version 3.12.8 and the environment.yml file below. This environment includes a number of ECMWF Python packages, including the latest stable release of earthkit. This is the Python environment used when launching the earthkit Notebooks and Consoles from the quick start menu.

environment.yml
name: base
channels:
  - conda-forge
dependencies:
  - python
  - pip
  - jupyterlab-git
  - ipywidgets
  - ipykernel
  - nodejs
  - git
  - yaml
  - pyyaml
  - beautifulsoup4
  - jupyter-server-proxy
  - numpy
  - pandas
  - xarray
  - numexpr
  - scipy
  - seaborn
  - dask
  - cartopy
  - shapely
  - plotly
  - netcdf4
  - cf-units
  - Markdown
  - toolz
  - tqdm
  - adjustText
  - aws-requests-auth
  - bokeh
  - voila
  - docstring_parser
  - filelock
  - metview-batch
  - metview-python
  - cdsapi
  - ecmwf-api-client
  - ecmwf-opendata
  - zarr
  - jupyterlab-tour
  - pip:
    - earthkit


CDO environment

The CDO software is available in a separate Python environment. It can be selected when opening a Notebook or Console from the launcher tab; in a bash terminal, the CDO environment can be activated with the following:

activate cdo
conda activate cdo

User installation

You can install additional packages from the (open-source) conda-forge channel (`conda install PACKAGE-NAME`), or from PyPI (`pip install PACKAGE-NAME`). These packages will be installed in your local storage and will be available the next time you create a session.
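Before installing anything, it can be useful to check which packages are already present in the active environment. The following is a minimal sketch using only the Python standard library; the helper names and the example package names are illustrative and not part of the Service.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_packages(packages):
    """Return the names in `packages` that are not installed in this environment."""
    return [name for name in packages if installed_version(name) is None]


if __name__ == "__main__":
    # Example names only; replace them with the packages your workflow needs.
    for name in ["xarray", "earthkit", "some-extra-package"]:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Any names returned by `missing_packages` are candidates for `conda install` or `pip install` as described above.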

Software support

Given the complexities in environment management and software versioning, we only provide support for the default earthkit environment.

Software licencing

As specified in Article 5.5 of the Terms of Use for ECMWF's Data Store Services JupyterHub, it is the DSS user's responsibility to ensure they have all the necessary rights to use any of the services, applications (including software), data and products used on DSS via the DSS JupyterHub Service.

The software and configuration provided in the initial environment use open-source channels only (i.e. conda-forge) and we encourage DSS users to use, and contribute to, open-source software distributions.

User storage

The DSS JupyterHub Service will offer two forms of storage. Please be aware that both of these options, and the way that they have been configured, are subject to change.

| Storage type | Size | Longevity |
| --- | --- | --- |
| Private Storage | 1 GB | Permanent, if used every 31 days |
| Scratch Storage | 100 GB | Temporary, lifetime depends on overall usage |

Private storage

Each DSS user will have a "home" Private Storage allocation (see table above for size). If you do not use the DSS JupyterHub Service for a period of 31 consecutive days the Private Storage will be removed. This storage is only accessible to you.

The DSS JupyterHub Service does not provide any backup of the data stored, so we strongly advise that you use git repositories to back up any files stored in the Private Storage. The repository can then be used to recreate your work should your Private Storage be deleted. JupyterLab provides a git plugin which makes it simple to clone your repository.

Scratch Storage

Each DSS user will have an allocated quota on the temporary scratch disk, i.e. Scratch Storage (see table above for size). If you exceed the maximum quota, a clean-up script will irreversibly remove your oldest files (by modified time). Should the DSS user circumvent the quota, in addition to any other rights available, we reserve the right to delete any files stored in the Scratch Storage of the DSS user.

The Scratch Storage is mounted on a scratch disk, which is a shared resource and is cleaned regularly to ensure that the disk does not exceed capacity. Therefore, the lifetime of files in the Scratch Storage depends on the overall usage of the DSS JupyterHub Service by all DSS users. The clean-up removes the least recently modified files first. This means that files stored here should not be considered permanently stored: they should exist for your current session, but may or may not be there when you return.
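Because the clean-up works on modification time, it can help to see which of your files are the oldest and how much space you are using. The sketch below walks a directory and lists files from least to most recently modified. The scratch mount point is not given on this page, so the directory is passed in as an argument; the function names are illustrative.

```python
from pathlib import Path


def files_oldest_first(root):
    """Return (path, size_bytes, mtime) for every file under root, oldest first."""
    entries = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            stat = path.stat()
            entries.append((path, stat.st_size, stat.st_mtime))
    # Sort by modification time so likely clean-up candidates come first.
    return sorted(entries, key=lambda entry: entry[2])


def total_size(entries):
    """Total size in bytes of the listed files."""
    return sum(size for _, size, _ in entries)


if __name__ == "__main__":
    import sys

    # Pass your scratch directory as the first argument (path not stated here).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    entries = files_oldest_first(root)
    print(f"{total_size(entries) / 1e9:.2f} GB in {len(entries)} files")
    for path, size, _ in entries[:10]:
        print(f"{size:>12d}  {path}")
```

Files near the top of this listing are the ones to copy elsewhere first if they still matter.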

Shared resources (notebooks-library)

A shared resources directory is available from your home directory in the folder titled notebooks-library/. This read-only directory contains notebooks produced by ECMWF for working with the data available via the Data Store Service and other ECMWF data portals; a summary of the resources available is given below. If you wish to save any changes to these notebooks, you will need to make a copy of the notebook in your home directory. This can be done via the "save as..." option in the "file" menu in the top left of the JupyterLab interface, or from the prompt that appears if you try to close the notebook.
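Copies can also be made programmatically, for example from a notebook cell or a console. The following is a minimal sketch using the standard library; the notebook file name and the destination folder are hypothetical, and `copy_notebook` is an illustrative helper rather than something provided by the Service.

```python
import shutil
from pathlib import Path


def copy_notebook(source, dest_dir):
    """Copy a notebook into dest_dir without overwriting an existing copy.

    Returns the path of the new file.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    counter = 1
    # Pick a new name rather than overwriting earlier work.
    while target.exists():
        target = dest_dir / f"{source.stem}-{counter}{source.suffix}"
        counter += 1
    # copyfile copies the contents only, so read-only permission bits
    # on the source are not carried over to the copy.
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # Hypothetical paths: adjust to a notebook that exists in the library.
    library = Path.home() / "notebooks-library"
    example = library / "example.ipynb"
    if example.exists():
        print(copy_notebook(example, Path.home() / "my-notebooks"))
```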

dss-jupyterhub-notebooks

Please explore the various libraries of Jupyter Notebooks produced by ECMWF in the context of the Copernicus Climate Change Service (C3S), the Copernicus Atmosphere Monitoring Service (CAMS) and the Copernicus Emergency Management Service (CEMS).

[!WARNING] Please note that these notebooks are stored in a read-only directory, and should you wish to save any changes you will have to do so in your personal space.

dss-notebooks

The dss-notebooks directory contains technical notebooks related to the data available in the Data Store Services (DSS). This library includes two sub-directories:

datasets

Notebooks which demonstrate how to work with the datasets. The folders are named after the slug for the datasets available in the DSS, and include basic examples such as how to create the overview images used on the catalogue entries.

documentation

Notebooks which document, with example code, several aspects of the data processing that occurs in the backend of the DSS, for example the GRIB to netCDF conversion and the calculation of daily statistics.

c3s-training

The C3S training material is also published as a JupyterBook. Here we provide direct access to the notebooks so that they can be executed directly. The training material is organised in sub-modules which group the examples into data themes, the sub-modules included are:

  1. climate-indices - Climate indicators and indexes
  2. projections - Climate projections (CMIP and CORDEX)
  3. reanalysis - The ECMWF Reanalysis (ERA) family
  4. sat-obs-atmos-comp - Satellite observations of atmospheric composition
  5. sat-obs-atmos-physics - Satellite observations of atmospheric physics
  6. sat-obs-hydro-cryo - Satellite observations of hydrology and cryosphere
  7. sat-obs-land - Satellite observations of land
  8. sat-obs-ocean - Satellite observations of ocean
  9. seasonal-forecast - The ECMWF Seasonal Forecast family

cams-training

The CAMS training material is also published as a JupyterBook. Here we provide direct access to the notebooks so that they can be executed directly.

sketchbook-earth

The Sketchbook Earth provides a series of Jupyter books which document the metrics calculated for the Climate Intelligence Reports produced by C3S.



External network access

SSH connections are disabled

The DSS JupyterHub Service sessions do not allow SSH connectivity for security reasons. Therefore, you must use the HTTPS address for any git repositories that you want to clone.
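If you only have the SSH form of a repository address to hand, the HTTPS form can usually be derived from it. The sketch below handles the two common SSH spellings. It assumes the hosting service serves the same repository path over HTTPS, which is typical of hosts such as GitHub and GitLab but is not guaranteed; `to_https` and the example addresses are illustrative.

```python
import re

# scp-like form, e.g. git@host:owner/repo.git
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>(?!//).+)$")
# URL form, e.g. ssh://git@host/owner/repo.git (an optional port is dropped)
_SSH_URL = re.compile(r"^ssh://(?:[\w.-]+@)?(?P<host>[\w.-]+)(?::\d+)?/(?P<path>.+)$")


def to_https(remote: str) -> str:
    """Convert an SSH-style git remote to its HTTPS equivalent.

    Assumes the host serves the same repository path over HTTPS.
    """
    if remote.startswith(("https://", "http://")):
        return remote  # already usable from a session
    match = _SSH_URL.match(remote) or _SCP_LIKE.match(remote)
    if match is None:
        raise ValueError(f"unrecognised git remote: {remote!r}")
    return f"https://{match['host']}/{match['path']}"


if __name__ == "__main__":
    # Example addresses only.
    print(to_https("git@github.com:example-org/example-repo.git"))
    print(to_https("ssh://git@gitlab.com/example-org/example-repo.git"))
```

The returned address can then be used with `git clone` or pasted into the JupyterLab git plugin.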

Right to suspend service

This Service is provided according to the Terms of Use for the ECMWF Data Store Service JupyterHub. We reserve the right, at any time and at our sole discretion, with or without prior notice, to suspend the Service to DSS users or terminate any DSS user’s access to the DSS JupyterHub Service, particularly in cases of violation of the Terms of Use for the ECMWF Data Store Service JupyterHub or any of the applicable license terms, or if a Private Storage is not accessed (i.e., by spawning a session) for a certain period of time beyond thirty-one (31) consecutive days.
