During this quarter we focused on the preparation of software components and technology readiness needed to start deploying the showcases described in the Statement of Work.

In this document we describe the progress and contributions made to the different activities in preparation for the deployment of the showcases in EWC.


Personnel Involved: 

Victoria Cherkas, Christian Kanesan, Stefan Friedli, Carlos Osuna (MeteoSwiss) contributed in-kind to all the tasks described below.

The hiring of personnel with the resources of the project was completed and Nina Burgdorfer will start her contract on 15.03.2024.


Data Technologies Workshop, ECMWF - MeteoSwiss, 6-10.11.2023


In order to kick off the collaboration around data technologies, a workshop was organized between MeteoSwiss and ECMWF at the ECMWF center in Reading.

Participation: Christian Kanesan (MeteoSwiss), Victoria Cherkas (MeteoSwiss), Petra Baumann (MeteoSwiss), Carlos Osuna (MeteoSwiss), Stefan Friedli (MeteoSwiss - remote), Milos Belic (MeteoSwiss - remote), Emanuele Danovaro (ECMWF), Christopher Bradley (ECMWF),  Simon Smart (ECMWF), James Hawkes (ECMWF), Tiago Quintino (ECMWF), Sandor Kertesz (ECMWF)


The goals were:

  • get to know the team of developers on both sides.
  • define the goals of the collaboration and exchange development plans.
  • understand the technology and use cases, and define a roadmap for the collaboration.


The topics discussed during the workshop covered, among others:

  • Data processing frameworks in Python:
    • numpy/xarray: should we embrace the rich functionality of frameworks like xarray or use pure numpy API solutions for compatibility?
    • How to write data back to GRIB and other data formats.
    • Metadata and grid information in the framework.
    • Evaluate and learn from typical Member State operators and grids for data processing.
    • Dask and task scheduling: how to parallelize the DAGs.
    • Performance evaluations: dask, numba, GPU.
    • Event-driven processing.
  • FDB:
    • How to deploy an FDB server in order to serve data out of the HPC center.
    • Learn from the operational aspects of FDB deployments at ECMWF.
    • Analysis of FDB performance for the MeteoSwiss data deployment.
  • polytope:
    • Design of the data bridge using polytope to serve FDB data.
  • MARS language: adaptation for the COSMO/ICON data of the MeteoSwiss operational suite.


Conclusions and results of the different activities are summarized in the following sections of the quarterly report.

FDB

FDB is an essential component of the pilot project: it is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data.

It will be used in the pilot project in order to retrieve and access data semantically instead of the traditional (GRIB) file-based approach still employed in many operational environments of NMHSs.

FDB implements a field database optimized for HPC data centers with a (Lustre) distributed file system and adds a Python frontend to facilitate access to meteorological fields from Python.

The following shows an example of how to retrieve a full hypercube of ensemble data for two fields (HHL, the height of model half levels, and DBZ, the radar reflectivity) from COSMO data:

# ref_time (a datetime) and lead_time are assumed to be defined by the caller;
# mars and model_data are modules of the MeteoSwiss data processing framework.
request = mars.Request(
    ("HHL", "DBZ"),
    date=ref_time.strftime("%Y%m%d"),
    time=ref_time.strftime("%H00"),
    expver="0001",
    levelist=tuple(range(1, 82)),  # model levels 1-81
    number=tuple(range(11)),       # ensemble members 0-10
    step=lead_time,
    levtype=mars.LevType.MODEL_LEVEL,
    model=mars.Model.COSMO_1E,
    stream=mars.Stream.ENS_FORECAST,
    type=mars.Type.ENS_MEMBER,
)
ds = model_data.get(request, ref_param_for_grid="HHL")


During this quarter, the project deployed an instance of FDB at CSCS for COSMO/ICON data. The operational NWP COSMO/ICON data is pushed directly into the FDB instance so that applications and post-processing (mostly in Python) can access data semantically, as in the example shown above. This work required adaptations and extensions of the MARS language used to semantically index the GRIB data in order to support the COSMO/ICON operational data schemes. It provided valuable insights into how an NMHS must modify the MARS language to be able to use all the FDB-based data technologies on its own operational data.

Remote FDB

As described above, FDB is a field database based on a distributed (e.g. Lustre) file system. However, in various use cases we would like to access the FDB data from a network that does not have access to the operational filesystem. Therefore, an important component of the FDB framework is FDB remote. This new development of the FDB software stack allows FDB data to be served to clients that do not have access to the Lustre-based deployment of FDB.

The FDB remote is implemented as a set of services, including a catalogue (for querying data) and a store (that serves data requests).

A typical use case that needs the deployment of the FDB remote is the access of ECMWF production data from the EWC environment.

One of the architecture designs of the pilot project is based on accessing the ECMWF production data directly from EWC (without the need to set up and maintain dissemination streams). However, EWC is not in the same network as the ECMWF HPC production system, so applications running on EWC cannot access the main FDB of the HPC center to retrieve data. FDB remote will be set up such that any application in EWC will be able to request data from the server.
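
A minimal sketch of what client-side access could then look like is shown below, assuming the pyfdb Python bindings and that the client-side FDB configuration has been pointed at the remote services (the configuration details here are assumptions, not the final deployment):

import pyfdb

# The client-side FDB configuration (e.g. provided via the FDB5_CONFIG_FILE
# environment variable) is assumed to point at the remote catalogue/store
# services rather than at a local Lustre deployment.
fdb = pyfdb.FDB()

# Same MARS-style keys as for a local FDB retrieval; the values follow the
# COSMO/ICON scheme used elsewhere in this report.
request = {
    "class": "od",
    "stream": "enfo",
    "type": "ememb",
    "levtype": "sfc",
    "date": "20240222",
    "time": "0300",
    "expver": "0001",
    "model": "ICON-CH1-EPS",
    "param": "500011",
    "number": "1",
    "step": "0",
}

reader = fdb.retrieve(request)  # GRIB messages are streamed over the network
data = reader.read()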

A simplified design of this deployment is shown in the following diagram: 

In the pilot project, we have deployed a similar architecture with an FDB instance at MeteoSwiss (on the dedicated MeteoSwiss operational HPC system at CSCS) in order to test this new server technology, together with a polytope deployment.

In this case, we are serving data (COSMO/ICON) generated daily by the MeteoSwiss operational suite. 

This is helping ECMWF consolidate the new developments of FDB remote and debug some open issues. The most recent developments will be released soon (once the new functionality is stable). At the moment, we are using the following branch of FDB.

During the development phase, regular meetings with the FDB developers were scheduled in order to work together, identify stability issues and define actions to fix and improve the server. Out of this work we expect a stable version that can be used for the pilot project.

FDB performance

In order to evaluate the technology, it was important to characterize the performance obtained when retrieving data from FDB, using typical data access patterns of the downstream or post-processing applications of an NMHS.

We installed a complete FDB instance (on the CSCS Lustre filesystem) at MeteoSwiss that holds the daily generated operational data. Based on that instance, we performed a series of benchmarking experiments.

Results depend on the access pattern (contiguity of data in a single request, size of the request, data being cached in the filesystem of the data servers, etc.), but overall we obtained a high and satisfactory throughput.
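
To give an idea of how these measurements are obtained, a retrieval can be timed end to end and the throughput computed from the number of bytes read. The sketch below assumes the pyfdb bindings; the request would be any MARS-style request, e.g. with the keys of the ensemble hypercube shown in the FDB section above:

import time

import pyfdb

def benchmark_retrieve(request: dict) -> float:
    """Retrieve all fields matching the request and return the throughput in MB/s."""
    fdb = pyfdb.FDB()
    start = time.perf_counter()
    data = fdb.retrieve(request).read()  # pull the full result into memory
    elapsed = time.perf_counter() - start
    return len(data) / 1e6 / elapsed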

Results were summarized in the following notebook:



Polytope

Polytope is the high-level service of the FDB technology stack that provides semantic access to the FDB data through RESTful HTTP API requests. It is an important component for giving access to the data outside the HPC network and to other HTTP services.

Typically, this is required in an NMHS operational environment for post-processing frameworks that run as services on a cloud/container platform without access to the operational filesystem/network.

In this quarter we successfully deployed Polytope for the FDB that holds COSMO/ICON data at MeteoSwiss. The design considers a cross-platform deployment between CSCS (which hosts FDB) and AWS in order to leverage AWS managed services.
The design is shown in the figure below. A similar design will be deployed for the ECMWF production, in order to serve IFS production data to the use cases of the pilot project running in EWC. 
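
As a sketch of how a use case running in EWC could query such a deployment, the example below uses the ECMWF polytope-client package; the service address and the collection name ("mch-fdb") are placeholders, not the actual deployment values:

from polytope.api import Client

# Placeholder address of the polytope deployment serving the MeteoSwiss FDB.
client = Client(address="polytope.example.meteoswiss.ch")

# MARS-style request for a single surface field of ICON-CH1-EPS.
request = {
    "class": "od",
    "stream": "enfo",
    "type": "ememb",
    "levtype": "sfc",
    "date": "20240222",
    "time": "0300",
    "expver": "0001",
    "model": "ICON-CH1-EPS",
    "param": "500011",
    "number": "1",
    "step": "0",
}

# Download the matching GRIB data to a local file ("mch-fdb" is a placeholder
# collection name defined by the deployment).
client.retrieve("mch-fdb", request, "output.grib")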



Gribjump Server


The gribjump server (part of the FDB/polytope stack) allows serving data for individual grid points or regions only, instead of entire hypercubes. As part of the FDB ecosystem of servers, polytope will accept requests for grid point data, timeseries, trajectories, etc., which might look like the following example for a timeseries extraction.

gribjump
"request": {
        "class": "od",
        "stream": "enfo",
        "type": "ememb",
        "levtype": "sfc",
        "date": "20240222",
        "time": "0300",
        "expver": "0001",
        "model": "ICON-CH1-EPS",
        "param": "500011",
        "number": "1",
        "feature": {
            "type": "timeseries",
            "points": [
                [
                    47.45268,
                    8.56523
                ]
            ],
            "start": 0,
            "end": 300
        }
    },
    "verb": "retrieve"


This functionality of polytope will be crucial, since typically more than 50% of the requests for model data are small selections around a region, station or trajectory. It allows such data to be extracted efficiently from FDB without having to transfer entire hypercubes to the client.

The development of gribjump is at an early stage; however, we could successfully prototype and test it for the COSMO/ICON data, accessing a single grid point. At the moment it supports only geographical (global and regional) lat-lon coordinates.

Future developments will extend it to other grids, like rotated lat-lon, or unstructured meshes like the ICON grid. We will profile and benchmark this type of data access in the following months.


Data Processing Framework in Python


The last component of technology we need in the pilot project is a data processing framework in Python that can: 

  • load data from FDB/polytope into xarray datasets.
  • provide a large set of meteorological operators, interpolations, reprojections, etc. 
  • write data in grib.
  • push data into FDB (local and remote).

MeteoSwiss is developing a data processing framework in Python that already supports a rich set of functionalities from the categories listed above:

https://github.com/MeteoSwiss-APN/icon_data_processing_incubator/tree/main

The framework uses earthkit-data (https://github.com/ecmwf/earthkit-data) as the basis for memory-only post-processing pipelines. earthkit-data has seen substantial development in the past months and is responsible for loading the GRIB objects into memory in Python (numpy compatible). It allows all the GRIB information to be processed efficiently in memory while retaining the original GRIB representation, so that the output can later be written back as GRIB.
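
For illustration, a minimal round trip with earthkit-data might look as follows (a sketch, assuming a local GRIB file as input):

import earthkit.data

# Read GRIB fields into a field list; the original GRIB messages are kept
# alongside the decoded values.
fields = earthkit.data.from_source("file", "input.grib")

for field in fields:
    values = field.to_numpy()  # decoded values as a numpy array
    print(field.metadata("shortName"), float(values.mean()))

# Because the GRIB representation is retained, the fields can be written
# back out as GRIB.
fields.save("output.grib")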

In the context of the pilot project we are adapting it to make use of the FDB/polytope data technologies. This framework will be used to implement and deploy the use cases proposed in the pilot project, accessing the IFS operational data from EWC.

During this quarter we added functionality to the framework to: 

  • access data from FDB and/or polytope
  • write results of the data processing back into grib and FDB/polytope
  • implement an application to re-project rotated lat-lon data to geographical lat-lon (since that is the only grid supported at the moment by gribjump)

Below we show an example use of the framework that reads data from polytope:


idpi
# dt refers to the datetime module; mars, mch_model_data, data_source, metadata
# and the operators destagger and interpolate_k2any are provided by the
# framework, and get_client returns the FDB/polytope client.
def compute_echo_top(ref_time: dt.datetime, lead_time: int):
    # Request the height of the half levels (HHL) and the radar reflectivity
    # (DBZ) for all model levels and ensemble members of one lead time.
    request = mars.Request(
        ("HHL", "DBZ"),
        date=ref_time.strftime("%Y%m%d"),
        time=ref_time.strftime("%H00"),
        expver="0001",
        levelist=tuple(range(1, 82)),
        number=tuple(range(11)),
        step=lead_time,
        levtype=mars.LevType.MODEL_LEVEL,
        model=mars.Model.ICON_CH1_EPS,
        stream=mars.Stream.ENS_FORECAST,
        type=mars.Type.ENS_MEMBER,
    )
    ds = mch_model_data.get(request, ref_param_for_grid="HHL")

    client = get_client()

    # Calculate ECHOTOPinM: the height of the highest level where the
    # reflectivity reaches 15 dBZ.
    hfl = destagger(ds["HHL"], "z")  # half levels -> full levels
    echo_top = interpolate_k2any(hfl, "high_fold", ds["DBZ"], [15.0], hfl)

    # Update the GRIB metadata so the output is encoded as ECHOTOPinM.
    echo_top.attrs |= metadata.override(echo_top.message, shortName="ECHOTOPinM")

    # Write the result back to FDB using the COSMO GRIB definitions.
    with data_source.cosmo_grib_defs():
        client.to_fdb(echo_top)


These usage examples will undergo some consolidation work and will be released in the coming months.


Future work


The plans for the next quarter are:

  • Further development and consolidation of the technologies and components already mentioned in this quarterly report.
  • Start deploying first examples of the Flexpart showcase in EWC, using some of these components. 
  • Organize a GPU workshop with various key presentations from the community. The venue will be used to discuss and show the progress and efforts to port NWP and climate models and post-processing to GPUs. It will give an overview of the various approaches and technologies used.
  • Continue the discussion around how to integrate GPU computing in the post-processing toolchain. 
  • Collect more use cases from members of the pilot project.
