During this quarter we focused on preparing the software components and technology readiness needed to start deploying the show cases described in the Statement of Work.

In this document we describe the progress and contributions made to the different activities in preparation for the deployment of the show cases in EWC.

...

In this quarter we successfully deployed Polytope for the FDB that holds COSMO/ICON data at MeteoSwiss. The design considers a cross-platform deployment between CSCS (which hosts the FDB) and AWS, in order to leverage AWS managed services.
The design is shown in the figure below. A similar design will be deployed for the ECMWF production, in order to serve IFS production data to the use cases of the pilot project running in EWC.



Gribjump Server


The gribjump server (part of FDB/polytope) makes it possible to serve data for individual grid points or regions instead of entire hypercubes. As part of the FDB ecosystem of servers, polytope accepts requests for grid point data, time series, trajectories, etc., which might look like the following example for a time series extraction.

Code Block: gribjump
{
    "request": {
        "class": "od",
        "stream": "enfo",
        "type": "ememb",
        "levtype": "sfc",
        "date": "20240222",
        "time": "0300",
        "expver": "0001",
        "model": "ICON-CH1-EPS",
        "param": "500011",
        "number": "1",
        "feature": {
            "type": "timeseries",
            "points": [
                [
                    47.45268,
                    8.56523
                ]
            ],
            "start": 0,
            "end": 300
        }
    },
    "verb": "retrieve"
}


This functionality of polytope will be crucial, since typically more than 50% of the requests for model data are a small selection around a region, station or trajectory. It allows this data to be extracted efficiently from FDB without having to transfer entire hypercubes to the client.
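
To illustrate how such a feature request could be submitted from Python, the sketch below uses the polytope-client library; the server address, collection name and output file name are placeholders, not the values of the actual deployment.

Code Block: polytope-client (minimal sketch; address, collection and output file are placeholders)
from polytope.api import Client

# the timeseries request from the example above, as a Python dict
request = {
    "class": "od",
    "stream": "enfo",
    "type": "ememb",
    "levtype": "sfc",
    "date": "20240222",
    "time": "0300",
    "expver": "0001",
    "model": "ICON-CH1-EPS",
    "param": "500011",
    "number": "1",
    "feature": {
        "type": "timeseries",
        "points": [[47.45268, 8.56523]],
        "start": 0,
        "end": 300,
    },
}

# placeholder address and collection name
client = Client(address="polytope.example.int")
client.retrieve("mch", request, "timeseries.covjson")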

The development of gribjump is at an early stage; however, we could successfully prototype and test it on the COSMO/ICON data, accessing a single grid point. At the moment it supports only geographical (global and regional) lat-lon coordinates.

Future developments will extend it to other grids, such as rotated lat-lon, or unstructured meshes like the ICON grid. We will profile and benchmark this type of data access in the following months.


Data Processing Framework in Python


The last technology component we need in the pilot project is a data processing framework in Python that can:

  • load data from FDB/polytope into xarray datasets.
  • provide a large set of meteorological operators, interpolations, reprojections, etc. 
  • write data in grib.
  • push data into FDB (local and remote).

MeteoSwiss is developing a data processing framework in Python that already supports a rich set of functionalities from the categories listed above:

https://github.com/MeteoSwiss-APN/icon_data_processing_incubator/tree/main

The framework uses earthkit-data (https://github.com/ecmwf/earthkit-data) as the basis for memory-only post-processing pipelines. Earthkit has seen substantial development in the past months and is responsible for loading the grib objects into memory in Python (numpy compatible). It allows all the grib information to be processed efficiently in memory while retaining the original grib representation, therefore allowing the output to be written back as grib later.
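
As an illustration of this pattern, the sketch below (not part of the framework itself) loads grib data with earthkit-data, inspects the fields as numpy arrays, converts them to an xarray dataset and writes them back as grib; the file names are placeholders.

Code Block: earthkit-data (minimal sketch; file names are placeholders)
import earthkit.data

# load the grib messages into memory
ds = earthkit.data.from_source("file", "icon.grib")

for field in ds:
    # each field keeps its grib metadata and exposes its values as numpy
    print(field.metadata("shortName"), field.metadata("level"))
    values = field.to_numpy()

# the same fieldlist can be viewed as an xarray dataset
xds = ds.to_xarray()

# write back, retaining the original grib representation
ds.save("copy.grib")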

In the context of the pilot project we are adapting it to make use of the FDB/polytope data technologies. This framework will be used to implement and deploy the use cases proposed in the pilot project, accessing the IFS operational data from EWC.

During this quarter we added functionality to the framework to: 

  • access data from FDB and/or polytope,
  • write results of the data processing back into grib and FDB/polytope,
  • implement an application to re-project rotated lat-lon onto geographical lat-lon (since this is the only grid currently supported by gribjump); a sketch of the underlying coordinate transformation follows this list.
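
The re-projection maps a point given in rotated coordinates back to geographical coordinates using the position of the rotated north pole. Below is a minimal, self-contained sketch of this point transformation; the framework's actual implementation (which also interpolates the fields onto the target grid) may differ, and the default pole position used here is the one of the COSMO/ICON-CH rotated grids.

Code Block: rotated lat-lon to geographical lat-lon (minimal sketch)
import numpy as np

def rotated_to_geographic(lat_r, lon_r, pole_lat=43.0, pole_lon=-170.0):
    """Convert rotated lat-lon (degrees) to geographical lat-lon (degrees).

    pole_lat/pole_lon are the geographical coordinates of the rotated
    north pole (defaults: the COSMO/ICON-CH rotated pole).
    """
    lat_r, lon_r = np.radians(lat_r), np.radians(lon_r)
    p_lat, p_lon = np.radians(pole_lat), np.radians(pole_lon)

    lat = np.arcsin(
        np.sin(lat_r) * np.sin(p_lat)
        + np.cos(lat_r) * np.cos(lon_r) * np.cos(p_lat)
    )
    lon = p_lon + np.pi + np.arctan2(
        np.cos(lat_r) * np.sin(lon_r),
        np.sin(p_lat) * np.cos(lat_r) * np.cos(lon_r)
        - np.sin(lat_r) * np.cos(p_lat),
    )
    # normalize longitude to [-180, 180)
    lon = (lon + np.pi) % (2 * np.pi) - np.pi
    return np.degrees(lat), np.degrees(lon)

# the rotated origin maps to roughly (47N, 10E) for the COSMO pole
print(rotated_to_geographic(0.0, 0.0))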

Below we show an example of the framework in use, reading data from polytope:


Code Block: idpi
# datetime is the only external import; the remaining names (mars,
# mch_model_data, data_source, metadata, destagger, interpolate_k2any,
# get_client) are provided by the framework.
import datetime as dt


def compute_echo_top(ref_time: dt.datetime, lead_time: int):
    # request the HHL (height of half levels) and DBZ (radar reflectivity)
    # fields on all 81 model levels for the 11 ensemble members
    request = mars.Request(
        ("HHL", "DBZ"),
        date=ref_time.strftime("%Y%m%d"),
        time=ref_time.strftime("%H00"),
        expver="0001",
        levelist=tuple(range(1, 82)),
        number=tuple(range(11)),
        step=lead_time,
        levtype=mars.LevType.MODEL_LEVEL,
        model=mars.Model.ICON_CH1_EPS,
        stream=mars.Stream.ENS_FORECAST,
        type=mars.Type.ENS_MEMBER,
    )
    ds = mch_model_data.get(request, ref_param_for_grid="HHL")

    client = get_client()

    # Calculate ECHOTOPinM: destagger the half-level heights to full
    # levels and find the height of the highest 15 dBZ reflectivity crossing
    hfl = destagger(ds["HHL"], "z")
    echo_top = interpolate_k2any(hfl, "high_fold", ds["DBZ"], [15.0], hfl)

    # override the grib metadata so the output is encoded as ECHOTOPinM
    echo_top.attrs |= metadata.override(echo_top.message, shortName="ECHOTOPinM")

    # write the result back into FDB using the COSMO grib definitions
    with data_source.cosmo_grib_defs():
        client.to_fdb(echo_top)
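
For instance, the function above could be invoked as follows (the reference time and lead time are arbitrary example values):

Code Block: calling compute_echo_top (example values)
import datetime as dt

compute_echo_top(ref_time=dt.datetime(2024, 2, 22, 3, 0), lead_time=2)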


These usage examples will see some consolidation work and will be released in the coming months.


Future work


The plans for the next quarter are:

  • Further development and consolidation of the technologies and components described in this report.
  • Start deploying the first examples of the Flexpart showcase in EWC, using some of these components.
  • Organize a GPU workshop with key presentations from the community. The venue will be used to discuss and show the progress of efforts to port NWP and climate models and post-processing to GPUs, giving an overview of the various approaches and technologies used.
  • Continue the discussion around how to integrate GPU computing in the post-processing toolchain. 
  • Collect more use cases from members of the pilot project.