Title: Quarterly Report for Pilot project on “Adaptation to Emerging Technologies”

Period: Q2 2024 (April 1 to June 30)


Introduction

This report covers the progress made during the second quarter of 2024. This quarter, our goals focused on further developing and consolidating key technologies and components such as FDB, Polytope and the Python-based data processing framework. We also planned to organize a GPU workshop featuring key presentations from the community. Its purpose was to discuss and showcase progress in porting NWP and climate models, as well as post-processing, to GPUs, and to give an overview of the approaches and technologies used. Additionally, we aimed to deploy initial examples of the Flexpart showcase within the European Weather Cloud (EWC). Finally, we intended to collect more use cases from members of the pilot project to further refine and demonstrate our solutions.

Personnel involved

Carlos Osuna, Victoria Cherkas, Christian Kanesan, Stefan Friedli and Katrin Ehlert were provided as in-kind contributions from MeteoSwiss. Additionally, Nina Burgdorfer from MeteoSwiss joined the team in March 2024, funded by project resources.

Highlights

During this quarter, significant progress was made in developing and consolidating the emerging data technologies described in the Q1 report, such as FDB, Polytope and the Python data processing framework. The primary objective was to advance these technologies towards a production-ready solution. A key milestone was our collaboration with ECMWF, which resulted in the deployment of a beta version of an FDB/Polytope-based solution. This solution enables users to access real-time ICON production data and retrieve it in Python as Xarray datasets.

Emerging Technologies Demo, MeteoSwiss, 25.04.2024

The functionalities of this solution were showcased in a live online demo attended by a large number of interested users. The Jupyter notebooks from the demonstration have been made available so that users can test the functionalities of the beta version. The underlying services are continuously maintained to keep them functional and available. The solution encompasses several key functionalities:

  1. Multidimensional Hypercubes: Request model data for entire hypercubes (parameters, ensembles, vertical levels, and specific time steps).

  2. Single Location or Timeseries Requests: Retrieve data for a single location or as a timeseries in CoverageJSON format.

  3. Python Framework for NWP Data Processing: Query and process NWP data in Xarray with meteodata-lab, a Python framework providing operators for interpolation across vertical coordinate systems (pressure, model levels, etc.), horizontal regridding, and computation of essential meteorological variables.

  4. Real-time and Historical Data: Access real-time model data as well as historical datasets.

  5. Pipeline Deployment: Deploy pipelines that process NWP data and store results back into the FDB.
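A hypercube request can be pictured as a list of values along each axis (parameter, ensemble member, level, time step). The Cartesian product of those lists defines the individual fields the request addresses. The plain-Python sketch below expands such a request. The MARS-style keys (`param`, `number`, `step`), the helper name `expand`, and the values (11 members, hourly steps 0 to 24) are illustrative assumptions. They are not the actual FDB schema used for ICON-CH1-EPS.

```python
from itertools import product

# Hypothetical hypercube request: MARS-style key names, illustrative values only.
request = {
    "param": ["tp"],              # total precipitation
    "number": list(range(11)),    # 11 ensemble members
    "step": list(range(0, 25)),   # hourly lead times 0..24 h
}


def expand(request: dict) -> list[dict]:
    """Expand a hypercube request into one key dictionary per individual field."""
    names = list(request)
    return [dict(zip(names, values)) for values in product(*request.values())]


fields = expand(request)
print(len(fields))  # 275
print(fields[0])    # {'param': 'tp', 'number': 0, 'step': 0}
```

The request itself stays compact, at three short lists, even though it addresses 1 × 11 × 25 = 275 fields. This is why describing data as hypercubes scales well as more axes are added.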

Main outcomes of the demonstration:

The solution delivers two main benefits: easy access to data and improved performance.

Easy Access to Data

Example:

  1. Data request: users can easily define a data request semantically using simple and intuitive Python code:



    This defines a request to retrieve total precipitation data for 11 ensemble members of the ICON-CH1-EPS model, for the run on June 26, 2024, at 00:00, in 60-minute intervals. The mars module from the meteodata-lab Python framework helps build valid MARS requests used as a base for the FDB index.
  2. Data retrieval: the meteodatalab.mch_model_data module provides convenient functions to retrieve model data, using earthkit-data in the background to read the retrieved data from FDB seamlessly.


  3. Quick snapshot of your dataset: The data is returned in Python as an Xarray dataset, facilitating further analysis and visualization.

     

  4. Data processing: meteodata-lab provides preprocessing and aggregation operators, as illustrated below with a delta operator that re-aggregates precipitation accumulated since the reference time into 6-hour intervals.

     

  5. Effortless Data Visualization with Matplotlib: The processed data can be easily and quickly plotted using the widely used Matplotlib utilities, as presented below.
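The delta operator from step 4 can be illustrated with a small stand-alone sketch in plain xarray. It is not meteodata-lab's implementation. The helper name `delta`, the integer `step` coordinate in hours, and the synthetic constant rain rates are assumptions made for the example. The idea is to take the accumulation at the end of each 6-hour window and subtract the accumulation at the start of that window.

```python
import numpy as np
import xarray as xr


def delta(acc: xr.DataArray, interval: int) -> xr.DataArray:
    """Convert an accumulation since reference time into amounts per `interval` hours.

    Hypothetical helper: assumes an integer "step" coordinate in hours that
    starts at 0 and contains every multiple of `interval`.
    """
    steps = acc["step"].values
    keep = steps[steps % interval == 0]
    # Difference of consecutive accumulations; each result is labelled
    # with the end of its interval (label="upper").
    return acc.sel(step=keep).diff("step", label="upper")


# Synthetic data: 11 members, hourly steps 0..24, constant rain rate per member.
steps = np.arange(0, 25)
rates = np.arange(1, 12, dtype=float)  # mm/h; member 0 -> 1 mm/h, member 10 -> 11 mm/h
acc = xr.DataArray(
    rates[:, None] * steps[None, :],   # accumulated precipitation in mm
    dims=("number", "step"),
    coords={"number": np.arange(11), "step": steps},
    name="tp",
)

tp_6h = delta(acc, 6)
print(tp_6h["step"].values)          # [ 6 12 18 24]
print(tp_6h.sel(number=0).values)    # [6. 6. 6. 6.]
```

Selecting only the window boundaries before differencing keeps the operation exact for accumulated fields. Summing hourly differences would give the same result with more work.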

Performance Improvements

Polytope's feature extraction functionality significantly improves data access efficiency. Below is an example demonstrating the reduced data transfer and storage on the user side:

Example: A user wants to calculate the 50th percentile (median) of all ensemble forecasts of the 10 m wind speed field for March at an airport. The raw model files amount to 90 TB. Requesting only the necessary data from FDB reduces this to 60 GB, because FDB indexes the data and retrieves only the wind speed field. After the median forecast is computed, the data size shrinks further to 6 GB. This 6 GB file is stored in FDB for potential reuse by others.

Finally, a user interested in just one specific point can employ feature extraction in Polytope, retrieving only the data for the airport, resulting in a data transfer of approximately 6 kilobytes.
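The size reductions quoted above can be checked with a few lines of Python. The sketch follows this with a toy, in-memory version of the same chain: an ensemble median followed by a nearest-grid-point extraction. In the actual solution, Polytope performs the extraction on the server side, so only the point data is transferred. Here everything is held in memory, so the sketch shows only how the data footprint shrinks, not the transfer itself. The grid, the number of time steps, the variable name, and the airport location are all hypothetical.

```python
import numpy as np
import xarray as xr

# Sizes quoted in the example, in bytes (decimal units).
stages = [
    ("raw model files", 90e12),
    ("FDB request", 60e9),
    ("ensemble median", 6e9),
    ("single point", 6e3),
]
factors = [prev / size for (_, prev), (_, size) in zip(stages, stages[1:])]
for (prev_name, _), (name, _), factor in zip(stages, stages[1:], factors):
    print(f"{prev_name} -> {name}: {factor:,.0f}x smaller")
# raw model files -> FDB request: 1,500x smaller
# FDB request -> ensemble median: 10x smaller
# ensemble median -> single point: 1,000,000x smaller

# Synthetic stand-in for the FDB request: 11 members of a 10 m wind speed
# field on a small hypothetical lat/lon grid.
rng = np.random.default_rng(0)
ens = xr.DataArray(
    rng.gamma(2.0, 2.0, size=(11, 8, 50, 60)).astype("float32"),
    dims=("number", "time", "lat", "lon"),
    coords={
        "number": np.arange(11),
        "time": np.arange(8),
        "lat": np.linspace(45.5, 48.0, 50),
        "lon": np.linspace(5.5, 10.5, 60),
    },
    name="wind_speed_10m",
)

# The median over the ensemble dimension collapses the 11 members into one field.
median = ens.median(dim="number")
# Nearest grid point to a hypothetical airport location.
point = median.sel(lat=47.0, lon=8.0, method="nearest")

print(ens.nbytes, median.nbytes, point.nbytes)
```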

The development and implementation of these functionalities were made possible through the contributions of Emanuele Danovaro, Christopher Bradley, Mathilde Leuridan, James Hawkes, and Tiago Quintino from ECMWF.

A recording of the demo event is available for download here (Password: ICONworkflow2.0), and the Jupyter notebooks to test the solution can be found here.

Enhancing GRIB2 Data Handling in Xarray

 

Current capabilities for handling GRIB2 data in Xarray with ECMWF CFgrib:

CFgrib integrates GRIB data into the Python ecosystem through Xarray. This enables the use of powerful tools from the Python numerical stack, such as NumPy, Matplotlib, Jupyter, Dask, SciPy and pandas, which together allow efficient handling, analysis and visualization of GRIB data.

Identified Gap

Despite the robust support for reading GRIB data, a significant limitation exists:

  • There is no native support for writing Xarray objects to GRIB2 files.

Our Contribution with meteodata-lab

To address this limitation, MeteoSwiss developed the following enhancements in meteodata-lab:

Retention of the GRIB message in the Xarray object's attributes:

We preserve the original GRIB message template as attributes within the xarray.DataArray (e.g. the precipitation field shown below). This ensures that the structural integrity of the GRIB data is maintained throughout the workflow.

Example:

Xarray object of the precipitation field with the GRIB message included in its attributes:
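The general pattern of carrying GRIB metadata inside the Xarray attributes can be sketched in plain xarray. The attribute name `grib_template` and the small dict of ecCodes key names are hypothetical stand-ins. meteodata-lab keeps the actual GRIB message, and its exact representation is not reproduced here.

```python
import numpy as np
import xarray as xr

# Hypothetical template: a few ecCodes key names standing in for the full
# GRIB message that is kept with each field.
template = {
    "shortName": "tp",
    "Ni": 4,  # points along a parallel (x)
    "Nj": 3,  # points along a meridian (y)
    "longitudeOfFirstGridPointInDegrees": 5.0,
}

precip = xr.DataArray(
    np.zeros((3, 4)),
    dims=("y", "x"),
    name="tp",
    attrs={"grib_template": template},
)

# Indexing keeps the attributes, so the template travels with the subset...
subset = precip.isel(y=slice(0, 2))
print(subset.attrs["grib_template"]["shortName"])  # tp
# ...but it still describes the original grid: Nj is 3 while the subset has 2 rows.
print(subset.attrs["grib_template"]["Nj"], subset.sizes["y"])  # 3 2
```

The last line shows why storing the template is not enough on its own. An ordinary operation can leave the stored metadata out of step with the data, so the metadata has to be updated explicitly by the operators that change the field.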

Ensuring Metadata Integrity:

We have implemented a collection of operators that keep the metadata accurate, so that any modification of the data is reflected in the metadata.

Example:

The destagger operator updates the field's GRIB message after the field has been destaggered:

These contributions enable users to write xarray objects to GRIB2 files, a functionality that was previously unavailable. This enhancement is crucial for workflows that require the creation and manipulation of GRIB2 data.
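The following simplified destagger along x, written in plain xarray, illustrates how an operator can keep data and metadata consistent. It is not meteodata-lab's implementation. The edge-point layout (one more point along x than the mass grid), the attribute name `grib_template`, and the dict of ecCodes key names (`Ni`, `longitudeOfFirstGridPointInDegrees`) are assumptions for the example. meteodata-lab works on the actual GRIB message instead.

```python
import copy

import numpy as np
import xarray as xr


def destagger_x(field: xr.DataArray) -> xr.DataArray:
    """Average a field staggered along x onto cell centres and update its template.

    Hypothetical sketch: assumes the staggered values sit on cell edges (one
    more point along "x" than the mass grid) and that attrs["grib_template"]
    holds ecCodes-style keys describing the grid.
    """
    left = field.isel(x=slice(None, -1))
    right = field.isel(x=slice(1, None))
    # Mean of the two neighbouring edge values, placed at the cell centre.
    out = left.copy(data=0.5 * (left.values + right.values))
    out = out.assign_coords(x=0.5 * (left["x"].values + right["x"].values))
    # Update a copy of the template so that it describes the new grid.
    template = copy.deepcopy(field.attrs["grib_template"])
    template["Ni"] = out.sizes["x"]
    template["longitudeOfFirstGridPointInDegrees"] = float(out["x"][0])
    return out.assign_attrs(grib_template=template)


x_edges = np.array([5.0, 5.5, 6.0, 6.5, 7.0])
u = xr.DataArray(
    np.tile(x_edges, (3, 1)),  # values equal to x, for easy checking
    dims=("y", "x"),
    coords={"y": [46.0, 46.5, 47.0], "x": x_edges},
    name="U",
    attrs={"grib_template": {"shortName": "u", "Ni": 5, "Nj": 3,
                             "longitudeOfFirstGridPointInDegrees": 5.0}},
)
u_mass = destagger_x(u)
print(u_mass.attrs["grib_template"]["Ni"])  # 4
print(u_mass.isel(y=0).values)              # [5.25 5.75 6.25 6.75]
print(u.attrs["grib_template"]["Ni"])       # 5 (input template left unchanged)
```

Updating a deep copy of the template, rather than the input's attributes, keeps the operator free of side effects. The original staggered field can therefore still be written out with its own, still-valid metadata.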

Future Integration

We are currently discussing the integration of meteodata-lab's capabilities into ECMWF's earthkit-data.

ECMWF Webinar “Using GPUs to Accelerate National Weather Forecasts - Challenges and State-of-the-Art Practices”, 27.06.2024

As one of our goals for this quarter, we organized an online webinar featuring key presentations from the community to discuss and showcase progress in porting NWP and climate models to GPUs. The event was attended by representatives of 19 weather services. The webinar featured the following presentations:

  1. ICON's Operational GPU Integration by Xavier Lapillonne, MeteoSwiss: We explored the rewriting of NWP models for GPU efficiency, illustrated by the ICON model recently made operational at MeteoSwiss. We also learned about the integration of ICON into the new ALPS high-performance computing platform at CSCS and the associated operationalization and maintenance challenges.

  2. Performance and Adaptability with Domain-Specific Languages (DSLs) by Christoph Müller, MeteoSwiss: Highlighted the shift to Python-based user code using DSLs such as GT4Py, emphasizing performance portability and adaptability.

  3. Starting anew with the Development of Momentum® Weather and Climate Model by Iva Kavcic, Met Office: We explored the Met Office's Next Generation Modelling System Programme, which paves the way for a new dynamical core and software infrastructure for weather and climate modelling, with implementation planned from the mid-2020s onwards.

  4. NWP Models on AMD GPUs by Bentorey Hernandez Cruz, ECMWF: Details of the ongoing development of NWP models on AMD GPUs within the DestinE project on the LUMI supercomputer, addressing the challenges of and progress in using AMD's architecture for weather and climate modelling.

A recording of the webinar is available for download here (Password: GPU_webinar_pilot24).

Future Work

  • Achieve deployment of initial examples of the Flexpart showcase in EWC

  • Discuss possibilities to collaborate with the GLORI project

  • Organise an in-person workshop between MeteoSwiss and ECMWF at the ECMWF center in Bonn to collaborate on solutions, debug issues, and exchange knowledge

  • Integrate a Fortran interface into FDB to enable direct writing from ICON to FDB

  • Consolidate ecCodes (MARS/GRIB) definitions for ICON model output data

  • Continue the discussion around how to integrate GPU computing in the post-processing toolchain
