Title: Progress report for Pilot project on "Adaptation to Emerging Technologies"
Period: Q3 2024 – ECMWF PAC Meeting (30 April 2025)
Summary
This short report presents the progress and evaluation of the pilot project to date. It highlights key achievements, assesses the project's impact on collaboration, and offers reflections on the setup and lessons learned.
1. Project Objectives
The Adaptation to Emerging Technologies pilot project was launched to explore how ECMWF and its Member States can adapt to new technologies for accessing and processing data, developed by ECMWF within the Destination Earth project, and integrate these advancements into Numerical Weather Prediction (NWP) workflows. The project set out to demonstrate how these technologies could be leveraged to enhance operational data processing, ensuring seamless integration with ECMWF’s computing resources, including the European Weather Cloud (EWC) and its data infrastructure.
At the same time, the pilot aimed to assess the impact of these changes on Member States’ workflows and gather insights that could help refine ECMWF’s own systems to better support future developments and meet Member States’ evolving needs.
Key Objectives include:
- Developing and deploying open-source blueprints for efficient data access and processing, leveraging ECMWF technologies like Aviso for workflow automation, FDB for semantic access to data, Earthkit for managing GRIB data in Python, and Polytope for efficient data retrieval.
- Facilitating ECMWF product integration with the European Weather Cloud (EWC), enabling Member States to run workflows and pre-process IFS data before dissemination.
- Establishing a framework for collaboration on new data processing technologies, such as GPU acceleration and domain-specific languages.
- Exploring funding opportunities and long-term sustainability, following an unsuccessful attempt to obtain Horizon Europe funding.
Two primary workstreams were defined:
- Flexpart Dispersion Model – implementing an end-to-end workflow for running atmospheric dispersion simulations on the European Weather Cloud.
- IFS Data Pre-processing – optimizing workflows for extracting, transforming, and aggregating forecast data for various applications.
The project set out to provide open, adaptable technological solutions while fostering knowledge exchange through workshops and regular collaboration meetings.
2. Project Duration & Funding
Personnel involved
3. Achievements and Outcomes
This pilot project demonstrates how emerging technologies can modernize data workflows for NWP, while requiring adaptations from Member States to fully leverage ECMWF’s evolving data access tools. By implementing concrete use cases, we validated new approaches and provided valuable feedback that improved ECMWF’s own systems. The project focused on four key areas:
Focus Area | Key Achievement |
Flexpart Workflow on EWC | Deployed an end-to-end workflow integrating Aviso and Earthkit-Data, identifying infrastructure limitations such as the need for Infrastructure as Code (IaC) and Kubernetes support. |
Modernizing Preprocessing | Transitioned legacy Fortran-based preprocessing to Earthkit-Data, improving maintainability and integration with modern workflows. |
Collaborative Development | Worked closely with ECMWF to test and refine Polytope and FDB, ensuring their robustness and usability. |
Knowledge Sharing | Organized a webinar on GPU acceleration and domain-specific languages, fostering discussions on emerging technologies in weather forecasting. |
4. Key Developments and Results
Flexpart Workflow Deployment on European Weather Cloud (EWC)
This use case created a blueprint that integrates the European Weather Cloud (EWC) with key ECMWF technologies from the Destination Earth project. This blueprint serves as an example of how Member States can effectively leverage EWC resources while streamlining workflows through automation and real-time data processing.
The workflow follows an automated sequence: IFS forecasts are pushed to the European Weather Cloud, where Aviso, continuously polling the storage bucket, triggers a preprocessing job containerized with Earthkit-Data. This step is followed by running the Flexpart model in a container, and finally, a plotting application generates visual outputs. The architecture of this workflow is illustrated in the diagram below.
Workflow architecture:
Operational since September 2023, the EWC is still evolving. This use case evaluated the EWC’s current capabilities, identified the components still missing for automated workflows, and provided feedback to the EWC team. A key priority is that the workflow, which runs regularly for short periods, takes full advantage of cloud deployment by creating resources on demand, following the pay-as-you-go principle. Another is a shared repository in which users can store, access, and run container images or applications, fostering collaboration and efficiency across the platform.
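The automated sequence described above (bucket polling, containerized preprocessing, Flexpart run, plotting) follows a simple poll-and-trigger pattern. The sketch below illustrates that pattern in plain Python. It is not Aviso’s API: the bucket listing, the object keys, and the stage functions `preprocess`, `run_flexpart` and `plot` are hypothetical stand-ins for the containerized steps that run on the EWC.

```python
"""Minimal poll-and-trigger sketch (hypothetical; not the Aviso API).

Assumptions: a bucket listing is a plain list of object keys, and each
workflow stage is a Python callable. On the EWC these stages are containers.
"""
from typing import Callable, Iterable, List


def new_objects(listing: Iterable[str], seen: set) -> List[str]:
    """Return keys not seen in earlier polls (sorted) and mark them as seen."""
    fresh = sorted(key for key in listing if key not in seen)
    seen.update(fresh)
    return fresh


def run_pipeline(key: str, stages: List[Callable[[str], str]]) -> str:
    """Feed the key through each stage; each stage returns the next artefact."""
    artefact = key
    for stage in stages:
        artefact = stage(artefact)
    return artefact


# Hypothetical stand-ins for the containerized steps.
def preprocess(key: str) -> str:
    return f"{key}.prepared"


def run_flexpart(path: str) -> str:
    return f"{path}.flexpart"


def plot(path: str) -> str:
    return f"{path}.png"


STAGES = [preprocess, run_flexpart, plot]

if __name__ == "__main__":
    seen: set = set()
    # Two successive polls of the bucket; the second contains one new forecast.
    polls = [["ifs_2024090100.grib"],
             ["ifs_2024090100.grib", "ifs_2024090112.grib"]]
    for listing in polls:
        for key in new_objects(listing, seen):
            print(run_pipeline(key, STAGES))
```

Keeping "seen" state outside the polling function makes each poll idempotent: a forecast already processed is never triggered twice, even though it stays in the bucket.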
To refine and validate our approach, we collaborated with key partners actively working with Flexpart on the EWC and Atos. The Royal Meteorological Institute (RMI) of Belgium, which operates a web application for Flexpart simulations on the EWC, faced performance issues when retrieving IFS forecast data via the MARS client. They showed strong interest in using Polytope as a faster alternative. However, this was initially limited by the absence of essential input fields on ECMWF’s Polytope service, which at the time only hosted a subset of parameters on model levels (ml). Following several discussions, ECMWF agreed to make a limited set of additional variables available on Polytope by mid-to-late May 2025. While not yet officially confirmed, ECMWF is also exploring a longer-term solution through a potential Data Bridge, hosted on SwissTwin at the Swiss National Supercomputing Centre (CSCS), which could eventually provide broader access to the data for all Member States.
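For illustration, a request for model-level IFS fields of the kind RMI needs can be expressed as a MARS-style dictionary, the request language Polytope accepts for ECMWF data. The sketch below is a minimal example under stated assumptions: the helper `model_level_request` and the parameter names, levels and dates are hypothetical. They do not reflect the set of fields ECMWF has agreed to publish.

```python
"""Build a MARS-style request for model-level IFS fields (illustrative only).

Assumption: the parameters, levels and dates used below are examples,
not the fields ECMWF has agreed to expose on Polytope.
"""
from typing import Dict, List


def model_level_request(date: str, time: str, steps: List[int],
                        params: List[str], levels: range) -> Dict[str, str]:
    """Return a request dict using MARS keywords; lists are '/'-separated."""
    return {
        "class": "od",
        "stream": "oper",
        "type": "fc",
        "expver": "1",
        "levtype": "ml",  # model levels, as required by Flexpart
        "date": date,
        "time": time,
        "step": "/".join(str(s) for s in steps),
        "param": "/".join(params),
        "levelist": "/".join(str(lev) for lev in levels),
    }


if __name__ == "__main__":
    request = model_level_request("20250501", "00", [0, 3, 6],
                                  ["t", "u", "v", "q"], range(1, 138))
    print(request["levtype"], request["step"], request["param"])
```

If earthkit-data is installed, a request like this could be passed to its Polytope source, for example `earthkit.data.from_source("polytope", "ecmwf-mars", request)`. The collection name and credentials depend on the service configuration.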
We also engaged with partners at the University of Vienna, the main developers of Flexpart, to exchange insights and share experiences working on the European Weather Cloud. They reported similar challenges—especially around workflow automation, data retrieval, and scalability—and expressed interest in running Flexpart on Kubernetes within the EWC. These discussions provided valuable context and reinforced the need for more flexible, scalable infrastructure across the platform.
The pilot not only enabled us to identify key limitations but also helped trigger improvements through ongoing dialogue with ECMWF and the EWC team:
Key Outcomes:
- Data Access via Polytope: Several required IFS fields were initially unavailable on Polytope. After repeated exchanges, ECMWF agreed to make a limited set of missing fields available on ECMWF Polytope by the end of May 2025. A broader solution is under consideration through the development of a Data Bridge on SwissTwin, though this remains tentative. Estimated timeline: End of Q2 2025.
- Infrastructure as Code (IaC): Initially, deployment within the EWC required manual setup, which was time-consuming and prone to inconsistencies. Throughout the pilot, we engaged in extensive discussions with the EWC support team to highlight the need for an automated and reproducible deployment process. Following these discussions, Terraform-based IaC support has been made available in the EWC since mid-March 2025. While it is not yet integrated into our pipeline, its availability reflects how the pilot helped identify and prioritize this requirement.
- Container & IaC Repository: Currently, there is no shared repository within the EWC for container images and IaC configurations. We are in discussions with the EWC team about this limitation, as a dedicated registry would significantly enhance collaboration and consistency. In the meantime, DockerHub remains a viable alternative for managing and sharing these resources.
- Cloud Resource Management: The workflow currently runs on a continuously active virtual machine instead of dynamically provisioning resources (e.g., via Kubernetes). This is inefficient because the machine keeps consuming resources between runs, when no forecast is being processed. The EWC is expected to support short-lived job execution by June 2025, and we are in contact with the EWC team to follow up on progress and to be ready to test the functionality as soon as it becomes available.
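The cost of an always-on virtual machine can be made concrete with simple arithmetic. The sketch below compares billed instance-hours per day for the current setup and for on-demand jobs. The schedule (four runs a day, 30 minutes each) and the optional per-run start-up overhead are hypothetical figures, not measurements from the pilot.

```python
"""Compare daily instance-hours: always-on VM vs. on-demand jobs.

All figures are hypothetical; they only illustrate the pay-as-you-go argument.
"""


def billed_hours_per_day(runs_per_day: int, hours_per_run: float,
                         always_on: bool, overhead_per_run: float = 0.0) -> float:
    """Instance-hours billed per day.

    An always-on VM is billed for the full day regardless of load. On-demand
    jobs are billed only while running, plus an assumed start-up overhead.
    """
    if always_on:
        return 24.0
    return runs_per_day * (hours_per_run + overhead_per_run)


if __name__ == "__main__":
    # Hypothetical schedule: 4 forecast cycles a day, 30 minutes each.
    vm = billed_hours_per_day(4, 0.5, always_on=True)
    jobs = billed_hours_per_day(4, 0.5, always_on=False)
    print(f"always-on: {vm} h, on-demand: {jobs} h, ratio: {vm / jobs:.0f}x")
```

Under these assumptions the on-demand setup uses 2 instance-hours a day instead of 24. A realistic start-up overhead narrows the gap but does not close it for short, infrequent runs.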
Remaining Actions Within the Project Timeline Through End of 2025
▢ Data Retrieval via Polytope: Once the required input data becomes available on Polytope, integrate it into the workflow for seamless data retrieval.
▢ Transition to IaC-Based Deployment: Replace the static virtual machine with a fully automated, on-demand infrastructure setup.
▢ Establish a Shared Repository: Advocate for a centralized container and IaC registry within EWC to streamline deployment and collaboration.
Modernizing Flexpart Preprocessing with Earthkit-Data
Following our discussions with colleagues at the University of Vienna, the main developers of Flexpart, we explored modernizing parts of the existing flex_extract software, including its Fortran-based preprocessing component, with a Python-based solution that uses Earthkit-Data to handle GRIB data.
The preprocessing step is essential for preparing ECMWF meteorological fields as input for Flexpart. It involves various operations on the input data, such as deaggregating precipitation values and other transformations required for accurate atmospheric transport modeling. With Earthkit-Data, these fields can be read efficiently in GRIB format and converted into xarray datasets, enabling multi-dimensional operations that make the code more readable, flexible, and efficient.
The current repository is private; however, access can be granted upon request for those interested.
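One building block of the preprocessing described above is converting precipitation accumulated since the forecast start into per-interval amounts. The sketch below shows only this de-accumulation step, in plain Python so that it runs without dependencies. flex_extract’s actual precipitation disaggregation is more involved and is not reproduced here, and the function name `deaccumulate` is hypothetical.

```python
"""De-accumulate precipitation totals into per-interval amounts (sketch).

Assumption: accumulated[i] is the total since forecast start at output
step i, so the first interval starts from an accumulation of zero.
"""
from typing import List, Sequence


def deaccumulate(accumulated: Sequence[float]) -> List[float]:
    """Return the amount that fell in each interval between output steps."""
    previous = 0.0
    per_interval: List[float] = []
    for total in accumulated:
        per_interval.append(total - previous)
        previous = total
    return per_interval


if __name__ == "__main__":
    print(deaccumulate([0.0, 1.5, 4.0, 4.0]))
```

On an xarray `DataArray` with a `step` dimension, the same differencing can be written as `tp.diff("step")`. That call drops the first step, whose per-interval value equals its accumulated value.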
Key Outcomes
- Improved Readability & Maintainability: Transitioning from Fortran to Python enhances code clarity and flexibility.
- Simplified GRIB Handling: Earthkit-Data simplifies reading and processing ECMWF GRIB data.
Remaining Actions Within the Project Timeline Through End of 2025
▢ Collaborate with the University of Vienna on refinements
▢ Publish the Earthkit-Data-based preprocessing as a Flexpart tool
Deploying Polytope and FDB at MeteoSwiss
MeteoSwiss has actively contributed to the deployment of Polytope and FDB, aligning its infrastructure with ECMWF’s evolving data methodologies to enhance model output accessibility for public users and downstream applications.
To support this effort, bi-weekly meetings have been held for over a year, facilitating regular exchanges on the deployment of Polytope and FDB at MeteoSwiss. The primary objective is to improve ICON model output accessibility for downstream applications, ensuring seamless data retrieval and integration.
As part of this collaboration, we actively contribute to ECMWF developments by reporting bugs, providing feedback, and maintaining continuous engagement through meetings and shared goals. Our contributions focus on the following key areas:
- Deployment Consolidation: Streamlining the deployment of ECMWF’s Polytope and FDB releases.
- Extended Grid Support: Implementing ICON grid and rotated latitude/longitude support in Polytope.
- Server Robustness & Performance Evaluation: Developing a framework to assess performance, including stress tests with multiple concurrent users and parallel large data requests, to determine Polytope’s scalability limits. This essential goal has been achieved.
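The stress-testing approach described above can be sketched as follows: issue many requests from concurrent workers and summarize the latency distribution as the number of simulated users grows. This is a minimal sketch, not the MeteoSwiss framework. `fake_polytope_request` is a hypothetical stand-in that only sleeps and would be replaced by a real Polytope retrieval.

```python
"""Concurrent latency test (sketch; the request function is a placeholder)."""
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def timed_call(request: Callable[[], object]) -> float:
    """Run one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def stress_test(request: Callable[[], object], users: int,
                requests_per_user: int) -> Dict[str, float]:
    """Issue users * requests_per_user requests from `users` concurrent threads."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies: List[float] = sorted(
            pool.map(lambda _: timed_call(request), range(total)))
    rank = math.ceil(0.95 * total)  # nearest-rank 95th percentile
    return {
        "requests": total,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[rank - 1],
        "max_s": latencies[-1],
    }


def fake_polytope_request() -> None:
    """Hypothetical stand-in for a real Polytope retrieval."""
    time.sleep(0.01)


if __name__ == "__main__":
    for users in (1, 4, 16):
        print(users, stress_test(fake_polytope_request, users, requests_per_user=5))
```

Tracking the 95th percentile alongside the median shows when the server starts queueing requests: the tail grows well before the typical latency does.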
Workflow Architecture:
Remaining Actions Within the Project Timeline Through End of 2025
▢ Monitoring & Reliability – Implement monitoring for stable operation, centralized logging, automated error reporting, and performance tracking with request tracing, health metrics, and alerts.
▢ Deployment Improvements – Streamline image building and deployment, enable rollback capabilities, maintain a stable production version while testing new releases (DEPL), automate container build chains, and introduce alerts for failed deployments.
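Request tracing, one of the monitoring goals listed above, can start as simply as tagging each request with an ID and logging its outcome and duration. The decorator below is a hypothetical sketch that uses only Python’s standard `logging` module. The logger name and log format are assumptions. A production setup would forward these records to the centralized logging and alerting stack.

```python
"""Per-request tracing via a logging decorator (hypothetical sketch)."""
import functools
import logging
import time
import uuid
from typing import Any, Callable

logger = logging.getLogger("polytope.trace")  # hypothetical logger name


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log a short request ID, the outcome and the duration of every call."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        request_id = uuid.uuid4().hex[:8]
        start = time.perf_counter()
        status = "error"  # overwritten only if func returns normally
        try:
            result = func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("request=%s func=%s status=%s duration_ms=%.1f",
                        request_id, func.__name__, status, elapsed_ms)
    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    @traced
    def handle_request() -> str:
        return "data"

    handle_request()
```

Logging in a `finally` clause ensures failed requests are recorded too, which is what automated error reporting and alerts would key on.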
Advancing Environmental Data Retrieval (EDR) Integration
Remaining Actions Within the Project Timeline Through End of 2025
Potential Actions in Case of Project Extension
A key objective of the extension is to advance the Environmental Data Retrieval (EDR) standard of the Open Geospatial Consortium (OGC), in collaboration with the UK Met Office, a leading contributor to its development. EDR aims to standardize meteorological data retrieval within the OGC framework, improving interoperability and accessibility across geospatial and forecasting applications.
While EDR is an established standard, further adaptation is needed for its effective use in meteorology. An extension year would enable collaboration with the UK Met Office to:
- Enhance support for meteorological data formats, such as GRIB, NetCDF, and BUFR, to improve alignment with existing standards.
- Address current limitations, such as the lack of support for ensemble (ENS) data, which is crucial for probabilistic forecasting. EDR does not yet define a standardized method for requesting or representing ensemble datasets, limiting its applicability for meteorological use cases.
- Leverage MeteoSwiss expertise in data retrieval APIs, drawing on its experience of opening its data this year, to help refine the standard and improve accessibility.
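To make the EDR discussion concrete: an EDR position query requests data at a single point via the `/collections/{collectionId}/position` endpoint. The location is given as a WKT `POINT(lon lat)` in the `coords` parameter, with optional `parameter-name` and `datetime` filters. The sketch below builds such a URL. The base URL, the collection name `icon-ch1` and the parameter name `T_2M` are hypothetical. Note that the query has no standardized parameter for selecting ensemble members, which is the gap described above.

```python
"""Build an OGC API - EDR position query URL (sketch with hypothetical names)."""
from typing import List
from urllib.parse import quote, urlencode


def edr_position_url(base_url: str, collection: str, lon: float, lat: float,
                     parameters: List[str], datetime_range: str) -> str:
    """Return a position-query URL; coords use WKT order (lon lat)."""
    query = {
        "coords": f"POINT({lon} {lat})",
        "parameter-name": ",".join(parameters),
        "datetime": datetime_range,  # RFC 3339 instant or "start/end" interval
    }
    path = f"{base_url.rstrip('/')}/collections/{collection}/position"
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return path + "?" + urlencode(query, quote_via=quote)


if __name__ == "__main__":
    print(edr_position_url("https://example.org/edr", "icon-ch1", 8.55, 47.38,
                           ["T_2M"], "2025-05-01T00:00:00Z/2025-05-02T00:00:00Z"))
```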
Facilitating Knowledge Sharing through Webinars
Remaining Actions Within the Project Timeline Through End of 2025
▢ Webinar: Cloud-Based Technologies in Member States. The next webinar will explore how Member States leverage cloud-based technologies for weather and climate applications. It will highlight real-world use cases, challenges, and best practices, fostering collaboration and knowledge exchange on cloud adoption in meteorology.