Tasks

Task Nr.	Task	Description	Output	Outcome	Contributors
WP1
1.1	Evaluation Metrics	Define evaluation metrics and headline variables that will be used to assess the performance of data-driven forecasting models. This task is essential for establishing clear criteria for model evaluation. The task involves a thorough review of existing evaluation metrics used in meteorological forecasting and determining which metrics are most relevant for assessing the performance of data-driven models, especially at the km-scale and for evaluating performance for extremes.	Document outlining the chosen evaluation metrics and headline variables. (Tentative: Common software for computing a base-set of evaluation metrics either against test dataset or observations.)	Clear criteria established for assessing the performance of data-driven forecasting models, enabling objective evaluation and comparison of different approaches.	Met Éireann, RMI, Météo-France
1.2	Datasets for high-resolution regional analysis data	Ensure anemoi-datasets can deal with the diversity of data we have in the community, and evaluate and extend its design and functionality. In this task we generate anemoi-datasets for various regional and global datasets in order to test and evaluate anemoi-datasets as a common tool that generates datasets compatible with all models and architectures employed in the project.	Various plugins of Anemoi for various sources of regional datasets (e.g. CERRA). Feedback concerning usability and functionality of anemoi-datasets. New release of anemoi-datasets capable of integrating the required datasets.	anemoi-datasets can generate ML-ready datasets for regional datasets of interest in the pilot project, making it easier to compare training and inference using different datasets.	KNMI, ECMWF, DMI, MeteoSwiss, AEMET, RMI, MET Norway, MétéoFrance, FMI
1.3	Extension of Anemoi to AICON	Integration of the relevant functionality of AICON into the Anemoi infrastructure such that we can easily evaluate and compare this approach with the other existing model architectures. In particular, the task integrates the ICON data grid mesh into the hidden mesh encoder, as well as the nested hidden graph approach of AICON.	Version of Anemoi that can be used to train AICON directly on the ICON native grid and with the nested hidden graph of AICON.	Infrastructure of Anemoi can be used to train AICON on ICON global and LAM datasets.	DWD, ECMWF
1.4	Data-driven LAM Model	Build one or more data-driven limited area models (LAMs) using existing exemplar models as references. The aim is to create a model tailored for regional forecasting at kilometer-scale resolution. A special focus will be given to variables important for high-resolution LAM applications (e.g., precipitation, low cloudiness, wind gusts, …). Development of the LAM model(s) involves selecting appropriate architectures (such as graph neural networks, vision transformers or spectral neural operators) based on models like AIFS and Neural-LAM. The LAM models should integrate boundary conditions by combining a cutout global dataset with the embedded high-resolution dataset over the region of interest. The task also investigates how boundary zone width and training on different datasets affect LAM predictions. With respect to infrastructure, we integrate important functionality of existing approaches like Neural-LAM into a common framework (Anemoi) such that we can easily compare various approaches. In particular, the task integrates the concept of hierarchical mesh into the graph of Anemoi. The LAM mode of Anemoi is also consolidated and tested with more than one regional dataset. Sensitivity experiments will be performed to determine the ideal length of training datasets, balancing model accuracy and computational efficiency.	A system for training, validating and testing data-driven LAM models with various model architectures and proper boundary conditions tested over at least one region.	Successful development of a tailored LAM model capable of producing accurate forecasts at kilometer-scale resolution, demonstrating the potential of data-driven approaches for regional forecasting.	DMI, MeteoSwiss, DWD, RMI, MétéoFrance, ECMWF, Met Éireann
1.5	Stretched-grid Data-driven Model	Continue the design and implementation of a global stretched-grid system using suitable architectures, such as those used in the AIFS model based on Anemoi developed jointly by ECMWF and MetNorway. The objective is to develop a model that seamlessly transitions from coarse resolution to high-resolution grids over specific domains of interest. The task requires exploring different approaches for constructing stretched grids and adapting existing models like AIFS to support this configuration. Approaches like high resolution hidden mesh over the regional domain should be merged into the main framework for use by all institutions on various datasets. It involves creating training datasets for stretched grid models, model training, and testing over at least one region to assess its performance compared to other approaches. Sensitivity experiments will be performed to determine the ideal length of training datasets balancing model accuracy and computational efficiency.	A system for training, validating and testing stretched-grid data-driven models tested over at least two regions.	Creation of a global stretched-grid system using suitable architectures, with improved resolution over specific domains of interest, showcasing the effectiveness of data-driven models for transitioning between different grid resolutions.	MetNorway, ECMWF, KNMI, MeteoSwiss, AEMET, FMI
1.6	Downscaling / Superresolution	Downscaling using generative AI is an alternative approach to auto-regressive model emulation, which has the potential to ease the generation of convection-resolving ensembles (e.g. SEEDS from Google or CorrDiff from NVIDIA). Here, the focus is on investigating and implementing downscaling techniques to translate coarse-scale data (such as global data-driven forecasts) to higher spatial resolution for better accuracy at the regional-to-local level, specifically for resolving convection. This task involves researching and implementing machine learning-based downscaling methods to improve the resolution of global forecasts. It includes evaluating the effectiveness of these methods through testing over at least one region and exploring the representation of extremes. The task will benefit from (but not duplicate) the work in Destination Earth project DE_371 as well as other on-going projects.	Downscaling method tested over at least one region, demonstrating improved resolution.	Successful implementation of downscaling techniques to enhance the resolution of global forecasts for better accuracy at the regional level, contributing to the refinement of data-driven forecasting systems. Potential applications go beyond forecasting and can also be applied to reanalysis datasets.	MétéoFrance, MeteoSwiss
1.7	Intercomparison of Model Approaches	Compare the performance of the different model approaches in Tasks 1.4, 1.5, and 1.6 for at least one geographical region. The aim is to evaluate the relative strengths and weaknesses of each approach. The task involves conducting a comprehensive analysis comparing the performance of different model approaches based on predefined evaluation metrics (established in Task 1.1) and ideally based on a single dataset.	Comparative analysis highlighting the strengths and weaknesses of each approach for regional modeling.	Informed decision-making regarding model selection and further development based on comprehensive evaluation of different model approaches, contributing to advancements in data-driven forecasting techniques.	RMI, MetNorway
1.8	Multiple encoders architecture	Global and regional datasets will commonly be defined with different variables, resolutions, vertical coordinate systems or levels. A single encoder configuration would require a coherent definition of variables and vertical levels. Instead of having to reshape or interpolate all regional datasets into a form compatible with the global datasets, in this task we implement functionality to support multiple encoders such that the global and regional datasets are processed into the hidden mesh through different encoders.	Version of Anemoi that can be configured with various encoders for different datasets.	The multiple encoder system is tested and evaluated with at least one configuration of datasets.	KNMI, MET Norway, MeteoSwiss, DMI, DWD
1.9	Training on CERRA dataset	The CERRA dataset is a pan-European reanalysis with very high horizontal resolution (5.5 km) forced by the global ERA5 reanalysis. A grid-stretching training of ERA5 with CERRA over the European domain will provide a pre-trained model of high value for all participants of the pilot project for fine-tuning on high-resolution regional datasets. It has been shown that fine-tuning based on a pre-trained ERA5 model improves scores for high-resolution regional datasets that do not contain as much historical data as ERA5.	A pre-trained model using ERA5 + CERRA available to all members of the project.	The data will be used for further fine-tuning and to learn about the benefits of using a pre-trained model for high-resolution regional fine-tuning.	KNMI, RMI, MET Norway
1.10	Transfer Learning	Use the output of Task 1.9 to adapt the pre-trained model and fine-tune it to various high-resolution datasets. In this task we will develop methods and approaches to fine-tune on high-resolution regional datasets. In particular, we will design approaches to fine-tune models on hourly data, starting from the 3-hourly temporal resolution of the CERRA dataset.	Documentation of approaches for transfer learning from the pre-trained model to high-resolution regional datasets using grid-stretching or LAM models.	The capability to use pre-trained models can reduce training effort or increase the performance of data-driven models. This task can also be seen as a step towards fine-tuning future foundation models for regional data-driven forecasting.	MetNorway, ECMWF, AEMET, KNMI
1.11	Scaling Efforts	Improve the scalability and performance of data-driven models across multiple GPUs. Where global, LAM and stretched-grid approaches share a codebase, collective efforts across these approaches will improve the training performance and scale to larger ML models (both in terms of complexity and resolution) across multiple GPUs. It includes optimizing model architectures, algorithms, and distributed computing techniques to improve scalability and performance.	Infrastructure to train and test models with improved scalability and performance.	Enhanced scalability and performance of data-driven forecasting models across multiple GPUs will lead to a capability to tackle higher-resolution datasets and reduce training time, allowing for broader ablation studies.	ECMWF
WP2
2.1	Establish Evaluation Metrics and Headline Variables (jointly with task 1.1)	The objective is to define the evaluation metrics and headline variables that will be used to assess the performance of probabilistic approaches for data-driven forecasting. Thorough evaluation of these data-driven ensemble systems will be critical, as initial work has highlighted that data-driven models can score well on metrics such as the Continuous Ranked Probability Score, while remaining significantly overconfident. Evaluation should emphasise the value of data-driven ensembles for the detection of extreme events.	Document outlining the chosen evaluation metrics and headline variables for data-driven ensemble verification.	Clear criteria established for assessing the performance of probabilistic data-driven forecasting models, enabling objective evaluation and comparison of different approaches.	DMI (4 pms)
2.2	Approaches for Building Reliable Ensembles with Data-driven Models	This task will interact with WP1 to conduct a review to identify optimal approaches for building reliable ensembles with data-driven models. This entails evaluating existing methodologies, focusing on their ability to address initial condition uncertainty and model uncertainty. The most promising approaches are selected for further exploration and adaptation in the subsequent tasks.	A document detailing the evaluation of existing methodologies for building reliable ensembles with data-driven models.	A first set of approaches for building reliable ensembles, providing a clear direction for subsequent tasks focused on adapting and implementing these methodologies.	DMI (1 pm), KNMI (1 pm), DWD (1 pm), Italy (1 pm), MF (1 pm)
2.3	Representing Initial Condition Uncertainty	The aim is to establish a baseline ensemble system for global and local approaches using initial condition uncertainty only, with a deterministic data-driven model (WP1). To this end, initial condition uncertainty will be represented for forecast reliability across all timescales. As an initial approach, the task will reuse initial condition uncertainty from NWP models.	A functional baseline ensemble forecasting system for global and local approaches.	A reliable baseline system for data-driven forecasts across all timescales.	DMI (3 pms), DWD (4 pms)
2.4	Incorporating Model Uncertainty	This task will investigate approaches for incorporating model uncertainty in the baseline ensemble system (Task 2.3). This includes training to minimise probabilistic skill scores, using Diffusion models or Generative Adversarial Network (GAN) approaches. Alternatively, ensembles could be bypassed and probability distributions could be directly predicted.	A document outlining the investigated approaches for incorporating model uncertainty in data-driven models.	Reliable ensemble forecasts through the effective incorporation of model uncertainty.	KNMI (10 pms), Italy (1 pm), DMI (3 pms), MF (3 pms)
2.5	ML ensemble diagnostics	Develop a diagnostics framework for investigating error sources in ML-based models, as these will probably differ from error sources in physics-based NWP models. This includes comparing forecast errors in data-driven and NWP models and studying the error growth properties of perturbations.	Relevant diagnostics included in the Anemoi framework for improved understanding of errors and uncertainties in ML-based ensembles.	A better understanding of uncertainty characteristics of ML-based ensembles that will help us to develop more reliable data-driven ensemble forecasts.	DWD (6 pms), Italy (1 pm)
2.6	Enrichment of the Ensemble Using Generative Methods	This task will investigate a complement to forecast ensemble members, where the ensemble is enriched from pre-existing members by exploring generative methods (e.g. GANs, Diffusion models). This means extending these methods to many variables (precipitation, etc.), with particular attention given to extreme events.	Integration of generative methods to enrich forecast ensemble members, with a focus on extreme events.	Improved ensemble forecasts, through the successful enrichment of the ensemble using generative methods.	MF (6 pms), SMHI (9 pms)
2.7	Documentation and Reporting	This task entails documenting all research findings, methodologies and outcomes in WP2 generated throughout the course of the project. It aims to ensure transparency, reproducibility, and dissemination of project results. It also involves organizing and presenting findings at conferences, workshops, and other relevant forums to engage with the scientific community.	Comprehensive report summarizing the research conducted, including results, conclusions, and recommendations for further development.	Transparent documentation and dissemination of research findings, methodologies, and outcomes, facilitating knowledge sharing and contributing to the advancement of the scientific community.	MF (1 pm), KNMI (1 pm), DWD (1 pm), DMI (1 pm), SMHI (1 pm), Italy (1 pm)
WP3
3.1	Hybrid solutions: 4-dimensional variational assimilation (4D-Var) with ML components	4D-Var is a state-of-the-art assimilation method that is used by many weather centers for research and operational purposes. However, the development and maintenance of 4D-Var components (e.g., linearized model operators) are difficult and require a significant amount of human and machine resources. This task is designed to build expertise and start constructing a prototype 4D-Var (using the available data-driven model in Anemoi software) in which the tangent-linear and adjoint operators are obtained by automatic differentiation. We also aim to build the necessary functionality in Anemoi software that is generic for all available models. Other hybrid ML-DA approaches (e.g., ML-based bias correction) may also be included in this task.	Available technical implementation (incorporated into Anemoi)	For automatic differentiation, a technical framework that is working with basic conventional observation types (e.g., aircraft observations) and a case study that demonstrates the performance of the obtained system or operators.	MET Norway (8 pms), DWD (4 pms), MeteoFrance (TBD), KNMI (TBD)
3.2	Fully AI-based solution: Direct integration of data assimilation into neural networks	In this task, we seek to integrate the recently published AI-Var approach (https://arxiv.org/abs/2406.00390) into the Anemoi framework in order to streamline the development of innovative AI-based weather prediction technologies with a focus on future operational employment. As a first step, Anemoi will be extended by a “Data assimilation” module and an “Observations” module in order to facilitate data assimilation. Then the AI-Var algorithm will be implemented within these modules, building on the existing structures from the Anemoi “Models”, “Training” and “Datasets” modules. Specific data assimilation experiments will then be set up, tested and later extended to demonstrate the potential of a completely AI-based data assimilation cycle. By incorporating AI-Var, which leverages neural networks to directly perform data assimilation, Anemoi can be extended in its capabilities to process observational data, thus enabling the integration of data assimilation with AI-based forecast models (WP1) and ensemble approaches (WP2).	An AI-based data assimilation module in the Anemoi framework with the capability of processing observational data and producing initial conditions for AI-based NWP models.	The incorporation of the AI-based data assimilation approach within the Anemoi framework is expected to yield substantial advancements towards a fully data-driven NWP chain, with the potential to set a new standard for operational weather forecasting and to demonstrate the transformative potential of AI in enhancing weather prediction capabilities.	DWD (22 pms), ITAF Met Service (4 pms, starting in 2nd half of 2025), MeteoFrance (TBD)
3.3	End-to-end solution: Observation-driven models	In this task, the ultimate goal is to drive the ML-based NWP models directly with conventional or non-conventional observations. The corresponding developments have been started at ECMWF and we aim to collaborate closely through the Anemoi framework. The main objectives are to enable the ingestion of irregular datasets / grids (observation networks) in Anemoi models and to explore generative models for direct observation prediction. Since this area of research is rather innovative and still experimental, there are major uncertainties about how these observation-driven approaches will function and how successful they will be in the longer term.	The extension of Anemoi software with additional features and functionalities (datasets, graph, models)	The extension of Anemoi software with additional features and functionalities (datasets, graph, models)	MET Norway (9.5 pms), DWD (4 pms)
WP4
4.1	Assess Differences Between NWP and Data-Driven Models	Conduct a comprehensive analysis to identify and document the key operational and infrastructural differences between traditional Numerical Weather Prediction (NWP) models and data-driven models. This analysis should include: Operational Requirements: Examining the reliance on mathematical formulations and supercomputing platforms for NWP models versus the dependency on machine learning algorithms and specialized hardware, such as GPUs, for data-driven models. Computational Demands: Comparing the computational demands and efficiency of both model types. Data Handling: Assessing the differences in data handling, storage, and processing needs. Model Integration: Identifying challenges and requirements for integrating data-driven models into existing NWP systems.	A detailed report that outlines the operational and infrastructural differences between NWP and data-driven models, including specific challenges and recommendations for integration.	A thorough understanding of the distinct requirements and potential challenges for integrating data-driven models into existing forecasting systems, providing a foundational basis for future integration efforts.	DMI (0.3 pm), MeteoSwiss (0.4 pm), ECMWF (0.3 pm), (KNMI and others TBD)
4.2	Develop a Plan for Hardware Compatibility and Scalability	Formulate a plan to address compatibility and scalability issues that arise from integrating machine learning infrastructure into existing forecasting systems. This task involves: Hardware Assessment: Evaluating current hardware capabilities, including processing power, storage, and network capacity. Gap Analysis: Identifying gaps and limitations in the current hardware setup that may hinder the efficient training and inference of ML models. Solution Proposals: Proposing targeted solutions to bridge identified gaps, ensuring the infrastructure can support the increased computational demands of ML models. Scalability Planning: Developing strategies to scale the infrastructure as data volumes and model complexities grow over time.	A strategic document outlining the compatibility and scalability plan, including detailed assessments, identified gaps, and proposed solutions	Enhanced infrastructure readiness that ensures efficient training and inference of ML models in weather forecasting, supporting robust and scalable forecasting systems.	DMI (0.3 pm), MeteoSwiss (0.4 pm), ECMWF (0.3 pm), (KNMI and others TBD)
4.3	Identify and Address Software and Services Gaps in NWP Process Chains	Conduct a thorough evaluation to identify software and service gaps within the current Numerical Weather Prediction (NWP) process chains that are relevant to machine learning applications. This task involves: Gap Analysis: Assessing the existing software and services used in NWP processes to determine where they fall short in supporting machine learning applications. Licensing and Compliance: Considering various software licensing requirements and ensuring compliance across different deployment infrastructures among Member States. Solution Development: Proposing practical and feasible solutions to bridge the identified gaps, facilitating smoother integration of machine learning models into NWP processes.	A list of software and service gaps, accompanied by proposed solutions for each identified gap.	Improved alignment and integration of NWP and ML processes, leading to enhanced forecasting capabilities and more accurate weather predictions.	DMI (0.4 pm), MeteoSwiss (0.4 pm), ECMWF (2 pm), (KNMI and others TBD)
4.4	Establish Best Practices for Shared Data, Infrastructure, and Services	Develop and disseminate comprehensive best practices for effectively utilizing shared data, infrastructure resources, and services. This includes: Resource Allocation: Creating guidelines on the optimal allocation of resources such as ECMWF’s Atos supercomputer, the European Weather Cloud, the MLflow server, and the Anemoi Catalogue. Efficient Usage: Establishing protocols for the efficient use of these shared resources to avoid conflicts and ensure maximum performance. Collaboration Protocols: Developing collaboration protocols to enhance teamwork and information sharing among different teams and stakeholders. Documentation and Training: Providing detailed documentation and training materials to ensure all users can effectively follow the established best practices.	Guidelines and templates for the usage of shared infrastructure and services.	Optimized utilization of shared resources, fostering improved collaboration, efficiency, and overall effectiveness within the forecasting community.	DMI (1 pm), MeteoSwiss (2 pm), ECMWF (2 pm), (KNMI and others TBD)
4.5	Develop Joint Test Infrastructure	Create a robust joint test pipeline that seamlessly integrates with the software repositories (e.g., on GitHub) utilized by the community. This infrastructure will: Collaborative Development: Support collaborative development efforts by enabling multiple teams to contribute to the same codebase. This includes linting of code. Comprehensive Testing: Facilitate thorough testing of updates and improvements to ensure they meet quality standards before deployment. Continuous Integration: Implement continuous integration practices to ensure that code changes are automatically tested and validated.	Joint test infrastructure.	Streamlined development processes and improved coordination among developers, leading to continuous improvement and innovation in forecasting systems.	DMI (1 pm), MeteoSwiss (2 pm), ECMWF (2 pm), DWD, (KNMI and others TBD)
4.6	Implement MLOps Practices	Establish and implement robust MLOps practices, with a strong focus on services and automation. This task involves: Experiment Tracking and Model Registry: Setting up services for tracking experiments and maintaining a registry of models to ensure reproducibility and version control. This could potentially include existing infrastructure from ECMWF. Automated CI/CD Processes: Developing and deploying automated Continuous Integration/Continuous Deployment (CI/CD) processes for seamless code integration, and comprehensive testing. Pipeline Automation: Creating automated pipelines for training and inference, leveraging High-Performance Computing (HPC) centers to optimize performance and scalability.	A central MLflow server for experiment tracking and model registry. A demonstration of one automated training or inference pipeline, including CI/CD, deployed on an HPC center.	Accelerated and reliable development and deployment of ML-based forecasting systems, enhancing the overall efficiency and effectiveness of the forecasting process.	DMI (1 pm), MeteoSwiss (2 pm), ECMWF (2 pm), (KNMI and others TBD)
