This page and its contents are for internal use only; please do not share them.

Contributors:

Model development: Mihai Alexe, Simon Lang & Matthew Chantry

Data pipelining and ML tools: Baudouin Raoult, Florian Pinault & Jesper Dramsch

Analysis: Zied Ben Bouallègue & Linus Magnusson

Thanks also to Florence, Florian, Andy, Irina, Matthieu, Peter D, Chris K, Victoria and more for their support and encouragement.

What and why?

ECMWF will be exploring the use of machine learning to make forecasts, particularly targeting skilful and reliable ensembles. The working name for this machine-learned forecast model is AIFS. To prepare for this work, the staff listed above have contributed some of their time to exploratory work. The objective was to better understand the challenges in constructing a machine-learnt deterministic forecasting system with skill comparable to Pangu-Weather and GraphCast. For more on the verification of data-driven (ML) forecast models, please explore these pages.

We have chosen to explore the graph neural network approach used by DeepMind in GraphCast, due to its natural representation of the sphere and the ease of using custom grids.

How does a Graph Neural Network learn to forecast?

Intro to GNNs: https://distill.pub/2021/gnn-intro/
GNNs are composed of small neural networks attached to the edges and nodes of a graph. These networks process information on the edges and nodes and combine it to give a prediction for the nodes at the next timestep. The same networks are applied to every node and edge, meaning that the model is naturally translationally invariant: it learns a general representation of the physical processes happening globally, rather than a bespoke model for each area of the globe. Inputs such as orography break that invariance, but only via dependencies on these local variables. A minimal sketch of this message-passing idea is shown below.
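To make the idea concrete, here is a minimal message-passing sketch in PyTorch. This is an illustration of the general technique, not the AIFS implementation; all names and sizes are placeholders.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of edge/node updates, with the same MLPs shared across
    the whole graph - this sharing is what gives translational invariance."""

    def __init__(self, node_dim: int, edge_dim: int, hidden: int = 64):
        super().__init__()
        # One small MLP shared by all edges, one shared by all nodes.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, edge_dim),
        )
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, node_dim),
        )

    def forward(self, nodes, edges, senders, receivers):
        # Update each edge from the features of its two endpoint nodes.
        edges = self.edge_mlp(
            torch.cat([nodes[senders], nodes[receivers], edges], dim=-1)
        )
        # Sum the incoming edge messages at each receiving node.
        agg = torch.zeros(nodes.shape[0], edges.shape[-1])
        agg = agg.index_add(0, receivers, edges)
        # Update each node from its own state plus the aggregated messages
        # (residual connection on the node state).
        return nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1)), edges
```

Here `senders` and `receivers` are integer tensors giving, for each edge, the indices of its endpoint nodes; stacking several such layers lets information propagate further across the graph.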

AIFS Configuration

So far we have deviated from GraphCast in the following ways:


"Processor Graph / grid", where the majority of calculations take place. Light colour nodes have longer-range connections to aid propagation of information over larger distances.


"Encoder Graph", which maps from ERA5 (small black dots) to the "Processor grid" (blue dots).

"Decoder Graph", which maps from the "Processor grid" (red dots) back to the ERA5 grid (small black dots).

Preliminary Results

Instantaneous predictions for 6h

Example of a 6h prediction for the model. Columns show: 

Input, target, GNN prediction, GNN prediction - target, GNN prediction - input, target - input

A perfect model would show columns 2 & 3, and columns 5 & 6, being identical to one another, with column 4 being zero.

Rows show predictions at 850hPa for 5 atmospheric variables and 4 surface fields.

For most variables the errors are predominantly small-scale, with the large-scale features well captured. The instantaneous intensities of features are of the correct magnitude, with no obvious damping.
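The six columns are simple differences of three fields; a small sketch of the arithmetic (field names are placeholders):

```python
import numpy as np

def diagnostic_columns(x, y, y_hat):
    """x: input field at t, y: target at t+6h, y_hat: GNN prediction."""
    return {
        "input": x,
        "target": y,
        "prediction": y_hat,
        "prediction - target": y_hat - y,  # ~0 everywhere for a perfect model
        "prediction - input": y_hat - x,   # predicted 6h increment
        "target - input": y - x,           # true 6h increment
    }
```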


10-day forecast of v850 at 1° resolution: AIFS (left) vs IFS (right).

Verification

Forecast verification using operational analysis (AN) both to initialise the forecasts and as the verification reference, evaluated over a 3-month period. ERA5 denotes the ERA5 forecasts, IFS denotes HRES forecasts, and PGUW denotes Pangu-Weather. Verification was carried out using Quaver at 1.5° resolution (the default). Note that we should expect scores against observations to be worse, mostly due to the low resolution currently used.


Choice of (re)analysis matters

AIFS is currently trained with ERA5, but initialised in the above forecasts with operational AN. The plot below shows 4 experiments, varying the data used to initialise the forecasts and as the verification truth. AIFE denotes AIFS initialised with ERA5 (EA), and AIF2 is the current AIFS initialised with AN; the model is identical in each case.

For T850, there is a noticeable "advantage" to initialising with ERA5 and verifying against ERA5, i.e. testing the model on data most similar to its training data. This motivates a future move towards incorporating recent analyses in training. For z500 this difference is smaller.

Exploring the model output yourself

https://apps.ecmwf.int/data-catalogues/marsscratch/?year=2022&type=fc&class=ml&stream=oper&expver=aif2

Please bear in mind that this is a preliminary model. Please do not share this data, or plots created from it, beyond ECMWF.
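For illustration, a MARS request using the keywords visible in the catalogue URL above (class=ml, stream=oper, type=fc, expver=aif2) might look like the sketch below. The date, parameter, level, step and grid values are illustrative assumptions; check the catalogue for what is actually archived. Internal users can equivalently use the mars command line.

```python
from ecmwfapi import ECMWFService

mars = ECMWFService("mars")
mars.execute(
    {
        "class": "ml",
        "stream": "oper",
        "expver": "aif2",
        "type": "fc",
        "date": "2022-01-01",       # assumption: a date from 2022
        "time": "00",
        "param": "z",               # assumption: geopotential
        "levtype": "pl",
        "levelist": "500",
        "step": "0/to/240/by/6",
        "grid": "1.5/1.5",          # matches the verification resolution
    },
    "aifs_z500.grib",               # local target file
)
```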

Other noteworthy findings

Timestepping with 24h steps, using a model of the same complexity as one trained to take 6h steps (and then applied 4 times), produces a model roughly 5-10% worse at day 3. This could be an interesting approach for training a model for extended ranges. Training to predict 3h increments produces a model of similar accuracy to the 6h one. The two strategies are sketched below.
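A minimal sketch of the two timestepping strategies being compared (the model objects are hypothetical):

```python
def rollout(model, state, n_steps):
    """Apply a single-step model n_steps times, feeding each
    prediction back in as the next input."""
    for _ in range(n_steps):
        state = model(state)
    return state

# Strategy 1: a 6h-increment model applied autoregressively.
# state_24h = rollout(model_6h, state_0, n_steps=4)
# Strategy 2: a same-complexity model trained on the 24h step directly.
# state_24h = model_24h(state_0)
```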

Conclusions

Skilful models of large scale atmospheric dynamics can be learnt, even on a coarse grid.

Scaling this approach to significantly higher resolutions is possible, but requires significant engineering (in our view the computational engineering is perhaps the most impressive aspect of the GraphCast paper).

Future plans

In no particular order.

Higher resolution

Moving to the N320 ERA5 grid, which is expected to further improve the model, particularly for surface parameters verified against SYNOP stations.

Operational analysis

Training using the last few years of operational analysis, which we expect to improve predictions. Similarly, the forecasts above are from 2022, whereas the model only saw training data up to 2015 (by convention); DeepMind established that using very recent data improved performance.

Building ensembles

We will explore methodologies for training ensemble systems. Early investigations of Pangu using only initial-condition uncertainty demonstrated that these models are underdispersive, so AIFS will need to incorporate model uncertainty.

More representation of moist processes

Prediction of precipitation and more.

Data assimilation

Explore the possible value of data-driven models in learning from observations, or their use in DA algorithms.

More analysis

Digging further into the existing dataset to explore how energy is distributed in the model, whether the training approach produces less variability, and more.

Extended range

Exploring the value of the model to extended range forecasts, with possible tuning for good long-range skill.

Something else?

Maybe you've got a cool idea for how to improve these approaches....

Getting involved

If you have a suggestion, feel free to suggest it in the chat below.
If you'd prefer something private, or are interested in getting involved, then feel free to message any of the contributors to the project.