Contributors:
Model development: Mihai Alexe, Simon Lang & Matthew Chantry
Data pipelining and ML tools: Baudouin Raoult, Florian Pinault & Jesper Dramsch
Analysis: Zied Ben Bouallègue & Linus Magnusson
Thanks also to Florence, Florian, Andy, Irina, Matthieu, Peter D, Chris K, Victoria and more for their support and encouragement.
ECMWF will be exploring the use of machine learning to make forecasts, particularly targeting skilful and reliable ensembles. The working name for this machine-learned forecast model is AIFS. To prepare for this work, the above staff contributed some of their time to exploratory work. The objective was to better understand the challenges in constructing a machine-learned deterministic forecasting system with skill comparable to Pangu-Weather & GraphCast. For more on the verification of data-driven (ML) forecast models, please explore these pages.
We have chosen to explore the Graph Neural Network (GNN) approach used by DeepMind in GraphCast, due to its natural representation of the sphere and the ease of using custom grids.
Intro to GNNs: https://distill.pub/2021/gnn-intro/
GNNs are composed of small neural networks attached to the edges and nodes of a graph. These networks process information on the edges and nodes and combine it to give a prediction for the nodes at the next timestep. The same networks are applied to every node and edge, meaning that the model is naturally translationally invariant. The model therefore learns a general representation of the physical processes happening globally, rather than trying to learn a bespoke model for each area of the globe. Inputs such as orography break that invariance, but only via dependencies on these local variables.
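To make this concrete, here is a minimal sketch of one message-passing step in PyTorch. The class name, dimensions and MLP sizes are illustrative assumptions, not the actual AIFS or GraphCast code.

```python
import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """One GNN step: an edge MLP and a node MLP, shared across the graph."""

    def __init__(self, node_dim: int, edge_dim: int, hidden: int = 128):
        super().__init__()
        # One small MLP shared by every edge...
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, edge_dim),
        )
        # ...and one shared by every node: this weight sharing is what
        # makes the model translationally invariant on the graph.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, node_dim),
        )

    def forward(self, nodes, edges, senders, receivers):
        # nodes: [n_nodes, node_dim]; edges: [n_edges, edge_dim];
        # senders/receivers: [n_edges] integer tensors defining the graph.
        msg = self.edge_mlp(
            torch.cat([nodes[senders], nodes[receivers], edges], dim=-1))
        # Sum incoming messages at each receiving node.
        agg = torch.zeros(nodes.shape[0], msg.shape[-1], device=nodes.device)
        agg.index_add_(0, receivers, msg)
        # Residual updates for both nodes and edges.
        return nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1)), edges + msg
```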
So far we have deviated from GraphCast in the following ways:
"Processor Graph / grid", where the majority of calculations take place. Light colour nodes have longer-range connections to aid propagation of information over larger distances.
"Encoder Graph", which maps from ERA5 (small black dots) to the "Processor grid" (blue dots).
"Decoder Graph", which maps from the "Processor grid" (red dots) back to the ERA5 grid (small black dots).
Example of a 6h prediction for the model. Columns show:
Input, target, GNN prediction, GNN prediction - target, GNN prediction - input, target - input
A perfect model would show columns 2 & 3, and columns 5 & 6, identical to one another, with column 4 being zero.
Rows show predictions for 5 atmospheric variables at 850hPa and 4 surface fields.
For most variables, the errors are predominantly small scale, with the large-scale features well captured. The instantaneous intensity of features is of the correct magnitude, with no obvious damping.
Figure: 10-day forecast of v850 at 1° resolution, AIFS vs IFS.
Forecast verification using operational analysis (AN) for initialisation and as the reference, evaluated over a 3-month period. ERA5 denotes the ERA5 forecasts. IFS denotes HRES forecasts. PGUW denotes Pangu-Weather. Verification carried out using Quaver at 1.5° resolution (the default). Note that we should expect scores against observations to be worse, mostly due to the low resolution currently used.
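The headline scores in this kind of verification are typically latitude-weighted RMSEs against the reference analysis. A minimal sketch of that metric follows; it is an assumption about the score, not the actual Quaver implementation.

```python
import numpy as np

def lat_weighted_rmse(forecast, reference, lats):
    """RMSE over a regular lat-lon grid, weighting each row by cos(latitude)
    so that densely packed polar points do not dominate the score."""
    # forecast, reference: [n_lat, n_lon]; lats: [n_lat], in degrees.
    w = np.cos(np.deg2rad(lats))[:, None]
    num = (w * (forecast - reference) ** 2).sum()
    den = w.sum() * forecast.shape[1]
    return float(np.sqrt(num / den))
```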
AIFS is currently trained with ERA5, but initialised in the above forecasts with operational AN. The plot below shows 4 experiments, varying the data used for initialisation and as the verification truth. AIFE denotes AIFS initialised with ERA5 (EA), and AIF2 is the current AIFS initialised with AN; the model is identical in each case.
For T850, there is a noticeable "advantage" to initialising with ERA5 and verifying against ERA5, i.e. testing the model with data most similar to its training data. This motivates a future move towards incorporating recent analyses in training. For z500 this difference is smaller.
Please bear in mind, this is a preliminary model. Please do not share this data, or plots created with this data beyond ECMWF.
Timestepping with 24h steps, but with a model of the same complexity as one trained on 6h steps (and then applied 4x), produces a model ~5-10% worse at day 3. This could be an interesting approach for training a model for extended ranges. Training to predict 3h increments produces a model of similar accuracy to the 6h one.
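For clarity, the 6h-model-applied-4x baseline above is a plain autoregressive rollout; a minimal sketch, assuming a PyTorch model that maps one atmospheric state to the next:

```python
import torch

@torch.no_grad()
def rollout(model, state, n_steps: int):
    """Autoregressively apply a single-step model; e.g. a 6h model with
    n_steps=4 covers the same 24h as a single step of a 24h model."""
    states = []
    for _ in range(n_steps):
        state = model(state)   # the model's output feeds its next input
        states.append(state)
    return states
```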
Skilful models of large scale atmospheric dynamics can be learnt, even on a coarse grid.
Scaling this approach to significantly higher resolutions is possible, but requires significant engineering (in our view the computational engineering is perhaps the most impressive aspect of the GraphCast paper).
In no particular order.
Moving to the N320 ERA5 grid, which we expect to further improve the model, particularly for surface parameters verified against SYNOP stations.
Training using the last few years of analysis, which we expect to improve predictions. Similarly, the above forecasts are from 2022, whereas we only showed the model data up to 2015 during training (by convention); DeepMind established that using very recent data improved performance.
We will explore methodologies for training ensemble systems. Early investigations of Pangu using initial-condition uncertainty alone demonstrated that these models were underdispersive, so AIFS will need to incorporate model uncertainty.
Prediction of precipitation and more.
Explore the possible value of data-driven models in learning from observations, or their use in DA algorithms.
More analysis
Digging further into the existing dataset to explore how energy is distributed in the model, whether the training approach produces less variability, and more.
Exploring the value of the model for extended-range forecasts, with possible tuning for good long-range skill.
Maybe you've got a cool idea for how to improve these approaches...
If you have a suggestion, feel free to suggest it in the chat below.
If you'd prefer something private, or are interested in getting involved, then feel free to message any of the contributors to the project.