
Machine Learning

The aim of ML is to develop (train) an empirical model directly from observations or reanalyses. Observations implicitly contain the physics of the atmosphere, but it is not necessary for ML models to emulate the underpinning physics that dictates the evolution of variables through a forecast. During training, ML considers the full set of observed or initial data and, using statistical methods, relates these to an observed variable (e.g. temperature) six hours later at each point. The initial data and the corresponding data at the end of the forecast period have been extracted from some 40 years of ERA5 data. At ECMWF, machine learning training is aimed at producing six-hour forecasts. Table 1 gives the set of observed and forecast variables and the constants considered during the machine learning process at ECMWF.
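As an illustration of how such training pairs are constructed (the array names and sizes below are invented for this sketch and do not reflect the actual AIFS data layout), each example pairs the atmospheric state at one time with the state six hours later, drawn from a long reanalysis archive:

```python
import numpy as np

# Toy stand-in for a reanalysis archive: one value per grid point,
# stored at 6-hourly steps (real ERA5 fields hold many variables and levels).
rng = np.random.default_rng(0)
n_steps, n_points = 100, 8          # hypothetical sizes for illustration
archive = rng.standard_normal((n_steps, n_points))

# Each training example pairs the state at time t (input) with the
# state one archive step (six hours) later (target).
inputs = archive[:-1]               # states at t
targets = archive[1:]               # states at t + 6h
```

Sliding the pairing window over decades of archive in this way yields a very large number of input/target examples from a single dataset.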

Process

At each grid point the set of observed data is processed using a set of initially random weighting functions for each parameter. At first the forecast values will not agree with those observed at the verifying time of the forecast. The error, as measured by some error metric (the loss function), is fed backwards (back-propagation) through the process. In response, the influence of some types of observations (say wind, or 50hPa temperature) may be reduced while that of others (say surface temperature) is increased. This process is repeated many times with the aim of progressively minimising the error metric (see Fig3).
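The loop above can be sketched in miniature with a single linear layer and synthetic data (all names, sizes and the learning rate here are illustrative, not the AIFS architecture): random weights produce a poor forecast, the error is propagated back as a gradient, and the weights are nudged to reduce it.

```python
import numpy as np

# Minimal sketch of the train / compare / back-propagate loop:
# a linear model trained by gradient descent on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))          # initial data ("observations")
true_w = np.array([0.5, -1.0, 2.0, 0.0, 0.3])
y = X @ true_w                             # "observed" values six hours later

w = rng.standard_normal(5)                 # initially random weights
lr = 0.1                                   # step size for each correction
for _ in range(500):
    pred = X @ w                           # forecast with current weights
    err = pred - y                         # loss measured against truth
    grad = X.T @ err / len(X)              # back-propagated gradient
    w -= lr * grad                         # adjust weights to reduce the error
```

After many iterations the learnt weights converge on the underlying relationship; note how the influence of the fourth input (true weight 0.0) is driven towards zero, mirroring the "reduce some inputs, increase others" behaviour described above.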

The process incrementally improves the relationship between the set of initial observations and the forecast value of a single variable at the later time. In this way a relatively simple relationship between initial data and forecast data six hours in the future is gradually built up. It consists of the influence of each meteorological parameter, expressed as a weighting for each input data type. Taken together, the weighting functions form an algorithm for use during AI forecasts.

When complete, ML returns the weights for all variables that give the best forecast at T+6. These weights differ according to the variable being forecast (e.g. the weight given to surface pressure when calculating a forecast surface temperature differs from the weight given to surface pressure when calculating a forecast surface dew point). These relationships take the form of a set of algorithms that can be used by subsequent AI forecasts.

Error loss metrics

AIFS Single:

The error metric (loss function) is the root mean square (RMS) error.
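A minimal sketch of that metric (the function name is our own, not an AIFS identifier): the squared differences between forecast and verifying values are averaged and the square root taken.

```python
import numpy as np

def rms_error(forecast, verifying):
    """Root-mean-square error between forecast and verifying values."""
    forecast = np.asarray(forecast, dtype=float)
    verifying = np.asarray(verifying, dtype=float)
    return np.sqrt(np.mean((forecast - verifying) ** 2))
```

For example, `rms_error([1, 2], [1, 4])` gives sqrt((0² + 2²) / 2) = sqrt(2).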

Fig3: Machine learning training process in deriving an algorithm for use in AIFS Single. A range of observed data is processed using random weighting functions for each parameter. The forecast results are then compared with verifying data and the difference between them (the loss or error) is fed back to the processor (back-propagation). This induces modification of the weighting functions for each parameter, and the resulting forecast is compared with the verifying data, giving a new loss/error value. Iteration continues until the loss/error is minimised and the set of weights for each parameter becomes the algorithm for the forecasting process.

AIFS-ENS:

The error metric (loss function) is the almost fair CRPS (continuous ranked probability score). For the ensemble it is also necessary to include some form of uncertainty during evaluation of the algorithms. To do this, white noise is injected into the neural network during the training phase. The model learns to shape this noise to capture the uncertainty in future weather conditions, so that in the forecast phase, when new white noise is injected, the model can create a well-calibrated ensemble.
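The noise-injection idea can be sketched with a toy network (the layer sizes, names and the placement of the noise are invented for illustration and bear no relation to the AIFS-ENS architecture): the same fixed weights produce different ensemble members purely because each forward pass receives a fresh white-noise draw.

```python
import numpy as np

# Conceptual sketch only: white noise injected into a hidden state
# turns one set of weights into a generator of distinct ensemble members.
rng = np.random.default_rng(2)
W1 = rng.standard_normal((5, 8))     # hypothetical layer weights
W2 = rng.standard_normal((8, 1))

def forecast_member(x, rng):
    hidden = np.tanh(x @ W1)
    hidden = hidden + rng.standard_normal(hidden.shape)  # injected white noise
    return hidden @ W2

x = rng.standard_normal((1, 5))      # one initial state
members = [forecast_member(x, rng) for _ in range(4)]
# Each call uses a fresh noise draw, so the four members differ,
# giving a spread that the training process learns to calibrate.
```

During training the network learns how strongly the noise should perturb the forecast, so the resulting spread reflects genuine forecast uncertainty rather than arbitrary scatter.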


The algorithms for the ensemble are developed rather differently from those used by AIFS Single. A small ensemble group (ECMWF uses a group of 4) gives information on the variability of these independent results and introduces a measure of model uncertainty. At each grid point the set of observed data is processed using four different sets of random weighting functions for each parameter. The four forecast results are then compared with verifying data. The CRPS, which measures the quality of probabilistic forecasts, is evaluated over the results of the four forecasts. The CRPS determines what is fed back to the ML processors (back-propagation). This induces modification of the weighting functions for each parameter, and the resulting forecasts are compared with the verifying data, giving new CRPS values. Iteration continues until the CRPS is minimised and the set of weights for each parameter forms the algorithm for the ensemble forecasting process (see Fig4). Using the CRPS as a loss function accounts for the limitations of using a finite number of ensemble members, and ensures an accurate and well-calibrated distribution. Model uncertainty is incorporated as a learnt aspect through the insertion of white noise.
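For a finite M-member ensemble, the "fair" CRPS at a single point is the mean distance of the members from the observation minus a spread term corrected for ensemble size: mean|xᵢ − y| − Σᵢⱼ|xᵢ − xⱼ| / (2M(M−1)). The "almost fair" variant used for AIFS-ENS slightly reweights the spread term; the plain fair form is sketched here for illustration (the function name is our own).

```python
import numpy as np

def fair_crps(members, obs):
    """Fair CRPS of a finite ensemble against a single observation.

    fair CRPS = mean|x_i - y|  -  sum_{i,j} |x_i - x_j| / (2 M (M - 1))

    The size-corrected spread term stops a small ensemble (e.g. M = 4)
    from being penalised merely for having few members.
    """
    x = np.asarray(members, dtype=float)
    m = x.size
    skill = np.mean(np.abs(x - obs))                          # distance to truth
    spread = np.abs(x[:, None] - x[None, :]).sum() / (2 * m * (m - 1))
    return skill - spread
```

A perfect, fully confident ensemble (all four members equal to the observation) scores 0; the score grows as the members drift from the truth faster than their spread accounts for.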


Fig4: Machine learning training process in deriving an algorithm for use in AIFS-ENS. A range of observed data is processed using four different sets of random weighting functions for each parameter. White noise is introduced at each iteration to emulate model uncertainty. The four forecast results are then compared with verifying data. The CRPS, which measures the quality of probabilistic forecasts, is evaluated over the results of the four forecasts. The CRPS determines what is fed back to the processor (back-propagation). This induces modification of the weighting functions for each parameter, and the resulting forecasts are compared with the verifying data, giving a new CRPS value. Iteration continues until the CRPS is minimised and the set of weights for each parameter becomes the algorithm for the forecasting process.

Considerations

The ML training process uses a vast amount of data and is very expensive in computer time and energy. However, the process is executed only once to develop the observation-to-forecast relationships used subsequently by AIFS. At rare intervals ML may be repeated to include further data types, different techniques, etc. In effect ML uses previous evolutions of similar analyses at several levels to "learn" the more likely evolution. Although the ML model has no need to incorporate physical processes during the learning process, it nevertheless seems to be learning some physics and can provide some insights.

Strengths of ML are:

  • it determines relationships between input observations and output variables directly from data.
  • it can extract information from very large data sets.
  • there is no need to explicitly simulate the physics of the atmosphere.
  • there is no need for a comprehensive understanding of physical theory.
  • its performance can be measured objectively by evaluation against verifying data.

Weaknesses of ML are:

  • it requires a vast amount of training data.
  • iterations in deriving the input to output relationship are complex, time consuming and expensive.
  • the whole ML process must be repeated if further meteorological parameters are added. 


