# Section 2.5 Model Data Assimilation, 4D-Var

## Model Data Assimilation, overview

Data assimilation is an analysis technique in which the observed information is accumulated into the model state.

### Approaches to Data Assimilation

There are three basic approaches to data assimilation:

• sequential assimilation – only considers observations made in the past, up to the time of analysis (real-time assimilation systems).
• non-sequential assimilation – considers observations made before and after the nominal time of the analysis (e.g. 4D-Var).
• retrospective assimilation – observations from the future can be used (e.g. reanalysis).

Another distinction can be made between methods that are intermittent or continuous in time:

• Intermittent method – observations can be processed in small batches.  The correction to the analysed state tends to be abrupt and physically less realistic.
• Continuous method – observation batches over longer periods are considered.  The correction to the analysed state is smoother in time and physically more realistic (e.g. Long Window Data Assimilation).

The four basic types of assimilation are depicted schematically in Fig2.5-1.  Compromises between these approaches are possible.  The aim is to assimilate data in a manner that does not produce sudden jumps in analysed values, so some form of continuous assimilation seems preferable.  However, continuous assimilation is expensive in computing time, and a compromise has been adopted, with 4D-Var assimilating data observed at various times over several hours.  This avoids sudden jumps in the analyses used for the forecasts (e.g. 00UTC, 12UTC).  Continuous assimilation is used during the 4D-Var analysis process for the early cut-off analysis.

Fig2.5-1: Representation of four basic strategies for data assimilation as a function of time.   Observations are made at different times and arrive at irregular times.  Intermittent assimilation of observations induces step changes in the model analysis (red line) while continuous assimilation of observations gives smoother changes in the model analyses.

### Quality Control

Data is quality controlled using constraints of:

• consistency of information from the observing platform,
• assessment whether changes with time are realistic when compared with model expectations,
• consideration of the physical properties, both actual and implied.

Before assimilating data it is necessary to quality control the observations, starting with the data extraction process:

1. Data Extraction
• Thinning (to reduce over-emphasis on values in a small area)
• Duplicate report check (to avoid over-emphasis of a single observation)
• Ship track check (to ensure observations were made at a plausible location)
2. Hydrostatic check
• Some data is not used to avoid over-sampling and correlated errors.
• Departures and flags are still calculated for further assessment.
3. Blocklisting
• Data skipped due to systematic bad performance or due to different considerations (e.g. data being assessed as unreliable, inconsistent or misleading).
4. Model/4D-Var dependent quality control
• First guess based rejections
• 4D-Var quality control rejections
5. The Analysis
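
The extraction and first-guess steps in the list above can be illustrated with a minimal Python sketch. It is not the IFS implementation: the `Obs` record, the one-observation-per-box thinning rule, the order of the checks and the rejection threshold `k` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obs:
    station: str
    lat: float
    lon: float
    value: float      # observed value
    sigma_o: float    # assumed observation error (standard deviation)

def remove_duplicates(obs):
    """Keep the first report for each identical (station, position, value)."""
    seen, kept = set(), []
    for o in obs:
        key = (o.station, o.lat, o.lon, o.value)
        if key not in seen:
            seen.add(key)
            kept.append(o)
    return kept

def thin(obs, box_deg=1.0):
    """Keep one observation per lat/lon box (hypothetical thinning rule)."""
    kept = {}
    for o in obs:
        box = (int(o.lat // box_deg), int(o.lon // box_deg))
        kept.setdefault(box, o)   # first observation in each box wins
    return list(kept.values())

def first_guess_check(obs, first_guess, sigma_b, k=3.0):
    """Reject observations whose departure exceeds k expected std devs."""
    accepted, rejected = [], []
    for o in obs:
        departure = o.value - first_guess(o.lat, o.lon)
        limit = k * (o.sigma_o**2 + sigma_b**2) ** 0.5
        (accepted if abs(departure) <= limit else rejected).append(o)
    return accepted, rejected

# Made-up reports: one duplicate, two in the same box, one gross error.
reports = [
    Obs("A", 50.2, 8.1, 281.0, 1.0),
    Obs("A", 50.2, 8.1, 281.0, 1.0),   # duplicate of the first report
    Obs("B", 50.7, 8.4, 281.5, 1.0),   # same 1-degree box as station A
    Obs("C", 52.3, 9.9, 310.0, 1.0),   # far from the first guess
]
unique = remove_duplicates(reports)
thinned = thin(unique)
accepted, rejected = first_guess_check(
    thinned, first_guess=lambda lat, lon: 280.0, sigma_b=1.0
)
```

In this toy case station A passes all checks, station B is thinned out and station C is rejected by the first-guess check. Real systems still compute departures and flags for unused data, which this sketch does not attempt.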

### Outline of the analysis process

The analysis process seeks to represent realistically in the model the actual state of the atmosphere.  However, inconsistencies in time and space in the observations mean that this aim will never actually be attained.  The best that can be done is to approximate (hopefully closely) the actual state of the atmosphere while maintaining the numerical stability of the model atmosphere both horizontally and vertically.  The assimilation process is carried out by 4D-Var (see below).

In simple terms, the previous analysis step has used model processes (e.g. dynamics, radiation, etc.) to reach a forecast first guess value at a given location.  This will usually differ from an observation at that location and time; the difference between them is the "departure".  The analysis scheme then adjusts the value at the location towards the observed value while retaining stability in the model atmosphere.  This adjustment is the "Analysis Increment".  If its magnitude is large it suggests the model is not capturing the state of the atmosphere well (e.g. a jet is displaced, a trough is sharper, large-scale active convection has not been completely captured).  However, a large Analysis Increment may also suggest poor data.  Analysis Increment charts are a powerful tool for identifying areas of uncertainty which might propagate downstream.
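
As a minimal scalar illustration of "departure" and "Analysis Increment" (a sketch, not the IFS scheme), assume the first guess and the observation have error standard deviations `sigma_b` and `sigma_o`, and that the analysis weights them by their inverse variances. The function name and the numbers are hypothetical.

```python
def analyse(first_guess, observation, sigma_b, sigma_o):
    """Return (departure, increment, analysis) for a single scalar value."""
    departure = observation - first_guess
    # Weight given to the observation; always between 0 and 1.
    weight = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    increment = weight * departure
    return departure, increment, first_guess + increment

# First guess 280.0 K, observation 282.0 K, equal assumed errors.
departure, increment, analysis = analyse(280.0, 282.0, sigma_b=1.0, sigma_o=1.0)
print(departure, increment, analysis)   # 2.0 1.0 281.0
```

With equal assumed errors the analysis lands midway between first guess and observation. A less trusted observation (larger `sigma_o`) gives a smaller increment for the same departure.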

Fig2.5-2: Schematic of the data assimilation process (from a diagnostic perspective).  All the model forecast processes (dynamics, radiation, etc.) are used in the model forecast to deliver the first guess forecast (red).  The observations (grey), however, will normally differ to a greater or lesser extent from the first guess forecast (red).  The analysis increment (yellow) is evaluated to bring the evolution more into line with the observations while maintaining model stability.  The resultant value (blue) becomes the first guess for the next analysis.

## Model Data Assimilation, 4D-Var

The four-dimensional variational analysis (4D-Var) system uses an optimisation procedure to adjust the initial condition so as to obtain:

• an optimal fit to all the observations in the assimilation time window, while
• staying as close as possible to the first guess.
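
These two requirements are commonly written as a single cost function. The form below is the standard textbook (strong-constraint) 4D-Var formulation, given here as an illustration rather than a transcription of the IFS code. The symbols are introduced for this purpose only: $\mathbf{x}_0$ is the initial condition, $\mathbf{x}_b$ the first guess, $\mathbf{y}_i$ the observations at time $i$ within the window, $\mathcal{M}_{0\to i}$ the forecast model integrated from the start of the window to time $i$, $\mathcal{H}_i$ the operator that maps model fields to observed quantities, and $\mathbf{B}$ and $\mathbf{R}_i$ the assumed error covariances of the first guess and of the observations.

$$
J(\mathbf{x}_0) =
\tfrac{1}{2}\,(\mathbf{x}_0-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}_0-\mathbf{x}_b)
\;+\;
\tfrac{1}{2}\sum_{i=0}^{N}
\bigl(\mathbf{y}_i-\mathcal{H}_i(\mathcal{M}_{0\to i}(\mathbf{x}_0))\bigr)^{\mathrm{T}}\,
\mathbf{R}_i^{-1}\,
\bigl(\mathbf{y}_i-\mathcal{H}_i(\mathcal{M}_{0\to i}(\mathbf{x}_0))\bigr)
$$

The first term penalises moving away from the first guess and the second penalises misfit to the observations, so minimising $J$ balances the two according to their assumed errors.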

4D-Var uses the concept of a continuous feedback between observations and IFS model data analysis.  The impact of the observations is determined by three considerations:

• the assumed accuracy of the observations.  These are considered more or less static.  However, consistently poor observations can be blocklisted.
• the representativeness of the observation relative to the IFS model's depiction.  For example, this takes into account the difference between the altitude of the observation and the altitude of the corresponding point in the IFS orography, and the difference between the location of the observation and the grid point.
• the accuracy of the short-range forecasts, which are flow-dependent.  Hence the uncertainty may be larger in a developing baroclinic depression than in a subtropical high-pressure system.

The 4D-Var analysis uses the IFS dynamics and physics to create a sequence of states that fits as closely as possible to the available observations.  These states are consistent with the dynamics and physics of the atmosphere, as expressed by the equations of the IFS model.  4D-Var compares the actual observations with that which would be expected at the time and position of the observation given the model fields.  It is in effect a short-range forecast that serves to bring information forward from the previous cycle.

The analysis essentially uses data within a time window, currently 09–21UTC for the 12UTC model run and 21–09UTC for the 00UTC model run.  In practice the main forecasts of the ensemble use the data cut-off (i.e. the last time) brought back to 15UTC (for the 12UTC run) and 03UTC (for the 00UTC run).  This is in order to be able to deliver forecasts to customers in a timely manner (see the continuing sequence of analyses).  All observational data are processed similarly, including radiances from satellites.  A weakly coupled sea-ice atmosphere assimilation is used in the surface analyses of the 4D-Var, and in the ensemble of data assimilations (EDA).  Before the introduction of cycle 45r1 in June 2018, a remotely generated sea-ice cover analysis (OSTIA) was used directly.
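
The window and cut-off times quoted above follow a simple pattern relative to the nominal run time, sketched below. The function and constant names are hypothetical; only the offsets (a 12-hour window starting 3 hours before the run time, and an early cut-off 3 hours after it) come from the text.

```python
from datetime import datetime, timedelta, timezone

# Offsets relative to the nominal run time, as quoted in the text.
WINDOW_START_OFFSET = timedelta(hours=-3)
WINDOW_LENGTH = timedelta(hours=12)
EARLY_CUTOFF_OFFSET = timedelta(hours=3)

def assimilation_window(run_time):
    """Return (window start, window end, early data cut-off) for a run."""
    start = run_time + WINDOW_START_OFFSET
    return start, start + WINDOW_LENGTH, run_time + EARLY_CUTOFF_OFFSET

# Example: a 00UTC run (the date is arbitrary).
run = datetime(2024, 1, 2, 0, tzinfo=timezone.utc)
start, end, cutoff = assimilation_window(run)
print(start.strftime("%d %HUTC"), end.strftime("%d %HUTC"), cutoff.strftime("%d %HUTC"))
# 01 21UTC 02 09UTC 02 03UTC
```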

4D-Var at ECMWF is based on an incremental and iterative approach to minimising a cost function, in effect minimising the departure of the final adjusted analysis from the observed values and from the last available short-range forecast.  This takes place within a set of nested loops.  The inner loop has low spatial resolution (TL255 L137) and produces preliminary low-resolution analysis increments using full linearised physics (except the first inner loop of the EDA).  By iterating forwards and backwards in time the analyses can be adjusted towards the observations, inducing consistent adjustments in other variables.  Subsequent loops are at higher resolution (TL319 L137 and TL399 L137) with the same full linearised physics.  This incremental approach provides considerable flexibility in the use of computer resources.
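
The idea of iterating forwards and backwards in time can be illustrated on a toy problem. The sketch below is not the IFS algorithm: it assumes a scalar state, a hypothetical linear model `x[k+1] = a * x[k]`, direct observations of the state at every step, and plain fixed-step gradient descent in place of the operational minimiser. It also omits the nested multi-resolution loops. What it does show is a forward model run followed by a backward (adjoint) sweep that accumulates the gradient of the cost function with respect to the initial condition.

```python
def forward(x0, a, n_steps):
    """Run the toy linear model x[k+1] = a * x[k]; return the trajectory."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(a * traj[-1])
    return traj

def cost_and_gradient(x0, xb, obs, a, sigma_b, sigma_o):
    """Cost J(x0) and dJ/dx0 from one forward run and one backward sweep."""
    traj = forward(x0, a, len(obs) - 1)
    # Observation misfits scaled by the assumed observation error variance.
    misfit = [(x - y) / sigma_o**2 for x, y in zip(traj, obs)]
    cost = 0.5 * (x0 - xb) ** 2 / sigma_b**2
    cost += 0.5 * sum(m * (x - y) for m, x, y in zip(misfit, traj, obs))
    # Backward (adjoint) sweep: carry sensitivities from the end of the
    # window back to the initial time.
    lam = 0.0
    for m in reversed(misfit):
        lam = a * lam + m
    grad = (x0 - xb) / sigma_b**2 + lam
    return cost, grad

def minimise(xb, obs, a, sigma_b, sigma_o, step=0.04, n_iter=100):
    """Fixed-step gradient descent; the step size suits this toy case only."""
    x0 = xb
    for _ in range(n_iter):
        _, grad = cost_and_gradient(x0, xb, obs, a, sigma_b, sigma_o)
        x0 -= step * grad
    return x0

# Made-up numbers: first guess 1.0, observations at five times in the window.
a, xb, sigma_b, sigma_o = 0.95, 1.0, 1.0, 0.5
obs = [2.0, 1.9, 1.8, 1.7, 1.6]
x_analysis = minimise(xb, obs, a, sigma_b, sigma_o)

# For this linear problem the minimum is known in closed form.
powers = [a**k for k in range(len(obs))]
exact = (xb / sigma_b**2 + sum(p * y for p, y in zip(powers, obs)) / sigma_o**2) / (
    1 / sigma_b**2 + sum(p * p for p in powers) / sigma_o**2
)
```

Because the toy model is linear, the iterated result can be compared with the closed-form minimum `exact`. With a nonlinear model no such closed form exists, which is one motivation for the incremental approach described above.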

The resulting adjustment of a variable generates physically and dynamically consistent corrections of other variables.  Thus a sequence of observations of humidity from a satellite infrared instrument that shows a displacement of atmospheric structures will entail a correction not only of the moisture field but also of the wind and temperature fields.  For satellite radiances the variational scheme modifies the fields of temperature, wind, moisture and also ozone in such a way that the simulated observations are brought closer to the observed values.

To estimate the flow-dependent uncertainty, a set of 3-hour forecasts, valid at the start of the 4D-Var time-window, is computed from 25 perturbed, equally likely analyses. Small variations are imposed on the observations and the sea surface temperature to reflect uncertainties, and also within the error parameterisation to cope with uncertainties in the forecast evolution.  The perturbations produced using this Ensemble of Data Assimilations (EDA) are also used for the construction of the perturbations in the forecast ensemble.

Fig2.5-3: The ECMWF 4-dimensional data assimilation system determines a correction to the background initial condition (blue line) that leads to an analysis that is somewhere "midway" between background and observations. In simplest terms the analysis is a weighted mean of the background and observations.

Fig2.5-4: Each observation has an error (instrumental, representativeness, etc.), and error within the IFS forecast model is also taken into account.  One way to simulate both these effects is to run an Ensemble of Data Assimilations (EDA).  These are shown in green.

### Analogies to 4D-Var and the EDA

Suppose a forecaster wants to create a sequence of manually analysed hourly synoptic charts for their region, which evolve smoothly, continuously and realistically from one time to the next throughout the sequence.  To achieve this, one approach would be to draw up one chart using all available data (surface observations, imagery, etc.), then another for a subsequent time in the same way.  Then the forecaster might go back to the first chart, rub something out and re-draw it in a rather different fashion that still fits the available observations reasonably well, but that allows the next chart to follow on better.  Then they might draw up other charts for other times, and repeat this process many times, rubbing out and re-drawing probably all of the charts in some way or other, to achieve the final goal of sensible continuity.  Each time a chart is readjusted the changes needed will become smaller and smaller, until finally the forecaster is happy that they have a full and consistent sequence.  This is the forecaster's equivalent of 4D-Var.  Of course, in 4D-Var there is the additional constraint of full vertical consistency, though in the forecaster's world soundings and imagery may indeed contribute in an analogous way.

Meanwhile, the Ensemble of Data Assimilations (EDA) could be thought of as being equivalent to the above process, but carried out several times, producing slightly different smooth and continuous but equally probable sequences.  Where there are lots of observations the several sequences might end up almost the same, but where data is sparse one could end up with much more variability between sequences.  And where there is large dynamic instability (e.g. a developing frontal wave rather than the centre of an anticyclone) this 'spread' might be even greater.  In the real EDA this is essentially also what happens: more spread is found amongst members in data-sparse areas and when there is larger innate uncertainty in actual atmospheric developments.
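
The link between observation density and spread can be reproduced in a toy experiment. The sketch below is an illustration under strong assumptions, not the real EDA: a single scalar quantity, Gaussian perturbations drawn independently for the first guess and for each observation, and an inverse-variance weighted mean standing in for the analysis. The function name and all numerical values are hypothetical; only the default of 25 members is taken from the text.

```python
import numpy as np

def eda_spread(n_obs, n_members=25, truth=280.0, sigma_b=1.0, sigma_o=1.0, seed=0):
    """Standard deviation across members of a toy perturbed-analysis ensemble."""
    rng = np.random.default_rng(seed)
    w_b = 1.0 / sigma_b**2   # weight of the first guess
    w_o = 1.0 / sigma_o**2   # weight of each observation
    analyses = []
    for _ in range(n_members):
        # Each member sees its own perturbed first guess and observations.
        background = truth + rng.normal(0.0, sigma_b)
        obs = truth + rng.normal(0.0, sigma_o, size=n_obs)
        # Toy "analysis": inverse-variance weighted mean of all inputs.
        analysis = (w_b * background + w_o * obs.sum()) / (w_b + n_obs * w_o)
        analyses.append(analysis)
    return float(np.std(analyses, ddof=1))

sparse_spread = eda_spread(n_obs=1)    # data-sparse area
dense_spread = eda_spread(n_obs=50)    # data-rich area
print(sparse_spread > dense_spread)
```

In this toy setting the members agree more closely where many observations constrain the analysis. It does not represent the additional spread that comes from dynamic instability, which requires a forecast model.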

### Additional Sources of Information

(Note: In older material there may be references to issues that have subsequently been addressed)