Contributors: Leopold Haimberger (UNIVIE), Michael Blaschek (UNIVIE), Federico Ambrogi (UNIVIE), Ulrich Voggenberger (UNIVIE)

Issued by: UNIVIE / Leopold Haimberger

Date: 14 July 2021

Ref: C3S_DC3S311c_Lot2.2.1.3_202107_uncertainties_final_assessment

Official reference number service contract: 2019/C3S_311c_Lot2_UNIVIE/SC1


1. Introduction

In this report we describe the implementation of uncertainty estimates for the temperature, humidity and wind speed data in the Early Upper Air database v2, based on the statistical method of Desroziers.
The database is the most comprehensive collection of upper-air measurements, made either by radiosondes or by balloons. The latest version of the database consists of more than 4800 station files, with data going back to the beginning of the 20th century. The files are in netCDF format, one per station, and use a group structure that follows the requirements of the Common Data Model for Observations (https://github.com/glamod/common_data_model/), with each table saved as a separate group.
Under this contract, the original data, collected from multiple archives, were extended to also include bias estimates.

In this document we address the problem of observation uncertainties. Neither early nor recent upper-air measurements systematically come with uncertainty estimates. Only dedicated networks such as the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN) provide such information. Radiosonde intercomparison campaigns offer another way to assess observation uncertainties. This service helped rescue and digitize several historical datasets collected during such campaigns; see, for example, reports DC3S311c_Lot2.1.3.1 and DC3S311c_Lot2.2.3.2. To make use of such error estimates, however, knowing the specific type of instrument used is essential.

For this service, we instead rely on a more general framework for assessing uncertainties, which uses the background and analysis departures calculated during the production of reanalyses. A statistical procedure developed by Desroziers et al. (2005) and Desroziers (2011) makes it possible to estimate observation uncertainties, which include both measurement and representation errors. The theoretical basis and motivation were described extensively in documents DC3S311c_Lot2.2.1.1 (temperature and humidity) and DC3S311c_Lot2.2.2.1 (wind variables). Here we focus instead on the implementation of the algorithm and on the creation of a dedicated group inside the station files.

2. Uncertainty Estimation with Desroziers' Method

We summarize here the statistical method developed by Desroziers et al. (2005) and Desroziers (2011) to estimate observation uncertainties from analysis and background departures.
Throughout this document we refer to this procedure as "Desroziers diagnostics". Let dO-B be the observation-minus-background departure (the vector $\mathbf{d}^{o}_{b}$ in Equation 1) and dO-A the observation-minus-analysis departure ($\mathbf{d}^{o}_{a}$). The observation-error covariance matrix R is then estimated by the expectation of the product of the two departures, computed in practice as a time average:

$$E\left[\mathbf{d}^{o}_{a}\left(\mathbf{d}^{o}_{b}\right)^{T}\right] = \mathbf{R}$$

Equation 1: Cross-covariance of the analysis and background departures. Usually only the diagonal elements are interpreted, as estimates of the observation-error variances.

Here the expectation is calculated over an arbitrary time range (we use time windows of 1, 2, 3 and 6 months), so the value of the cross-covariance depends on the chosen time window. For each pressure level, dO-B and dO-A are available, and the error estimate can be calculated from their time averages. The result is strictly valid only if dO-A and dO-B are unbiased; however, results obtained with biased departures did not differ much from those obtained with bias-corrected data (Waller et al., 2016). Ideally, observation errors are calculated iteratively at the time of assimilation (Janjić et al., 2018). However, Desroziers et al. (2005) showed that the diagnostic already gives good estimates of the observation-error variances and correlations after the first iteration. Only this first iteration can be calculated offline, outside the assimilation framework, and it is the one implemented for this service.
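To make the diagonal of Equation 1 concrete, the following Python sketch (not the service's code) estimates the observation-error standard deviation at one pressure level as the square root of the time-averaged product of O-A and O-B departures. The function name `desroziers_sigma` is hypothetical, and the synthetic check is an assumed setup: a scalar optimal analysis with assumed error standard deviations of 0.6 K (observation) and 0.8 K (background), for which the diagnostic should recover the observation value.

```python
import numpy as np


def desroziers_sigma(d_oa, d_ob):
    """Observation-error standard deviation at one level from paired departures.

    Implements the diagonal of Equation 1: sigma_o^2 ~ mean(d_oa * d_ob).
    """
    d_oa = np.asarray(d_oa, dtype=float)
    d_ob = np.asarray(d_ob, dtype=float)
    r_ii = np.mean(d_oa * d_ob)  # time average of the O-A x O-B product
    # A non-positive value cannot be a variance, so no uncertainty is reported
    return float(np.sqrt(r_ii)) if r_ii > 0 else float("nan")


# Synthetic check (assumed setup): truth = 0, scalar optimal analysis
rng = np.random.default_rng(0)
n = 100_000
sigma_o, sigma_b = 0.6, 0.8                                        # assumed error std devs (K)
d_ob = rng.normal(0.0, sigma_o, n) - rng.normal(0.0, sigma_b, n)  # y - x_b
gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)                     # optimal weight
d_oa = (1.0 - gain) * d_ob                                        # y - x_a
print(round(desroziers_sigma(d_oa, d_ob), 3))  # close to 0.6
```

For an optimal analysis the time-averaged product equals the observation-error variance, which is why the estimate lands near the assumed 0.6 K; with suboptimal weights it would be biased, which is what further iterations inside the assimilation would correct.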

3. Implementation

Here we briefly describe the implementation of the algorithm that calculates the uncertainties following Desroziers' method. The code is publicly available at https://github.com/MBlaschek/CEUAS/tree/master/CEUAS/public/postprocess/add_desrozier.

The code is structured as follows. The calculation is based on a running mean over a time window of adjustable length, which we set to 30, 60, 90 and 180 days. For example, to calculate the uncertainty for a specific day with the 30-day window, we need to extract the departures for the 15 days preceding and the 15 days following the observation.

This requires an approximation, since the time stamps of observations on different days are generally not identical, so the observation times have to be adjusted. Measurements typically have arbitrary time stamps rather than being reported at fixed times. For this reason, we allow a tolerance of 2 hours around four standard times, i.e. 00, 06, 12 and 18 h: measurements taken from 22:00 to 02:00 of the following day are all mapped to 00:00, measurements taken from 04:00 to 08:00 are mapped to 06:00, and so on. This choice leaves out data collected around 03, 09, 15 and 21 h, but we preferred a more conservative choice for the allowed time difference.
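The time standardization described above can be sketched as follows. This is a minimal illustration, assuming naive `datetime` values in a common time base; the function name `standardize_time` is hypothetical, and treating the ±2 h boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta

STANDARD_HOURS = (0, 6, 12, 18)
TOLERANCE = timedelta(hours=2)


def standardize_time(t):
    """Map a launch time to the nearest standard time (00, 06, 12, 18 h)
    if it lies within +/-2 hours of it; return None otherwise."""
    midnight = datetime(t.year, t.month, t.day)
    # Candidates: the four standard times of this day plus 00 h of the next day,
    # so that e.g. 23:10 is mapped to midnight of the following day
    candidates = [midnight + timedelta(hours=h) for h in STANDARD_HOURS]
    candidates.append(midnight + timedelta(days=1))
    nearest = min(candidates, key=lambda c: abs(t - c))
    return nearest if abs(t - nearest) <= TOLERANCE else None


print(standardize_time(datetime(1975, 3, 1, 23, 10)))  # 1975-03-02 00:00:00
print(standardize_time(datetime(1975, 3, 1, 3, 0)))    # None
```

Launches 3 h away from both neighbouring standard times fall outside the tolerance and are discarded, reproducing the gaps around 03, 09, 15 and 21 h mentioned in the text.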

Besides standardizing the observation time stamps, we also consider only observations on the 16 standard pressure levels.

At this point it is straightforward to extract from our data files the analysis and background departures that fall within the desired time window, centred on the date being analysed.
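A sketch of the window extraction, assuming the departures are held in memory as hypothetical `(day, standard_hour, level_hPa, d_oa, d_ob)` tuples rather than read from the netCDF files. The list of 16 standard levels is an assumption (the usual set from 1000 to 10 hPa); the text only states their number. With `window_days=30` the window spans the 15 days before and the 15 days after the centre day.

```python
from datetime import date, timedelta

# Assumed set of 16 standard pressure levels (hPa); the text gives only their number
STANDARD_LEVELS_HPA = (1000, 925, 850, 700, 500, 400, 300, 250,
                       200, 150, 100, 70, 50, 30, 20, 10)


def departures_in_window(records, centre_day, hour, level, window_days=30):
    """Collect paired (O-A, O-B) departures for one standard hour and level.

    `records` holds hypothetical (day, standard_hour, level_hPa, d_oa, d_ob)
    tuples; the window covers window_days // 2 days on each side of centre_day.
    """
    if level not in STANDARD_LEVELS_HPA:
        raise ValueError(f"{level} hPa is not a standard pressure level")
    half = timedelta(days=window_days // 2)
    d_oa, d_ob = [], []
    for day, rec_hour, rec_level, oa, ob in records:
        if (rec_hour == hour and rec_level == level
                and centre_day - half <= day <= centre_day + half):
            d_oa.append(oa)
            d_ob.append(ob)
    return d_oa, d_ob


# Usage: 40 consecutive days of synthetic 00 h departures at 500 hPa
records = [(date(2000, 1, 1) + timedelta(days=k), 0, 500, 0.1, 0.2) for k in range(40)]
d_oa, d_ob = departures_in_window(records, date(2000, 1, 20), 0, 500)
print(len(d_oa))  # 31: the centre day plus 15 days on each side
```

Selecting by standard hour as well as by level keeps the 00, 06, 12 and 18 h series separate, matching the per-time calculation described in the Results section.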

Once the lists of analysis and background departures are obtained, we remove values smaller than the first quartile minus the interquartile range or larger than the third quartile plus the interquartile range. As a last criterion, we require that at least 50% of the data for the time window are available; for example, at least 15 departure values must be available when the time window is set to 30 days.
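The outlier rule and the availability check can be sketched with NumPy as below. Two details are assumptions not stated in the text: a departure pair is dropped if either of its two values is flagged, and the quartiles use NumPy's default linear interpolation. The function name `filter_and_check` is hypothetical.

```python
import numpy as np


def filter_and_check(d_oa, d_ob, window_days):
    """Remove outlier pairs with the quartile rule, then apply the 50% check.

    Returns (d_oa, d_ob) as arrays, or None if fewer than window_days / 2
    pairs remain.
    """
    d_oa = np.asarray(d_oa, dtype=float)
    d_ob = np.asarray(d_ob, dtype=float)
    if d_oa.size == 0:
        return None
    keep = np.ones(d_oa.size, dtype=bool)
    for d in (d_oa, d_ob):
        q1, q3 = np.percentile(d, [25, 75])
        iqr = q3 - q1
        # Bounds use one interquartile range (not the common 1.5 factor)
        keep &= (d >= q1 - iqr) & (d <= q3 + iqr)
    d_oa, d_ob = d_oa[keep], d_ob[keep]
    if d_oa.size < window_days / 2:  # e.g. at least 15 values for 30 days
        return None
    return d_oa, d_ob


departures = np.append(np.arange(20.0), 100.0)  # 20 regular values, 1 outlier
result = filter_and_check(departures, departures, window_days=30)
print(result[0].size)  # 20: the outlier at 100 is removed
```

Applying the availability check after the outlier removal follows the order in the text ("as a last criterion"), so heavy filtering can itself cause a window to be rejected.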

Finally, we calculate the cross-covariance, whose square root is the desired observation uncertainty. As noted above, one of the assumptions underlying Desroziers' method is that the data are unbiased. For this reason, for temperature we calculate the uncertainty only if a bias correction is available. For the other variables we have no bias information, and the bias is assumed to be negligible.
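The bias condition and the final square root can be expressed as a small gate, sketched here in plain Python. The function name and the variable label `"temperature"` are hypothetical; the point is that temperature without a bias correction yields NaN, while the other variables are treated as unbiased.

```python
import math


def desroziers_uncertainty(d_oa, d_ob, variable, has_bias_correction):
    """Square root of the mean O-A x O-B product, or NaN if preconditions fail.

    d_oa and d_ob are sequences of paired departures after filtering.
    """
    if variable == "temperature" and not has_bias_correction:
        return math.nan  # possibly biased departures: no estimate
    if len(d_oa) == 0 or len(d_oa) != len(d_ob):
        return math.nan  # no usable paired departures
    cross_cov = sum(a * b for a, b in zip(d_oa, d_ob)) / len(d_oa)
    return math.sqrt(cross_cov) if cross_cov > 0 else math.nan


print(desroziers_uncertainty([0.5, 0.5], [0.5, 0.5], "temperature", True))   # 0.5
print(desroziers_uncertainty([0.5, 0.5], [0.5, 0.5], "temperature", False))  # nan
```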

The calculated uncertainties are then stored in the netCDF files in a dedicated group called "advanced_uncertainties". Two variables are defined per time window; for example, for the 30-day window, the variable "desroziers_30" contains the value of the uncertainty, while the variable "num_30" contains the number of observations used to calculate it. This is useful, for example, to find out why an uncertainty is not available, i.e. why a NaN (not-a-number) value is found. It may happen that for the 30-day window the requirement of at least 50% of the measurements (in this case 15) is not satisfied, so the uncertainty is NaN, while a real value is found for a longer time window, for which the 50% requirement is satisfied.
NaNs can also result from missing bias information for temperature or from missing departures, in which case the method cannot be applied.
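To illustrate how `num_<N>` helps interpret a NaN in `desroziers_<N>`, the sketch below assumes that the values for one time and pressure level have already been read from the "advanced_uncertainties" group into a plain dict (reading the netCDF file is not shown). Both helper functions are hypothetical, and the assumption that `num_<N>` counts the values entering the 50% check is ours.

```python
import math

WINDOWS_DAYS = (30, 60, 90, 180)


def shortest_valid_window(record):
    """Shortest window (days) whose desroziers_<N> value is not NaN, else None."""
    for window in WINDOWS_DAYS:
        if not math.isnan(record[f"desroziers_{window}"]):
            return window
    return None


def explain_missing(record, window):
    """Give a likely reason for a NaN in desroziers_<window>, using num_<window>."""
    if not math.isnan(record[f"desroziers_{window}"]):
        return "available"
    count = record[f"num_{window}"]
    if count < window / 2:
        return f"only {count} values, fewer than the required {window // 2}"
    # Remaining causes named in the text: no bias correction or no departures
    return "missing bias correction or departures"


# Hypothetical values for one time and pressure level
record = {"desroziers_30": math.nan, "num_30": 12,
          "desroziers_60": 0.7, "num_60": 40,
          "desroziers_90": 0.65, "num_90": 70,
          "desroziers_180": 0.6, "num_180": 150}
print(shortest_valid_window(record))  # 60
print(explain_missing(record, 30))    # only 12 values, fewer than the required 15
```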

4. Results

Here we provide an overview of the uncertainties calculated for temperature, relative humidity and wind speed (uncertainties for the u and v wind components are also available). We first take the station Lindenberg (WMO code 10393) as an example and then move to a global overview of the results.

We start by analysing the time series of the uncertainties of relative humidity, air temperature and wind speed in Figure 1, Figure 2 and Figure 3, displayed for all standard pressure levels. The uncertainties are shown in different colours according to the time window used to calculate the averages. Peaks of higher uncertainty are due to large departure values. It is also interesting to compare the uncertainties of different stations, which might fluctuate depending on the accuracy of the reanalyses. We note that there is no clear dependence on pressure level, and most of the uncertainties are distributed around an average value. The uncertainty estimates become smaller with time, as can be expected from the improved accuracy of the sensors used for upper-air measurements.
We also note that the variance of the uncertainty estimates decreases as the averaging window gets longer, since the estimates are less affected by statistical fluctuations of the departures. This is emphasized by plotting the 30-day averages first and the 180-day averages last.
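The effect of the window length on the spread of the estimates can be illustrated with synthetic data. This sketch is not based on the Lindenberg data: it assumes independent Gaussian departures (one pair per day) and a scalar optimal analysis with assumed error standard deviations, whereas real departures are correlated in time, so actual spreads will differ.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_o, sigma_b = 0.6, 0.8                      # assumed error std devs
gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)    # scalar optimal weight


def estimate_once(n_days):
    """One Desroziers estimate from n_days synthetic, independent departure pairs."""
    d_ob = rng.normal(0.0, sigma_o, n_days) - rng.normal(0.0, sigma_b, n_days)
    d_oa = (1.0 - gain) * d_ob
    return np.sqrt(np.mean(d_oa * d_ob))


# Spread (standard deviation) of 2000 repeated estimates for each window length
spread = {n: np.std([estimate_once(n) for _ in range(2000)]) for n in (30, 180)}
print(f"30 days: {spread[30]:.3f}, 180 days: {spread[180]:.3f}")
```

Under these assumptions the spread shrinks roughly with the square root of the number of days, so the 180-day estimates scatter about 2.5 times less than the 30-day ones.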

Figure 1: Time series for the relative humidity variable (dimensionless) for station Lindenberg. Colored lines correspond to the values of the uncertainties, and the corresponding scale is shown on the right y-axis. The gray shaded lines are the actual observation values and the corresponding scale is shown on the left y-axis. Only the series for pressure levels greater than 250 hPa are shown, since the humidity measurements are not sufficiently accurate at higher altitudes, except in the most recent years (2000s onward).

The estimates show several other interesting patterns that deserve further research, for example the temporary increase in temperature and wind uncertainties at this station at jet level (ca. 200-300 hPa) in the late 1960s and early 1970s. Such diagnostics can certainly help trace error sources that are hard to detect from the observation time series alone.

For observations before 1950, reanalysis information is available only from surface-data-only reanalyses (we used NOAA-20CRv3, Slivinski et al., 2019). Although these reanalyses did not assimilate upper-air observations, we could still interpolate the reanalysis fields to the observation locations to obtain offline departures. While useful, such departures are not sufficient to calculate Desroziers diagnostics: the upper-air data must actually be assimilated to obtain both analysis and background departures and to fulfil the necessary conditions for valid Desroziers diagnostics.

Figure 2: Time series for the air temperature variable (units K). Colored lines correspond to the values of the uncertainties, and the corresponding scale is shown on the right y-axis. The gray shaded lines are the actual observation values and the corresponding scale is shown on the left y-axis.

Figure 3: Time series for the wind speed variable (units m/s). Colored lines correspond to the values of the uncertainties, and the corresponding scale is shown on the right y-axis. The gray shaded lines are the actual observation values and the corresponding scale is shown on the left y-axis.

In Figure 4, Figure 5 and Figure 6 we show the distributions of the uncertainty values for the different variables, again for individual pressure levels. We find typical values of around 0.05 for relative humidity, 0.6 K for temperature and around 1 m/s for wind speed. The uncertainty decreases with time, which is a good sign, to an extent that depends on the specific station. Finally, the larger the time window considered, the narrower the distributions and the smaller the average uncertainty.

Figure 4: Histograms of uncertainty estimates for the variable relative humidity, for standard pressure levels equal to or larger than 250 hPa, for station Lindenberg. Color coding is the same as in Figs. 1-3 for the different averaging intervals.

Figure 5: Histograms of uncertainty estimates for the variable air temperature, for each standard pressure level.

Figure 6: Histograms of uncertainty estimates for the variable wind speed, for each standard pressure level.


Figure 7: Contours of the uncertainties for the air temperature (left) and wind speed (right), for all the available standard pressure levels at station Lindenberg. 925 hPa was not a standard pressure level before the early 1980s, hence the visual artifact in the lower left corner of the panels.

In Figure 7 we summarize the uncertainties of air temperature and wind speed at station Lindenberg. Note that the plots are produced by interpolating between the data at the various pressure levels, so spurious features can appear where portions of the data are missing, e.g. in the bottom left part of the wind plot; the choice of colour levels in the colour bar can also produce such effects. The time window used to calculate the averages is 180 days. In general, we see the same trend as in the time series, i.e. more recent observations tend to have smaller uncertainties, of about 0.3 K for air temperature and about 1 m/s for wind speed.

As described in the previous section, we calculated the uncertainties on standard pressure levels, with the observation time stamps mapped to four standard hours: 00, 06, 12 and 18. In principle we should report the results separately for these times; however, we did not find any significant difference between daytime and night-time estimates (see Fig. 8). This justifies our choice to show combined values for the different times.

Figure 8: Time series of the Desroziers' uncertainty of air temperature, calculated at 00, 06, 12 and 18 hours, at station Lindenberg. After 2013 the data from Lindenberg are reported in BUFR format at the actual launch time, which is rarely exactly on the hour. Thus there appear to be fewer values after 2013.

In Figure 9 we show a global overview of the average uncertainties for the available stations in the database, calculated over the periods 1970-1980 and 2000-2020, for relative humidity, air temperature and wind speed at 850 hPa. The average values clearly decrease for more recent observations for all variables, and the number of stations with large uncertainties also decreases in more recent periods. We also note a cluster of stations over India with quite large temperature and wind speed uncertainties. These stations are well known for their limited accuracy.

Figure 9: Overview of the average Desroziers' uncertainty, calculated using a time window of 30 days, in the periods 1970-1980 (left) and 2000-2020 (right), at 850 hPa. Top panels show the uncertainty of relative humidity, middle panels that of air temperature, and bottom panels that of wind speed.

5. Conclusions

In this document we reported on upper-air observation uncertainty estimates calculated with the method developed by Desroziers et al. (2005). No instrument metadata are needed; the calculation depends exclusively on the analysis and background departures available from ERA5. The estimates are considered a key feature of version 2 of the database of this service. As already stressed, they include both measurement and representation errors.

The uncertainties have been calculated and stored in v2 of the database, which is expected to be available to users on the Copernicus Climate Data Store soon. Retrieval of the uncertainty estimates is described in the Product User Guide (DC3S311c_Lot2.1.5.2).

6. References

Desroziers, G.: Observation error specification, ECMWF Annual Seminar 2011 [online] Available from: https://www.ecmwf.int/node/14958, 2011.

Desroziers, G., Berre, L., Chapnik, B. and Poli, P.: Diagnosis of observation, background and analysis-error statistics in observation space, Quarterly Journal of the Royal Meteorological Society, 131(613), 3385–3396, doi:10.1256/qj.05.108, 2005.

Janjić, T., Bormann, N., Bocquet, M., Carton, J. A., Cohn, S. E., Dance, S. L., Losa, S. N., Nichols, N. K., Potthast, R., Waller, J. A. and Weston, P.: On the representation error in data assimilation, Quarterly Journal of the Royal Meteorological Society, 144(713), 1257–1278, doi:10.1002/qj.3130, 2018.

Kitchen, M.: Representativeness errors for radiosonde observations, Quarterly Journal of the Royal Meteorological Society, 115(487), 673–700, doi:10.1002/qj.49711548713, 1989.

Slivinski, L. C., Compo, G. P., Whitaker, J. S., et al.: Towards a more reliable historical reanalysis: Improvements for version 3 of the Twentieth Century Reanalysis system, Quarterly Journal of the Royal Meteorological Society, 145, 2876–2908, doi:10.1002/qj.3598, 2019.

Waller, J. A., Ballard, S. P., Dance, S. L., Kelly, G., Nichols, N. K. and Simonin, D.: Diagnosing Horizontal and Inter-Channel Observation Error Correlations for SEVIRI Observations Using Observation-Minus-Background and Observation-Minus-Analysis Statistics, Remote Sensing, 8(7), 581, doi:10.3390/rs8070581, 2016.

This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation agreement signed on 11/11/2014). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.

