Last modified on May 20, 2025 09:01

Contributors: Hendrik Boogaard (WAGENINGEN ENVIRONMENTAL RESEARCH), Allard de Wit (WAGENINGEN ENVIRONMENTAL RESEARCH), Jenny Lazebnik (WAGENINGEN ENVIRONMENTAL RESEARCH), Jonathan Schubert (METEOGROUP), Gerald van der Grijn (METEOGROUP)


History of Modifications

Version	Date	Description of modification	editor
0.9	6 May 2019	Full draft	Hendrik Boogaard
1.0	18 May 2019	Review and minor edits	Ronald Hutjes
1.1	Sept 2020	Updated dataset temporal coverage due to dataset update	ECMWF
2.0	May 2025	Updates for AgERA5 v2.0	Allard de Wit

Acronyms

Acronym	Description or definition
AgERA5	Daily surface meteorological data set for agronomic use, based on ERA5
CDS	Climate Data Store (of ECMWF)
ECMWF	European Centre for Medium-Range Weather Forecasts
ERA5	ECMWF Re-Analysis
HRES	High Resolution Forecast
JRC	Joint Research Centre of the European Commission
LT	Local Time
MARS	Monitoring Agricultural ResourceS
NN	Nearest Neighbor

1. Scope of the document

This document provides an overview of the AgERA5 product, the underlying data sets, and the underlying algorithms and workflow. The AgERA5 dataset provides daily, agronomically relevant meteorological data for the period 1979 to present at a spatial resolution of 0.1°.

2. Executive summary

The AgERA5 dataset provides daily surface meteorological data for the period 1979 to present at a spatial resolution of 0.1°. The service is based on the fifth generation of ECMWF atmospheric re-analyses of the global climate, better known as ERA5. AgERA5 'connects' users in the agricultural domain to the new ERA5 data set. It includes daily aggregates of agronomically relevant variables, tuned to local day definitions and adapted to the finer topography, finer land use pattern and finer land-sea delineation of the ECMWF HRES operational model. The variables cover temperature, precipitation, snow depth, humidity, cloud cover and radiation.

3. Product description

The following text applies to AgERA5 versions 1.0 and 1.1. Extensions added for v2.0 are marked as such.

3.1. Introduction

Climate forcing data are used in analysis and agro-environmental modelling to study aspects of productivity and externalities of agriculture (e.g. Toreti et al., 2019; Glotter et al., 2016; De Wit et al., 2010). In this service we start from the hourly ECMWF ERA5 model data and convert them into meaningful input for these analyses and models. This involves a large amount of data that needs to be processed. Acquisition and pre-processing of ERA5 data, both archive and near real-time (NRT) data, is a large and specialized job. It requires a heavy investment from users such as technical policymakers, information agencies, NGOs, commodity traders, agri-businesses and insurance providers. The complexity of the task and the required effort may even be a barrier to start using the data.

This service is based on the original hourly deterministic ECMWF ERA5 data, at surface level and available at a spatial resolution of 30 km (~0.28125°). Data were aggregated to daily time steps and corrected towards a finer topography at a 0.1° spatial resolution. Aggregated data at daily time steps follow a local time zone definition and include a number of major agronomic parameters. The correction to the 0.1° grid was realized by applying grid- and variable-specific regression equations to an ERA5 data set interpolated to the 0.1° grid. The equations were trained on operational ECMWF HRES model data at a 0.1° resolution. The final data set is referred to as AgERA5. AgERA5 will save potential users money and stimulate businesses to use such a high-quality data set. It also avoids a possible proliferation of different data sets derived from the basic hourly ERA5 data set.

3.2. Differences between AgERA5 version 1.0/1.1 and version 2.0

The changes introduced with the update from AgERA5 version 1.1 to version 2.0 can be summarized as follows:

Bug fixes were introduced that solve some of the issues identified in the previous versions. See the "known issues in v1.1" section in the product user guide. These fixes include:
- Fixing of rogue values in the temperature layers
- Fixing inconsistent temperature values (Tmean > Tmax). This problem was largely solved by retrieving all daily temperature statistics from the same ERA5 variable 't2m' (see Table 3-1).
- Striping in the relative humidity layers was partially resolved by adding new relative humidity layers providing the daily minimum and maximum relative humidity.
Five new variables were introduced that were requested by users:
1. the Penman-Monteith reference evapotranspiration;
2. the vapour pressure deficit at daily maximum temperature;
3. the daily minimum relative humidity;
4. the daily maximum relative humidity;
5. the precipitation duration fraction.
Minor updates related to metadata such as variable names and coordinate reference system.

3.3. Variable definitions

AgERA5 includes 22 (v1.0 and v1.1) or 27 (v2.0) agronomically relevant variables. See Table 3-1.

The meaning of the columns in Table 3-1 is:

Short name: name of the variable as encoded in the filename
Long name: an explanation of the variable
Unit: the unit in which the variable is provided
Aggregation: the type of daily aggregation function applied to obtain the variable
AGROVOC URI: the URI of the variable in the AGROVOC Thesaurus (https://aims.fao.org/agrovoc)
in V2: indicates whether the variable was newly introduced in AgERA5 v2.0 (Y) or was already present in earlier versions (N)

Table 3-1: List of variables in the AgERA5 data set

Short name	Long name	Unit	Aggregation	AGROVOC URI	in V2
Cloud_Cover_Mean	Total cloud cover (00-00LT)	(0 - 1)	Mean	http://aims.fao.org/aos/agrovoc/c_1681	N
Dew_Point_Temperature_2m_Mean	2 meter dewpoint temperature (00-00LT)	K	Mean	http://aims.fao.org/aos/agrovoc/c_2231	N
Precipitation_Flux	Total precipitation (00-00LT)	mm d-1	Sum	http://aims.fao.org/aos/agrovoc/c_6161	N
Precipitation_Rain_Duration_Fraction	Precipitation type duration - rain (00-00LT)	-	Count/24		N
Precipitation_Solid_Duration_Fraction	Precipitation type duration - solid fraction (no hail) composed of: precipitation types freezing rain (3), snow (5), wet snow (6), mixture of rain and snow (7) and ice pellets (8) (00-00LT)	-	Count/24		N
Relative_Humidity_2m_06h	Relative humidity at 06LT	%	-	http://aims.fao.org/aos/agrovoc/c_6496	N
Relative_Humidity_2m_09h	Relative humidity at 09LT	%	-	http://aims.fao.org/aos/agrovoc/c_6496	N
Relative_Humidity_2m_12h	Relative humidity at 12LT	%	-	http://aims.fao.org/aos/agrovoc/c_6496	N
Relative_Humidity_2m_15h	Relative humidity at 15LT	%	-	http://aims.fao.org/aos/agrovoc/c_6496	N
Relative_Humidity_2m_18h	Relative humidity at 18LT	%	-	http://aims.fao.org/aos/agrovoc/c_6496	N
Snow_Thickness_LWE_Mean	Snow liquid water equivalent (00-00LT)	cm of liquid water equivalent	Mean	http://aims.fao.org/aos/agrovoc/c_7124	N
Snow_Thickness_Mean	Snow depth (00-00LT)	cm snow	Mean	http://aims.fao.org/aos/agrovoc/c_7124	N
Solar_Radiation_Flux	Surface solar radiation downwards (00-00LT)	J m-2 d-1	Sum	http://aims.fao.org/aos/agrovoc/c_14415	N
Temperature_Air_2m_Max_24h	Maximum air temperature at 2 meter (00-00LT) from ERA5 variable 'mx2t' (v1.0, v1.1) or variable 't2m' (v2.0)	K	Maximum	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Max_Day_Time	Maximum air temperature at 2 meter (06-18LT) from ERA5 variable 'mx2t' (v1.0, v1.1) or variable 't2m' (v2.0)	K	Maximum	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Mean_24h	2 meter air temperature (00-00LT) from ERA5 variable 't2m' (all versions)	K	Mean	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Mean_Day_Time	2 meter air temperature (06-18LT) from ERA5 variable 't2m' (all versions)	K	Mean	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Mean_Night_Time	2 meter air temperature (18-06LT) from ERA5 variable 't2m' (all versions)	K	Mean	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Min_24h	Minimum air temperature at 2 meter (00-00LT) from ERA5 variable 'mn2t' (v1.0, v1.1) or variable 't2m' (v2.0)	K	Minimum	http://aims.fao.org/aos/agrovoc/c_34204	N
Temperature_Air_2m_Min_Night_Time	Minimum air temperature at 2 meter (18-06LT) from ERA5 variable 'mn2t' (v1.0, v1.1) or variable 't2m' (v2.0)	K	Minimum	http://aims.fao.org/aos/agrovoc/c_34204	N
Vapour_Pressure_Mean	Vapour pressure (00-00LT)	hPa	Mean	http://aims.fao.org/aos/agrovoc/c_26832	N
Wind_Speed_10m_Mean	10 meter wind speed (00-00LT)	m s-1	Mean	http://aims.fao.org/aos/agrovoc/c_29582	N
ReferenceET_PenmanMonteith_FAO56	Penman-Monteith reference evapotranspiration according to the FAO56 approach	mm d-1	-		Y
Derived_Relative_Humidity_2m_Min^*	Minimum Relative Humidity: derived from AgERA5 24h-maximum temperature and mean vapor pressure (00-00LT) as a post-processing step.	%	-		Y
Derived_Relative_Humidity_2m_Max^*	Maximum Relative Humidity: derived from AgERA5 24h-minimum temperature and mean vapor pressure (00-00LT) as a post-processing step.	%	-		Y
Vapour_Pressure_Deficit_at_Maximum_Temperature	Vapour pressure deficit derived from saturated vapour pressure for maximum temperature and 24h mean vapour pressure	hPa	-		Y
Precipitation_Duration_Fraction	Precipitation duration fraction composed of: hours with precipitation amount >= 0.1 mm / h divided by 24. (00-00LT)	-	Count/24		Y

^*The term “derived” was added to distinguish these layers from the relative humidity layers at specific times of day. The latter are derived from the original ERA5 input fields, temperature (t2m) and dewpoint (d2m), while the “derived” variables are computed from the AgERA5 layers as a post-processing step. The advantage of the "derived" relative humidity layers is that they avoid the "striping" effect in the relative humidity layers at specific times of day. See the relevant section in the "known issues v1.1".

3.4. Input data used

Logically, the ERA5 data set is the main input data set (see https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5). ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the earth on a 30 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.
Concerning the archive, the years 1979 to present were available during the project. Note that two versions of ERA5 are available through the CDS:

ERA5 interpolated to a 0.25° grid
original ERA5 model level data (reanalysis-era5-complete)

The latter version was used in this project.

ERA5 offers a wide range of variables. See the ERA5 data documentation, especially the following tables:

2: surface, instantaneous (averages)
3: surface, accumulations
4: surface, minimum/maximum

The following table shows the variables used for the AgERA5 product versions.

Table 3-2: Essential variables used for the AgERA5 v1.0 and v1.1 product

Variable name	Unit	Short	Reference	Group
Snow density	kg m-3	rsn	table 2	INST1
Snow depth	m of water equivalent	sd	table 2	INST1
10 metre U wind component	m s-1	u10	table 2	INST1
10 metre V wind component	m s-1	v10	table 2	INST1
Total cloud cover	(0 - 1)	tcc	table 2	INST1
2 metre temperature	K	t2m	table 2	INST1
2 metre dewpoint temperature	K	d2m	table 2	INST1
Surface solar radiation downwards	J m-2	ssrd	table 3	ACCMNMX
Total precipitation	m	tp	table 3	ACCMNMX
Precipitation type¹	code table (4.201)	ptype	table 2	INST2
Maximum temperature at 2 metres since previous post-processing (last hour)	K	mx2t	table 5	ACCMNMX
Minimum temperature at 2 metres since previous post-processing (last hour)	K	mn2t	table 5	ACCMNMX

Table 3-3: Essential variables used for the AgERA5 v2.0 product

Variable name	Unit	Short	Reference	Group
Snow density	kg m-3	rsn	table 2	INST1
Snow depth	m of water equivalent	sd	table 2	INST1
10 metre U wind component	m s-1	u10	table 2	INST1
10 metre V wind component	m s-1	v10	table 2	INST1
Total cloud cover	(0 - 1)	tcc	table 2	INST1
2 metre temperature	K	t2m	table 2	INST1
2 metre dewpoint temperature	K	d2m	table 2	INST1
Surface solar radiation downwards	J m-2	ssrd	table 3	ACCMNMX
Total precipitation	m	tp	table 3	ACCMNMX
Precipitation type¹	code table (4.201)	ptype	table 2	INST2

¹ The following types are distinguished: 0 = No precipitation, 1 = Rain, 3 = Freezing rain (i.e. super cooled), 5 = Snow, 6 = Wet snow, 7 = Mixture of rain and snow, 8 = Ice pellets

4. Workflow

The AgERA5 workflow includes (see Figure 4-1):

0) Retrieving original hourly data of ERA5 from the CDS
1) Nearest Neighbor interpolation to 0.1° grid (ECMWF HRES grid)
2) Temporal aggregation and calculation of additional variables
3) Apply location, variable and seasonal specific bias correction plus sea mask
4) Calculation of additional variables (v2.0 only) from AgERA5 layers in step 3)

4.1. Step 0: Retrieving hourly data

The original ERA5² data are stored in the MARS archive. They were retrieved via the CDS (version: reanalysis-era5-complete) and prepared for further processing (see also section 3.4). ERA5 is originally calculated in T639 spectral space and on an N320 Gaussian grid³. This relates best to a 0.28125° grid and therefore this grid definition was used in the download.

4.2. Step 1: NN interpolation to 0.1° grid

Downloaded data were interpolated to a 0.1° grid which is close to the current HRES resolution. To preserve variability and extremes in the original data the Nearest Neighbor (NN) technique was applied.
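
The nearest-neighbour step can be illustrated with a minimal sketch. It assumes a regular source grid described by an origin and a step, and it ignores longitude wrap-around; the function names are hypothetical and do not come from the AgERA5 processing software.

```python
def nearest_index(target: float, origin: float, step: float, size: int) -> int:
    """Index of the source grid node closest to `target` on a regular axis."""
    idx = round((target - origin) / step)
    return min(max(idx, 0), size - 1)  # clamp to the valid index range


def nn_regrid(src, src_lat0, src_lon0, src_step, dst_lats, dst_lons):
    """Nearest-neighbour resampling of a 2-D field (list of rows) to new coordinates."""
    n_lat, n_lon = len(src), len(src[0])
    out = []
    for lat in dst_lats:
        i = nearest_index(lat, src_lat0, src_step, n_lat)
        row = [src[i][nearest_index(lon, src_lon0, src_step, n_lon)] for lon in dst_lons]
        out.append(row)
    return out


# Toy 2 x 2 source field on a 0.28125 degree grid, resampled to 0.1 degree spacing
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = nn_regrid(coarse, src_lat0=0.0, src_lon0=0.0, src_step=0.28125,
                 dst_lats=[0.0, 0.1, 0.2], dst_lons=[0.0, 0.1, 0.2, 0.3])
```

Because every target cell copies the value of exactly one source cell, no averaging takes place, which is why the technique preserves the variability and extremes of the original data.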

4.3. Step 2: Temporal aggregation and additional variables

Next, hourly data were aggregated into daily values applying variable- and longitude-specific aggregation schemes. The algorithms compute agronomically relevant weather variables that honor local time (LT), e.g. maximum temperature over daytime and minimum temperature over nighttime. The data therefore comply with local calendar day definitions and with the aggregation schemes used by NMIs⁴. Examples of such aggregation schemes, used to aggregate 3-hourly ERA-Interim data, can be found via the following URL: http://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Meteorological_data_from_ECMWF_models.

In contrast to the study provided in the above URL, the number of longitudinal aggregation zones was increased from three to eight⁵ zones. Each zone was assigned to a certain longitude range for which a specific aggregation scheme was defined. See Annex I for the zone definition and Annex II for the aggregation schemes.

² ERA5 pertains to ERA5-HRES (stream=oper) and the analyses (type=an)

³ https://confluence.ecmwf.int//display/CKB/ERA5+data+documentation#ERA5datadocumentation-Spatialgrid; https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference

⁴ For example, JRC asked for a definition that is compatible with the ones used in the stations observations, for possible validation purposes. Furthermore, definitions (for daily averages) should roughly match a local calendar day or (for certain other elements) the corresponding day/night period, in all areas.

⁵ Using 24 zones was not possible because the HRES operational model data, required for training the bias correction, were not available at 1-hourly time steps

Figure 4-1: Overview of the different processing steps in the whole workflow

An example: the ERA5 archive includes the maximum temperature of the previous hour. The 24 values of maximum temperature can be used to:

Derive the maximum temperature over daytime by taking the maximum of the 12 hourly maximum temperature values occurring during the local daytime (e.g. London between 06 and 18 UTC).
Derive the maximum temperature over 24 hours by taking the maximum of the 24 hourly maximum temperature values occurring during the local day (e.g. London between 00 UTC day X and 00 UTC day X + 1).

Similar aggregation can be done for minimum temperature but then taking the minimum over a range of hourly values. Most other elements were aggregated as the mean or sum over 24 hours of the local day. To obtain the set of 24 hours for a certain zone, hourly data of ERA5 is needed of day X, and possibly day X – 1 or X + 1. The exact dataset depends obviously on the zone (longitude range).
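
The selection of hourly values for a local day can be sketched as follows. This is a simplified illustration, not the actual aggregation schemes of Annex II: it assumes that each hourly value is stamped at the end of the hour it describes, that 72 values for days X - 1, X and X + 1 are available, and that each zone is characterised by a single whole-hour offset from UTC (`utc_offset_hours`, a hypothetical parameter).

```python
def local_day_window(hourly_utc, utc_offset_hours, start_lt, end_lt):
    """Values whose hour-ending stamp falls in (start_lt, end_lt] local time of day X.

    `hourly_utc` holds 72 hourly values for days X-1, X and X+1; element k is
    stamped k hours after 00 UTC of day X-1. Local time = UTC + utc_offset_hours.
    """
    first = 24 + start_lt - utc_offset_hours + 1  # first stamp after the window start
    last = 24 + end_lt - utc_offset_hours         # stamp at the window end
    return hourly_utc[first:last + 1]


# Mock hourly maximum temperatures: the value simply equals the time index
series = [float(k) for k in range(72)]

day_values = local_day_window(series, utc_offset_hours=0, start_lt=0, end_lt=24)
daytime_values = local_day_window(series, utc_offset_hours=0, start_lt=6, end_lt=18)

tmax_24h = max(day_values)          # maximum over the local calendar day
tmax_daytime = max(daytime_values)  # maximum over local daytime (06-18 LT)
```

For a zone east of Greenwich the window shifts to earlier UTC hours, which is why data of day X - 1 may be needed; for western zones the window reaches into day X + 1.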

In case of precipitation type (rain, snow) the aggregation to a daily time step can be done type specific, thus counting the hours that the type appeared.

The applied aggregation zone definitions work very well with the local time zones of Western and Eastern Europe and mostly for the North American continent. For Asia there is a shift of 2-3 hours between the actual local time definition and the definition in our study. The only extreme mismatch of the local time definitions occurs eastward of the dateline in zone E4. Fortunately, the affected areas (Pacific islands and the very western coast of Alaska) are, from an agricultural perspective, not particularly significant.

The following conversions were done:

unit conversion of precipitation (tp): m d^-1 -> mm d^-1
unit conversion of snow (sd; liquid water equivalent): m -> cm

In addition, the following variables were calculated:

10 m wind speed (m s-1) from the 10 m u (10u) and 10 m v (10v) wind components: sqrt(10u^2 + 10v^2)
snow depth (cm) from snow density (rsn) and snow depth in liquid water equivalent (sd): (sd / rsn) * 1000 * 100
partial water vapour pressure (hPa) from dewpoint temperature (d2m, in °C; Priestley and Taylor, 1972): 10 * 0.6108 * exp((17.27 * d2m) / (d2m + 237.3))
relative humidity (%) from 2m temperature (t2m) and dewpoint temperature (d2m), both in °C: 100 * (exp((17.27 * d2m) / (237.3 + d2m)) / exp((17.27 * t2m) / (237.3 + t2m)))
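
The derived variables listed above can be written as a short sketch. The function names are hypothetical. Temperatures are converted from Kelvin to degrees Celsius by subtracting 273.15 before the exponential formulas are applied, because the constants 17.27 and 237.3 are defined for Celsius; whether the operational code performs the conversion at this exact point is an assumption.

```python
import math


def wind_speed(u10: float, v10: float) -> float:
    """10 m wind speed [m s-1] from the u and v wind components."""
    return math.hypot(u10, v10)  # sqrt(u10**2 + v10**2)


def snow_depth_cm(sd_m_lwe: float, rsn_kg_m3: float) -> float:
    """Snow depth [cm] from liquid water equivalent [m] and snow density [kg m-3]."""
    return (sd_m_lwe / rsn_kg_m3) * 1000.0 * 100.0


def vapour_pressure_hpa(temp_c: float) -> float:
    """Saturation vapour pressure [hPa] at temp_c [deg C]; at the dewpoint this
    equals the actual partial water vapour pressure."""
    return 10.0 * 0.6108 * math.exp((17.27 * temp_c) / (temp_c + 237.3))


def relative_humidity(t2m_k: float, d2m_k: float) -> float:
    """Relative humidity [%] from 2 m temperature and dewpoint, both in Kelvin."""
    t_c = t2m_k - 273.15
    td_c = d2m_k - 273.15
    return 100.0 * vapour_pressure_hpa(td_c) / vapour_pressure_hpa(t_c)


speed = wind_speed(3.0, 4.0)                # components of 3 and 4 m s-1
depth = snow_depth_cm(0.01, 100.0)          # 1 cm of water at 100 kg m-3
rh = relative_humidity(293.15, 283.15)      # 20 deg C air, 10 deg C dewpoint
```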

New in AgERA5 v2.0: In case of precipitation duration fraction, the aggregation counts the number of hourly time steps with precipitation > 0.1 mm. The resulting value is divided by 24h to obtain a fractional value.
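
A minimal sketch of this count is given below. The function name is hypothetical. The strict comparison follows the sentence above; Table 3-1 words the threshold as >= 0.1 mm per hour, so the handling of values exactly at the threshold is an assumption.

```python
def precipitation_duration_fraction(hourly_precip_mm, threshold_mm=0.1):
    """Fraction of the 24 hourly time steps of a local day with precipitation."""
    if len(hourly_precip_mm) != 24:
        raise ValueError("expected 24 hourly values for one local day")
    wet_hours = sum(1 for p in hourly_precip_mm if p > threshold_mm)
    return wet_hours / 24


# 18 dry hours followed by 6 hours with 0.5 mm each
fraction = precipitation_duration_fraction([0.0] * 18 + [0.5] * 6)
```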

The temporal aggregation and calculation of additional variables lead to the final list of variables presented in Table 3-1.

The variables in the dataset answer the needs of the most common crop models⁶ (working at a daily time step) and their regional implementations.

4.4. Step 3: Bias correction of data at 0.1° grid

A location, variable and season specific bias correction towards the HRES operational model was applied. This way the finer topography, finer land use pattern and finer land-sea delineation of the HRES operational model is more or less included in the downscaled ERA5. In fact, the ERA5 data set is tuned to the detailed topography of the HRES operational model also leading to more consistent time series between ERA5 and the HRES operational model.

For each grid cell and all variables, except precipitation and snow related variables, a linear equation is applied:

\[ Y_{i,j}^{ERA-5,corr} = \alpha_{i,j}Y_{i,j}^{ERA-5} + \beta_{i,j} + [T_{i,j}] \]

in which \( Y_{i,j}^{ERA-5} \) is the ERA5 NN-interpolated variable (e.g. temperature, wind) for grid box [i,j], \( Y_{i,j}^{ERA-5,corr} \) is the ERA5 NN-interpolated and bias corrected variable for grid box [i,j], and \( \alpha_{i,j} \) and \( \beta_{i,j} \) are correction coefficients (hereinafter referred to as slope and intercept, respectively).

The parameter \( T_{i,j} \) accounts for an additional seasonal correction and reads:

\[ T_{i,j} = \gamma_{1,i,j}T_{1} + \gamma_{2,i,j}T_{2} + \gamma_{3,i,j}T_{3} + \gamma_{4,i,j}T_{4} \]

The correction towards the HRES operational model is very relevant for users that do near real-time monitoring of growing conditions and agricultural production. Note that the final ERA5 product becomes available with a time lag of one week including the temporary ERA5 line. For monitoring systems like JRC’s Monitoring Agricultural ResourceS (MARS) such a time lag is too large and therefore data in such systems have to be completed with data from the HRES operational model. When combining data from two datasets with different resolutions, biases might be introduced that negatively affect the monitoring performance. This can be avoided by correcting ERA5 towards the HRES operational model. Similar reasoning applies to forecast products like the ENS forecasts (15/30 day ensemble forecasts). This product can also be downscaled and bias corrected towards the HRES operational model. This way more or less consistent time series are obtained, linking reanalysis, HRES and ENS data all around a common ‘HRES’ reference. Some remarks:

To improve the timeliness of the foreseen service the preliminary ERA5 product, ERA5t, needs also to be processed. We hereby assume that the bias correction algorithms, which are based on ERA5 data, can also be applied on ERA5t data.
Specifically for users that need to link ERA5 to HRES for NRT monitoring purposes the following issue is relevant. The merge with the HRES operational model would need an additional service relying on specific data contracts with ECMWF. And the HRES operational model data must be processed in a similar way (daily aggregation, possibly elevation corrections etc.) as the ERA5 data.
Note that the HRES model is constantly improving (improved model physics, increased spatial resolution etc.). Therefore, with each additional HRES model upgrade, the established statistical relationship between ERA5 and HRES will become less valid. Over time, this may lead to jumps in the time series as the bias correction is correcting for aspects that changed in the HRES model. In such cases, users who link ERA5 to HRES need to be warned and eventually the bias correction needs to be updated.

⁶ CGMS-WOFOST, EPIC-BOKU, EPIC-IIASA, EPIC-TAMU, GEPIC, LPJ-GUESS, LPJmL, pAPSIM, pDSSAT, PEGASUS, PEPIC, PRYSBI2

During the processing only the 'land' locations at the surface level (topographical elevation) were maintained using the HRES land-sea mask. This mask includes the area fraction of land within each 0.1° grid cell. A land fraction of 0.05 was selected as the threshold: grid cells above it are treated as land, those below it as sea (see Figure 4-2).

Figure 4-2: Selection of land 0.1° grid cells: the area fraction of land within a 0.1° grid cell (top) and the selection of land grid cells after applying the threshold of 0.05 area fraction (bottom)

4.5. Step 4: Calculating additional variables

Four new variables were added to AgERA5 v2.0 which are computed from existing AgERA5 layers. Below is a description of how they are computed from the AgERA5 layers derived in step 3.

Penman-Monteith Reference Evapotranspiration

The Penman-Monteith reference evapotranspiration represents the potential evapotranspiration rate from a reference crop canopy (ET0) in mm/d. The routines for these calculations closely follow the procedure by FAO as laid down in the FAO publication Guidelines for computing crop water requirements - FAO Irrigation and drainage paper 56. The routines were derived from PCSE and can be found here, although for AgERA5 a vectorized implementation was created.
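
The core of the FAO56 calculation can be sketched as below. This is not the vectorized AgERA5 implementation nor the PCSE routine; it is a scalar illustration of the FAO56 Penman-Monteith equation (FAO56 eq. 6) with hypothetical function names. It works in FAO56 units (MJ m-2 d-1, kPa, °C), so AgERA5 layers given in J m-2 d-1, hPa and K would have to be converted first, and net radiation and the psychrometric constant are assumed to be available as inputs.

```python
import math


def saturation_vapour_pressure_kpa(temp_c: float) -> float:
    """Saturation vapour pressure [kPa] at air temperature temp_c [deg C] (FAO56 eq. 11)."""
    return 0.6108 * math.exp((17.27 * temp_c) / (temp_c + 237.3))


def wind_10m_to_2m(u10: float) -> float:
    """Convert wind speed at 10 m to the 2 m reference height (FAO56 eq. 47)."""
    return u10 * 4.87 / math.log(67.8 * 10.0 - 5.42)


def et0_fao56(t_mean_c, rn, g, u2, es, ea, gamma):
    """Reference evapotranspiration [mm d-1] (FAO56 eq. 6).

    rn, g: net radiation and soil heat flux [MJ m-2 d-1]
    u2: wind speed at 2 m [m s-1]
    es, ea: saturation and actual vapour pressure [kPa]
    gamma: psychrometric constant [kPa degC-1]
    """
    # Slope of the saturation vapour pressure curve [kPa degC-1] (FAO56 eq. 13)
    delta = 4098.0 * saturation_vapour_pressure_kpa(t_mean_c) / (t_mean_c + 237.3) ** 2
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean_c + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs for a warm, humid location (chosen for this sketch)
et0 = et0_fao56(t_mean_c=30.2, rn=14.33, g=0.14, u2=2.0, es=4.42, ea=2.85, gamma=0.0674)
```

The wind conversion is included because AgERA5 provides wind speed at 10 m while the FAO56 equation expects wind speed at 2 m.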

The Vapour Pressure Deficit at Daily Maximum Temperature

The VPD at maximum temperature is computed in two steps:

Estimate the saturated vapour pressure Ea [hPa] at maximum daily temperature Tm [°C] using: Ea = 10 * 0.6108 * exp((17.27 * Tm) / (Tm + 237.3))
Compute the VPD as: Ea - Vapour_Pressure_Mean
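
The two steps can be sketched as follows. The function name is hypothetical, and the Kelvin to Celsius conversion is added here because the AgERA5 temperature layers are provided in K while the formula expects °C.

```python
import math


def vpd_at_tmax_hpa(tmax_k: float, vapour_pressure_mean_hpa: float) -> float:
    """Vapour pressure deficit [hPa] at the daily maximum temperature."""
    tm_c = tmax_k - 273.15  # formula constants are defined for degrees Celsius
    # Step 1: saturated vapour pressure at the maximum temperature
    ea_sat = 10.0 * 0.6108 * math.exp((17.27 * tm_c) / (tm_c + 237.3))
    # Step 2: subtract the 24 h mean vapour pressure
    return ea_sat - vapour_pressure_mean_hpa


# Maximum temperature of 30 deg C and a mean vapour pressure of 20 hPa
vpd = vpd_at_tmax_hpa(303.15, 20.0)
```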

The derived daily minimum relative humidity

The daily minimum relative humidity rh is computed from the dewpoint temperature Td [°C] and the maximum daily temperature T [°C] using:

rh = 100 * (exp((17.27 * Td) / (237.3 + Td)) / exp((17.27 * T) / (237.3 + T)))

Moreover, checks are carried out to ensure consistency with the relative humidity layers at specific time steps.

The derived daily maximum relative humidity

The daily maximum relative humidity rh is computed from the dewpoint temperature Td [°C] and the minimum daily temperature T [°C] using:

rh = 100 * (exp((17.27 * Td) / (237.3 + Td)) / exp((17.27 * T) / (237.3 + T)))

Moreover, checks are carried out to ensure consistency with the relative humidity layers at specific time steps.
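
Both derived humidity layers can be sketched with one helper. Following Table 3-1, the sketch starts from the mean vapour pressure layer, which plays the role of the saturation vapour pressure at the dewpoint in the formula above. The function name is hypothetical, the cap at 100% is an assumption of this sketch, and the consistency checks against the relative humidity layers at specific times are not reproduced.

```python
import math


def derived_relative_humidity(temp_k: float, vapour_pressure_hpa: float) -> float:
    """Relative humidity [%] from a temperature [K] and the mean vapour pressure [hPa]."""
    t_c = temp_k - 273.15
    saturated = 10.0 * 0.6108 * math.exp((17.27 * t_c) / (237.3 + t_c))
    return min(100.0, 100.0 * vapour_pressure_hpa / saturated)  # cap is an assumption


vapour_pressure = 10.0  # hPa, 24 h mean
rh_min = derived_relative_humidity(303.15, vapour_pressure)  # uses the daily maximum temperature
rh_max = derived_relative_humidity(283.15, vapour_pressure)  # uses the daily minimum temperature
```

The minimum humidity pairs with the maximum temperature because the saturated vapour pressure in the denominator grows with temperature.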

5. Develop bias corrections

Step 3, as described in the previous chapter, covers the bias correction towards the HRES grid (0.1° grid). The grid and variable-specific regression equations are trained on operational ECMWF HRES model data.

The procedure for bias correction is exactly the same for all AgERA5 versions. However, in version 2.0 the daily minimum and maximum temperature variables are derived from a different ERA5 variable (see table 3-1) and therefore the statistical correction equations have been retrained in order to derive new coefficients for those variables.

The approach, to develop the equations, consists of the following main steps:

Interpolate the data towards a 0.1° grid (see step 1)
Aggregate hourly model data to daily variables (see step 2)
Train statistical correction equations for each variable and grid point
Apply the trained equations to the intermediate AgERA5 variables (output from step 2)

The development of the equations (using the HRES operational model as training set) is a one-off action and has been documented in a separate document named “C3S422Lot1.WEnR.DS2_Downscaling and bias correction v1.7.pdf”. This section provides a summary of this work.

The input data:

ECMWF ERA5 reanalysis (grid1: 0.28125° x 0.28125°)
ECMWF HRES (grid2: 0.10° x 0.10°)

Both data sets cover the globe, including land and sea grid boxes.

Originally, ERA5 data is available as hourly fields, while HRES has a temporal resolution of 3 hours. For both models, a set of 12 base parameters (see Table 3-2) was retrieved from the ECMWF MARS archive covering a period of two years. These base parameters with 1-hourly/3-hourly resolution were then aggregated to 22 (derived) daily parameters over 8 different longitudinal bands (see section 4.3; note that schemes given in Annex II only apply to ERA5, the schemes for HRES-data are available on request). Note that the ERA5 data was first interpolated towards the 0.1° grid using the NN-technique (see section 4.2) before applying the aggregation to days.

To train the regression equations, a data set of 2-3 years is desired. Both ERA5 and HRES need to be available for this period. Based on the recent HRES model upgrades outlined in the separate report, the period between 2016-04-01 and 2018-03-31 was chosen as the training period for the final bias correction equations. Most importantly, this period does not include any horizontal grid or resolution changes. Also, data of both models were available through ECMWF's MARS archive at the moment the bias correction analysis took place. Therefore, the generated equations correct ERA5 data towards a mixture of HRES model cycles (41r2, 43r1 and 43r3).

The equations were derived by means of multiple linear regression.
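
As a simplified illustration of the regression for a single grid cell, the sketch below fits only the slope and the intercept by ordinary least squares. It deliberately omits the seasonal predictors and is not the MOS routine used in the project; the function name and the sample values are hypothetical.

```python
def fit_slope_intercept(era5, hres):
    """Ordinary least squares fit of hres = slope * era5 + intercept for one grid cell."""
    n = len(era5)
    mean_x = sum(era5) / n
    mean_y = sum(hres) / n
    sxx = sum((x - mean_x) ** 2 for x in era5)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(era5, hres))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic daily temperatures [K]: the "HRES" series is an exact linear function of "ERA5"
era5_series = [270.0, 280.0, 290.0, 300.0]
hres_series = [0.9 * x + 28.0 for x in era5_series]
slope, intercept = fit_slope_intercept(era5_series, hres_series)
```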

Not all daily aggregated elements (see Table 3-1) are suited to correction by this method. For instance, for the snow parameters most parts of the world lack sufficient snow cases to build a robust correction statistic. Similar issues are expected with the precipitation parameters (sum and type) in arid regions.

The MOS (Model Output Statistics) routine was used to carry out a multiple linear regression between the ECMWF HRES data and the NN-interpolated ERA5 data for each grid cell. The outcome is a linear equation (in this case demonstrated for the ERA5 data set):

\[ Y_{i,j}^{ERA-5,corr} = \alpha_{i,j}Y_{i,j}^{ERA-5} + \beta_{i,j} + [T_{i,j}] \]

in which \( Y_{i,j}^{ERA-5} \) is the ERA5 NN-interpolated variable (e.g. temperature, wind) for grid box [i,j], \( Y_{i,j}^{ERA-5,corr} \) is the ERA5 NN-interpolated and bias corrected variable for grid box [i,j], and \( \alpha_{i,j} \) and \( \beta_{i,j} \) are correction coefficients (hereinafter referred to as slope and intercept, respectively).

The parameter \( T_{i,j} \) accounts for an additional seasonal correction and reads:

\[ T_{i,j} = \gamma_{1,i,j}T_{1} + \gamma_{2,i,j}T_{2} + \gamma_{3,i,j}T_{3} + \gamma_{4,i,j}T_{4} \]

in which \( T_{1} \) to \( T_{4} \) are sinusoidal time functions with a period of one year, and \( \gamma_{1,i,j} \) to \( \gamma_{4,i,j} \) are the respective coefficients. The sinusoidal time functions that were used read:

\[ T_{1} = 100\sin \left(2\pi \frac{day-21}{365} \right) \]

\[ T_{2} = 100\sin \left(2\pi \frac{day-81}{365} \right) \]

\[ T_{3} = 100\sin \left(2\pi \frac{day-111}{365} \right) \]

\[ T_{4} = 100\sin \left(2\pi \frac{day-141}{365} \right) \]

With the combination of the above sine functions and coefficients, any grid-specific time correction function can be constructed. To achieve this, it is enough to use only the 2 best sinusoidal time functions of the 4 available for each grid point in the final equation.
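
Applying a trained equation to one value can be sketched as follows. The function names and the coefficient values in the example are hypothetical; in practice the slope, intercept and seasonal coefficients are read per grid cell and per variable. Seasonal functions that were not selected for a grid point simply get a coefficient of zero.

```python
import math

PHASE_SHIFTS = (21, 81, 111, 141)  # day offsets of the four sinusoidal time functions


def seasonal_terms(day_of_year: float):
    """The four sinusoidal time functions evaluated for a day of the year."""
    return [100.0 * math.sin(2.0 * math.pi * (day_of_year - shift) / 365.0)
            for shift in PHASE_SHIFTS]


def bias_correct(value, day_of_year, slope, intercept, gammas=(0.0, 0.0, 0.0, 0.0)):
    """Linear correction plus the weighted seasonal terms for one grid cell."""
    seasonal = sum(g * t for g, t in zip(gammas, seasonal_terms(day_of_year)))
    return slope * value + intercept + seasonal


# A 2 m temperature of 280 K on day 50, with only the first two seasonal terms active
corrected = bias_correct(280.0, 50, slope=0.98, intercept=5.2,
                         gammas=(0.002, -0.001, 0.0, 0.0))
```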

The objects created by the bias correction application are twofold. The trained regression equation of a particular parameter was written to a NetCDF file, having the slope, the intercept and each of the seasonal cycle coefficients stored as a normal NetCDF parameter. The evaluation metrics were handled similarly. For analysis purposes the MAE, RMSE and R-squared were calculated and stored in a second NetCDF file.

A detailed analysis of the significance of the bias correction can be found in document here.

Table 5-1 summarizes how the ERA5 improves (in terms of MAE for the main elements) after applying the bias correction. The results were derived for AgERA5 v1.0 and v1.1. For AgERA5 v2.0 the ERA5 input variables for the daily min/max temperature statistics changed but we expect the improvement due to bias correction to be very similar.

Table 5-1: MAE (HRES-ERA5corrected) and MAE improvement of different bias corrected variables. The MAE improvements indicate the added value through the bias correction. All metrics were calculated for different regions and for subsets of grid points meeting certain conditions. E.g. “Land & above 800m” only uses grid points being located on land and above 800m. “Coasts & Lakes” subsets all grid points with a land fraction between 10% and 90%.

		Land		Land & below 800m		Land & above 800m		Coasts & Lakes
Variable	Region	MAE	MAE Impr	MAE	MAE Impr	MAE	MAE Impr	MAE	MAE Impr
2t_davg [K]	Africa	0.44	40%	0.42	36%	0.47	48%	0.36	50%
2t_davg	Asia	0.72	36%	0.67	27%	0.86	48%	0.66	32%
2t_davg	Australia	0.43	42%	0.43	35%	0.37	83%	0.30	49%
2t_davg	Europe	0.51	36%	0.47	30%	0.75	55%	0.45	38%
2t_davg	N-America	0.71	31%	0.67	25%	0.85	41%	0.68	28%
2t_davg	S-America	0.45	50%	0.42	41%	0.61	65%	0.38	48%
2d_davg [K]	Africa	0.76	38%	0.77	38%	0.76	39%	0.55	46%
2d_davg	Asia	0.90	29%	0.81	25%	1.09	35%	0.73	28%
2d_davg	Australia	0.57	34%	0.57	28%	0.43	78%	0.36	43%
2d_davg	Europe	0.58	28%	0.55	22%	0.81	46%	0.54	27%
2d_davg	N-America	0.80	23%	0.73	18%	0.97	32%	0.70	21%
2d_davg	S-America	0.54	42%	0.44	37%	0.99	50%	0.41	40%
ff_davg [m/s]	Africa	0.27	25%	0.26	22%	0.28	32%	0.33	47%
ff_davg	Asia	0.29	28%	0.27	24%	0.34	35%	0.36	35%
ff_davg	Australia	0.24	31%	0.25	30%	0.22	41%	0.31	53%
ff_davg	Europe	0.25	31%	0.24	31%	0.32	33%	0.33	48%
ff_davg	N-America	0.29	28%	0.28	26%	0.33	31%	0.33	34%
ff_davg	S-America	0.23	30%	0.22	26%	0.27	42%	0.32	51%
tcc_davg [0-1]	Africa	0.08	3%	0.08	2%	0.08	4%	0.08	5%
tcc_davg	Asia	0.07	0%	0.07	-2%	0.08	4%	0.08	-2%
tcc_davg	Australia	0.06	-1%	0.06	-1%	0.06	5%	0.07	2%
tcc_davg	Europe	0.07	-1%	0.07	-1%	0.07	2%	0.07	-1%
tcc_davg	N-America	0.08	0%	0.08	-1%	0.07	2%	0.08	-1%
tcc_davg	S-America	0.07	4%	0.07	3%	0.07	8%	0.07	5%
ssrd_dsumdiff [J/m2d]	Africa	1055575	7%	1030480	7%	1118699	8%	1151300	13%
ssrd_dsumdiff	Asia	872717	4%	836249	3%	958997	7%	899084	5%
ssrd_dsumdiff	Australia	1205911	6%	1177253	6%	1772895	14%	1497494	12%
ssrd_dsumdiff	Europe	832226	2%	815116	2%	951428	5%	782759	4%
ssrd_dsumdiff	N-America	899054	4%	902781	3%	888809	6%	916596	4%
ssrd_dsumdiff	S-America	1427243	9%	1448626	9%	1328043	13%	1316248	11%

The MAE indicates the error of the corrected data (HRES-ERA5corrected), while the MAE improvement compares the error of the corrected versus the uncorrected ERA5 data. All metrics were aggregated for different regions and certain subsets of grid points. Overall, the temperature, humidity and wind speed variables benefit most from the correction. The MAE is reduced by 30% to 60% in the majority of cases. Grid points located in mountainous areas or along coasts and lakes are improved most. This is not surprising as these are the areas where the largest systematic differences between ERA5 and HRES can be expected. Not only are the relative improvements quite large, the absolute MAE values after the correction are also small. For example, the MAE over land for the 24h mean 2m temperature (2t_davg) does not exceed 0.72 K on any continent, and for 4 of 6 continents it does not exceed 0.51 K.
For the solar radiation flux (ssrd_dsumdiff) the MAE improvement is solid and ranges between 2% and 14%, depending on the region and subset. The results of element "24h mean cloud cover" (tcc_davg) are mixed. For most grid points the correction doesn't add any value. The MAE improvement of the majority of all grid points (land and below 800m) is between -2% and +4%, and therefore near zero. Only for grid points above 800m we can observe a small but clear improvement (2% - 8%).
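The two metrics discussed above can be illustrated with a minimal Python sketch. It assumes that the MAE improvement is the relative reduction 1 - MAE(corrected) / MAE(uncorrected); the text does not state the exact formula, so this definition, the function names and the sample values are all illustrative assumptions.

```python
import numpy as np

def mae(reference, estimate):
    """Mean absolute error between two equally shaped arrays, ignoring NaNs."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.nanmean(np.abs(reference - estimate)))

def mae_improvement(reference, uncorrected, corrected):
    """Relative MAE reduction (assumed definition): 1 - MAE_corrected / MAE_uncorrected."""
    return 1.0 - mae(reference, corrected) / mae(reference, uncorrected)

# Hypothetical daily mean 2 m temperatures [K] at a single grid point
hres = np.array([280.0, 281.0, 283.0, 279.0])            # reference (HRES)
era5 = np.array([282.0, 283.0, 284.0, 281.0])            # uncorrected ERA5
era5_corrected = np.array([280.5, 281.5, 282.5, 279.5])  # bias-corrected ERA5

print(f"MAE corrected:   {mae(hres, era5_corrected):.2f} K")
print(f"MAE improvement: {mae_improvement(hres, era5, era5_corrected):.0%}")
```

A negative value of `mae_improvement` means that the correction increased the error, as in some of the cloud cover entries of the table.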
The following conclusions were drawn from the evaluation study:

The selected bias correction method has its largest benefits in mountainous areas, along coastlines and at lakes.
Seasonal correction on top of the simple bias correction further improves the accuracy of the derived correction equations.
The approach works remarkably well for 3 out of the 4 groups of variables, with an average relative reduction of the MAE between 30% and 60%. These are:
1. Temperature parameters
2. Humidity parameters
3. Wind speed
The correction models for solar radiation flux reach an MAE improvement of 2% to 14%.
For cloud cover the correction has only a minor effect for most grid points. However, mountainous regions still benefit from the correction, with an MAE improvement of 2% to 8%.

6. Appendix I Longitudinal aggregation zones

Longitudinal aggregation zones are defined around central longitudes. The first zone is centered on zero longitude (London) and stretches from 22.5° W to 22.5° E. The next zone is centered on 45° E and stretches from 22.5° E to 67.5° E, and so on. This definition matches the local time zone configuration of Western and Eastern Europe very well, and that of most of the American continent. For Asia there will be a shift of 2-3 hours between the real local time and our definition. The only extreme mismatch between the local time definitions occurs eastward of the dateline in zone E4. Fortunately, the affected areas (islands in the Pacific and the far western coast of Alaska) are of limited interest from an agricultural perspective.
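The zone definition above can be sketched in Python as follows. Only the labels "C" (central) and "E4" appear in this document; the remaining labels, the function names and the choice to assign a boundary longitude (for example exactly 22.5° E) to the zone on its eastern side are assumptions made for illustration.

```python
import math

ZONE_WIDTH_DEG = 45.0

# "C" and "E4" are named in the text; all other labels are assumed.
ZONE_LABELS = {0: "C", 45: "E1", 90: "E2", 135: "E3", 180: "E4",
               -135: "W3", -90: "W2", -45: "W1"}

def zone_center(lon_deg):
    """Central longitude of the 45 degree wide zone that contains lon_deg."""
    lon = (lon_deg + 180.0) % 360.0 - 180.0  # normalise to [-180, 180)
    center = math.floor((lon + ZONE_WIDTH_DEG / 2) / ZONE_WIDTH_DEG) * ZONE_WIDTH_DEG
    # The zone around the dateline is reported with center 180 rather than -180.
    return 180.0 if center == -180.0 else center

def zone_label(lon_deg):
    """Label of the zone that contains lon_deg (labels partly assumed, see above)."""
    return ZONE_LABELS[int(zone_center(lon_deg))]

print(zone_center(5.67), zone_label(5.67))      # a point in the Netherlands
print(zone_center(-170.0), zone_label(-170.0))  # a point east of the dateline
```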

7. Appendix II Aggregation schemes

The figures below illustrate the ERA5 UTC slices that are used to derive the AgERA5 variables for the different longitudinal aggregation zones. For example, the first aggregation scheme below is applied to the variables that consist of daily average values. To compute the daily average temperature for zone C (central), the ERA5 hourly slices from UTC 03:00 to UTC 03:00 (+1: next day) are selected, while for zone E4 the aggregation selects the ERA5 hourly slices from UTC 15:00 (-1: previous day) to UTC 15:00 of the current day. For the other aggregation types (night time, day time) different UTC slices are selected, as shown in the corresponding figures.

Some remarks:

An "hour box" in the top row always represents the hour on the left border of the box.
Variables t2m, d2m, ff, tcc, sd, rsn, vp, ptype and rh are all instantaneous values. To align with HRES (only available at a 3-hour time step) the period 03-00 has been selected, so 8 values are aggregated: 03, 06, 09, 12, 15, 18, 21 and 00.
Variables mn2t, mx2t, ssrd and tp summarize the conditions over 1 hour (sum, min, max, type).
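As a rough illustration of the daily average scheme for instantaneous variables, the sketch below lists the 8 three-hourly UTC slices for a given day and zone. It reproduces the two cases stated above (zone C starting at UTC 03:00, zone E4 starting at UTC 15:00 of the previous day) by assuming that the window starts at 03:00 zone time and that the zone time offset is the central longitude divided by 15. Applying this rule to the other zones is an assumption; the figures remain the authoritative definition. The function name is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

def utc_slices_daily_mean(day, zone_center_lon):
    """Return the 8 three-hourly UTC timestamps aggregated into a daily mean.

    Assumes the window starts at 03:00 zone time, with the zone time offset
    in hours taken as zone_center_lon / 15 (e.g. 0 for zone C, 12 for zone E4).
    """
    offset_hours = zone_center_lon / 15.0
    start_zone_time = datetime(day.year, day.month, day.day, 3, tzinfo=timezone.utc)
    start_utc = start_zone_time - timedelta(hours=offset_hours)
    return [start_utc + timedelta(hours=3 * i) for i in range(8)]

# Zone C: 03, 06, ..., 21 UTC of the day itself and 00 UTC of the next day
for t in utc_slices_daily_mean(date(2020, 6, 15), 0.0):
    print("C ", t.strftime("%Y-%m-%d %H:%M"))

# Zone E4: 15, 18, 21 UTC of the previous day, then 00 ... 12 UTC of the day itself
for t in utc_slices_daily_mean(date(2020, 6, 15), 180.0):
    print("E4", t.strftime("%Y-%m-%d %H:%M"))
```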

8. References

Toreti, A., Maiorano, A., De Sanctis, G., Webber, H., Ruane, A.C., Fumagalli, D., Ceglar, A., Niemeyer, S., Zampieri, M., 2019. Using reanalysis in crop monitoring and forecasting systems. Agricultural Systems 168, pp. 144-15.

Glotter, M.J., Ruane, A.C., Moyer, E.J., Elliott, J.W., 2016. Evaluating the sensitivity of agricultural model performance to different climate inputs. J. Appl. Meteorol. Climatol. 55, pp. 579-594.

Wit, A.J.W. de, Baruth, B., Boogaard, H., Diepen, K. van, Kraalingen, D.W.G. van, Micale, F., Roller, J.A. te, Supit, I., Wijngaart, R. van der, 2010. Using ERA-INTERIM for regional crop yield forecasting in Europe. Climate Research 44(1), pp. 41-53. ISSN 0936-577X.

https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5

https://software.ecmwf.int/wiki/display/CKB/ERA5+data+documentation

http://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Meteorological_data_from_ECMWF_models

https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference

This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.
