Contributors: Hendrik Boogaard (WAGENINGEN ENVIRONMENTAL RESEARCH), Allard de Wit (WAGENINGEN ENVIRONMENTAL RESEARCH), Jenny Lazebnik (WAGENINGEN ENVIRONMENTAL RESEARCH), Jonathan Schubert (METEOGROUP), Gerald van der Grijn (METEOGROUP)
History of Modifications
Acronyms
1. Scope of the document
This document provides an overview of the AgERA5 product, the underlying data sets, and the underlying algorithms and workflow. The AgERA5 dataset provides daily, agronomically relevant meteorological data for the period 1979 to present at a spatial resolution of 0.1°.
2. Executive summary
The AgERA5 dataset provides daily surface meteorological data for the period 1979 to present at a spatial resolution of 0.1°. The service is based on the fifth generation of ECMWF atmospheric reanalyses of the global climate, better known as ERA5. AgERA5 'connects' users in the agricultural domain to the new ERA5 data set. It includes daily aggregates of agronomically relevant variables, tuned to local day definitions and adapted to the finer topography, finer land use pattern and finer land-sea delineation of the ECMWF HRES operational model. The variables cover temperature, precipitation, snow depth, humidity, cloud cover and radiation.
3. Product description
The following text applies to AgERA5 versions 1.0 and 1.1. Extensions added for version 2.0 are marked as such.
3.1. Introduction
Climate forcing data is used in analysis and agro-environmental modelling to study aspects of productivity and externalities of agriculture (e.g. Toreti et al., 2019; Glotter et al., 2016; De Wit et al., 2010). In this service we start from the hourly ECMWF ERA5 model data and convert the data into meaningful input for these analyses and models. This involves a large amount of data that needs to be processed. Acquisition and pre-processing of ERA5 data, both archive and near real-time (NRT) data, is a large and specialized job. It requires a heavy investment from users such as technical policymakers, information agencies, NGOs, commodity traders, agri-businesses and insurance providers. The complexity of the task and the required effort may even be a barrier to starting to use the data.
This service is based on the original hourly deterministic ECMWF ERA5 data, at surface level and available at a spatial resolution of 30 km (~0.28125°). Data were aggregated to daily time steps and corrected towards a finer topography at a 0.1° spatial resolution. Aggregated data at daily time steps follow a local time zone definition and include a number of major agronomic parameters. The correction to the 0.1° grid was realized by applying grid and variable-specific regression equations to an ERA5 data set interpolated to the 0.1° grid. The equations were trained on operational ECMWF HRES model data at a 0.1° resolution. The final data set is referred to as AgERA5. AgERA5 saves potential users money and encourages businesses to use such a high-quality data set. It also avoids a possible proliferation of different data sets originating from the basic hourly ERA5 data set.
3.2. Differences between AgERA5 version 1.0/1.1 and version 2.0
The changes introduced with the update from AgERA5 version 1.1 to version 2.0 can be summarized as follows:
- Bug fixes were introduced that solve some of the issues identified in the previous versions. See the "known issues in v1.1" section in the product user guide. These fixes include:
- Fixing of rogue values in the temperature layers
- Fixing inconsistent temperature values (Tmean > Tmax). This problem was largely solved by retrieving all daily temperature statistics from the same ERA5 variable 't2m' (see Table 3-1).
- Striping in the relative humidity layers was partially resolved by adding new relative humidity layers providing the daily minimum and maximum relative humidity.
- Five new variables were introduced that were requested by users:
- The Penman-Monteith reference evapotranspiration;
- the vapour pressure deficit at daily maximum temperature;
- the daily minimum relative humidity;
- the daily maximum relative humidity;
- The precipitation duration fraction.
- Minor updates related to metadata such as variable names and coordinate reference system.
3.3. Variable definitions
AgERA5 includes 22 (v1.0 and v1.1) or 27 (v2.0) agronomically relevant variables. See Table 3-1.
The meaning of the columns in Table 3-1 is:
- Short name: name of the variable as encoded in the filename of the variable
- Long name: an explanation of the variable
- Unit: the unit in which this variable is provided
- Aggregation: the type of day aggregation function that is applied to obtain this variable
- AGROVOC URI: the URI of this variable in the AGROVOC Thesaurus: https://aims.fao.org/agrovoc
- in V2: indicates if this variable is included in AgERA5 v2 (Y) or not (N).
Table 3-1: List of variables in the AgERA5 data set
Short name | Long name | Unit | Aggregation | AGROVOC URI | in V2 |
Cloud_Cover_Mean | Total cloud cover (00-00LT) | (0 - 1) | Mean | N | |
Dew_Point_Temperature_2m_Mean | 2 meter dewpoint temperature (00-00LT) | K | Mean | N | |
Precipitation_Flux | Total precipitation (00-00LT) | mm d-1 | Sum | N | |
Precipitation_Rain_Duration_Fraction | Precipitation type duration - rain (00-00LT) | - | Count/24 | N | |
Precipitation_Solid_Duration_Fraction | Precipitation type duration - solid fraction (no hail) composed of: precipitation types freezing rain (3), snow (5), wet snow (6), mixture of rain and snow (7) and ice pellets (8) (00-00LT) | - | Count/24 | N | |
Relative_Humidity_2m_06h | Relative humidity at 06LT | % | - | N | |
Relative_Humidity_2m_09h | Relative humidity at 09LT | % | - | N | |
Relative_Humidity_2m_12h | Relative humidity at 12LT | % | - | N | |
Relative_Humidity_2m_15h | Relative humidity at 15LT | % | - | N | |
Relative_Humidity_2m_18h | Relative humidity at 18LT | % | - | N | |
Snow_Thickness_LWE_Mean | Snow liquid water equivalent (00-00LT) | cm of liquid water equivalent | Mean | N | |
Snow_Thickness_Mean | Snow depth (00-00LT) | cm snow | Mean | N | |
Solar_Radiation_Flux | Surface solar radiation downwards (00-00LT) | J m-2 d-1 | Sum | N | |
Temperature_Air_2m_Max_24h | Maximum air temperature at 2 meter (00-00LT) from ERA5 variable 'mx2t' (v1.0, v1.1) or variable 't2m' (v2.0) | K | Maximum | N | |
Temperature_Air_2m_Max_Day_Time | Maximum air temperature at 2 meter (06-18LT) from ERA5 variable 'mx2t' (v1.0, v1.1) or variable 't2m' (v2.0) | K | Maximum | N | |
Temperature_Air_2m_Mean_24h | 2 meter air temperature (00-00LT) from ERA5 variable 't2m' (all versions) | K | Mean | N | |
Temperature_Air_2m_Mean_Day_Time | 2 meter air temperature (06-18LT) from ERA5 variable 't2m' (all versions) | K | Mean | N | |
Temperature_Air_2m_Mean_Night_Time | 2 meter air temperature (18-06LT) from ERA5 variable 't2m' (all versions) | K | Mean | N | |
Temperature_Air_2m_Min_24h | Minimum air temperature at 2 meter (00-00LT) from ERA5 variable 'mn2t' (v1.0, v1.1) or variable 't2m' (v2.0) | K | Minimum | N | |
Temperature_Air_2m_Min_Night_Time | Minimum air temperature at 2 meter (18-06LT) from ERA5 variable 'mn2t' (v1.0, v1.1) or variable 't2m' (v2.0) | K | Minimum | N | |
Vapour_Pressure_Mean | Vapour pressure (00-00LT) | hPa | Mean | N | |
Wind_Speed_10m_Mean | 10 meter wind speed (00-00LT) | m s-1 | Mean | N | |
ReferenceET_PenmanMonteith_FAO56 | Penman-Monteith reference evapotranspiration according to the FAO56 approach | mm d-1 | - | Y | |
Derived_Relative_Humidity_2m_Min* | Minimum Relative Humidity: derived from AgERA5 24h-maximum temperature and mean vapor pressure (00-00LT) as a post-processing step. | % | - | Y | |
Derived_Relative_Humidity_2m_Max* | Maximum Relative Humidity: derived from AgERA5 24h-minimum temperature and mean vapor pressure (00-00LT) as a post-processing step. | % | - | Y | |
Vapour_Pressure_Deficit_at_Maximum_Temperature | Vapour pressure deficit derived from saturated vapour pressure for maximum temperature and 24h mean vapour pressure | hPa | - | Y | |
Precipitation_Duration_Fraction | Precipitation duration fraction composed of: hours with precipitation amount >= 0.1 mm / h divided by 24. (00-00LT) | - | Count/24 | Y |
*The term “derived” was added to indicate the difference with the relative humidity layers at specific time intervals. The latter are derived from the original ERA5 input fields, the temperature (t2m) and dewpoint (d2m) fields, while the “derived” variables are computed from the AgERA5 layers as a post-processing step. The advantage of the "derived" relative humidity layers is that they avoid the "striping" effect in the relative humidity layers at specific time intervals. See the "known issues in v1.1" section in the product user guide.
3.4. Input data used
The ERA5 data set is the main input data set (see https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5). ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the earth on a 30 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.
Concerning the archive the years 1979 to present were available during the project. Note that two versions of ERA5 are available through the CDS:
- interpolated to a 0.25° grid
- original ERA5 model level data (reanalysis-era5-complete)
The latter version was used in this project.
ERA5 has a wide list of variables. See the following link: ERA5: data documentation, especially the tables:
- 2: surface, instantaneous (averages)
- 3: surface, accumulations
- 4: surface, minimum/maximum
The following table shows the variables used for the AgERA5 product versions.
Table 3-2: Essential variables used for the AgERA5 v1.0 and v1.1 product
Variable name | Unit | Short | Reference | Group |
Snow density | kg m-3 | rsn | table 2 | INST1 |
Snow depth | m of water | sd | table 2 | INST1 |
10 metre U wind component | m s-1 | u10 | table 2 | INST1 |
10 metre V wind component | m s-1 | v10 | table 2 | INST1 |
Total cloud cover | (0 - 1) | tcc | table 2 | INST1 |
2 metre temperature | K | t2m | table 2 | INST1 |
2 metre dewpoint temperature | K | d2m | table 2 | INST1 |
Surface solar radiation downwards | J m-2 | ssrd | table 3 | ACCMNMX |
Total precipitation | m | tp | table 3 | ACCMNMX |
Precipitation type | code table | ptype | table 2 | INST2 |
Maximum temperature at 2 metres since | K | mx2t | table 5 | ACCMNMX |
Minimum temperature at 2 metres since | K | mn2t | table 5 | ACCMNMX |
Table 3-3: Essential variables used for the AgERA5 v2.0 product
Variable name | Unit | Short | Reference | Group |
Snow density | kg m-3 | rsn | table 2 | INST1 |
Snow depth | m of water | sd | table 2 | INST1 |
10 metre U wind component | m s-1 | u10 | table 2 | INST1 |
10 metre V wind component | m s-1 | v10 | table 2 | INST1 |
Total cloud cover | (0 - 1) | tcc | table 2 | INST1 |
2 metre temperature | K | t2m | table 2 | INST1 |
2 metre dewpoint temperature | K | d2m | table 2 | INST1 |
Surface solar radiation downwards | J m-2 | ssrd | table 3 | ACCMNMX |
Total precipitation | m | tp | table 3 | ACCMNMX |
Precipitation type | code table | ptype | table 2 | INST2 |
4. Workflow
The AgERA5 workflow includes (see Figure 4-1):
- 0) Retrieving original hourly data of ERA5 from the CDS
- 1) Nearest Neighbor interpolation to 0.1° grid (ECMWF HRES grid)
- 2) Temporal aggregation and calculation of additional variables
- 3) Apply location, variable and seasonal specific bias correction plus sea mask
- 4) Calculation of additional variables (v2.0 only) from AgERA5 layers in step 3)
4.1. Step 0: Retrieving hourly data
The original ERA5 data are stored in the MARS archive and were retrieved via the CDS (version: reanalysis-era5-complete) and prepared for further processing (see also section 3.4). ERA5 is originally calculated in T639 spectral space and on an N320 Gaussian grid. This corresponds most closely to a 0.28125° grid, and therefore this grid definition was used in the download.
4.2. Step 1: NN interpolation to 0.1° grid
Downloaded data were interpolated to a 0.1° grid which is close to the current HRES resolution. To preserve variability and extremes in the original data the Nearest Neighbor (NN) technique was applied.
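The nearest-neighbour step can be illustrated with a minimal sketch. It assumes a regular latitude-longitude grid described by 1-D coordinate axes and ignores longitude wrap-around; the function name `nearest_neighbour_regrid` and the toy coordinates are hypothetical and not taken from the AgERA5 processing chain.

```python
import numpy as np

def nearest_neighbour_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Regrid a 2-D field by copying, for each target cell, the closest source cell.

    Assumes regular grids given by 1-D latitude and longitude axes
    (illustrative simplification; no longitude wrap-around handling).
    """
    # For every target coordinate, find the index of the nearest source coordinate
    lat_idx = np.abs(src_lat[:, None] - dst_lat[None, :]).argmin(axis=0)
    lon_idx = np.abs(src_lon[:, None] - dst_lon[None, :]).argmin(axis=0)
    # Pick the selected rows and columns from the source field
    return field[np.ix_(lat_idx, lon_idx)]

# Toy example: two source cells per axis at 0.28125 degree spacing
src_lat = np.array([0.0, 0.28125])
src_lon = np.array([0.0, 0.28125])
field = np.array([[1.0, 2.0], [3.0, 4.0]])
dst_lat = np.array([0.0, 0.1, 0.2])
dst_lon = np.array([0.0, 0.1, 0.2, 0.3])
regridded = nearest_neighbour_regrid(field, src_lat, src_lon, dst_lat, dst_lon)
```

Because every output value is copied from a source cell, no new values are created and the extremes of the original field are preserved, which is the reason given above for choosing this technique.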
4.3. Step 2: Temporal aggregation and additional variables
Next, hourly data were aggregated into daily values by applying variable and longitude specific aggregation schemes. The algorithms compute agronomically relevant weather variables that honor local time (LT), e.g. maximum temperature over daytime and minimum temperature over nighttime. As a result, the data comply with local calendar day definitions and with the aggregation schemes used by NMIs. Examples of such aggregation schemes, used to aggregate 3-hourly ERA-Interim data, can be found via the following URL: http://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Meteorological_data_from_ECMWF_models.
In contrast to the study provided in the above URL, the number of longitudinal aggregation zones was increased from three to eight. Each zone was assigned to a certain longitude range for which a specific aggregation scheme was defined. See Appendix I for the zone definition and Appendix II for the aggregation schemes.
Figure 4-1: Overview of the different processing steps in the whole workflow
An example: the ERA5 archive includes the maximum temperature of the previous hour. The 24 values of maximum temperature can be used to:
- Derive the maximum temperature over day time by taking the maximum of the 12 maximum temperature values occurring during the local day time (e.g. London between 06 and 18 UTC).
- Derive the maximum temperature over 24 hours by taking the maximum of the 24 maximum temperature values occurring during the local day (e.g. London between 00 UTC day X and 00 UTC day X + 1).
Similar aggregation can be done for minimum temperature but then taking the minimum over a range of hourly values. Most other elements were aggregated as the mean or sum over 24 hours of the local day. To obtain the set of 24 hours for a certain zone, hourly data of ERA5 is needed of day X, and possibly day X – 1 or X + 1. The exact dataset depends obviously on the zone (longitude range).
In case of precipitation type (rain, snow) the aggregation to a daily time step can be done type specific, thus counting the hours that the type appeared.
The applied aggregation zone definitions work very well with the local time zones of Western and Eastern Europe and mostly for the North American continent. For Asia there is a shift of 2-3 hours between the actual local time definition and the definition in our study. The only extreme mismatch of the local time definitions happens eastward of the dateline in zone E4. Fortunately, the affected areas (Pacific islands and the very western coast of Alaska) are not particularly significant from an agricultural perspective.
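The local-day aggregation described above can be sketched as follows. This is a simplified illustration, not the production scheme: it assumes that element h of the hourly array describes the hour starting h hours after 00 UTC of day 0, and that each zone is characterised by a whole-hour offset between local time and UTC. The function names are hypothetical.

```python
import numpy as np

def local_day_hours(hourly_utc, day_index, utc_offset_hours):
    """Return the 24 hourly values that make up one local calendar day.

    hourly_utc       : 1-D array; element h describes the hour starting
                       h hours after 00 UTC of day 0 (assumed layout).
    day_index        : local day to extract (0, 1, 2, ...).
    utc_offset_hours : local time minus UTC for the aggregation zone.
    """
    hourly_utc = np.asarray(hourly_utc)
    # Local midnight expressed as an index into the UTC series
    start = day_index * 24 - utc_offset_hours
    if start < 0 or start + 24 > hourly_utc.size:
        raise ValueError("hourly series does not cover the requested local day")
    return hourly_utc[start:start + 24]

def daily_maximum(hourly_utc, day_index, utc_offset_hours):
    """Maximum over the 24 hours of the local day (00-00LT)."""
    return local_day_hours(hourly_utc, day_index, utc_offset_hours).max()

def daytime_maximum(hourly_utc, day_index, utc_offset_hours):
    """Maximum over the 12 hourly values of local day time (06-18LT)."""
    return local_day_hours(hourly_utc, day_index, utc_offset_hours)[6:18].max()

# Three days of synthetic hourly values (day X - 1, X and X + 1)
hourly = np.arange(72.0)
tmax_24h = daily_maximum(hourly, day_index=1, utc_offset_hours=3)
tmax_day = daytime_maximum(hourly, day_index=1, utc_offset_hours=3)
```

The `ValueError` branch reflects the point made in the text: for zones away from the central zone, the local day needs hourly data from day X - 1 or X + 1 in addition to day X.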
The following conversions were done:
- unit conversion of precipitation (tp): m d-1 -> mm d-1
- unit conversion of snow (sd; liquid water equivalent): m -> cm
In addition, the following variables were calculated:
- 10m wind speed (m s-1) from the 10m u (10u) and 10m v (10v) wind components: sqrt(10u^2 + 10v^2)
- snow depth (cm) from snow density (rsn) and snow depth of liquid water equivalent (sd): (sd / rsn) * 1000 * 100
- partial water vapour pressure (hPa) from dewpoint temperature (d2m, in °C; Priestley and Taylor, 1972): 10 * 0.6108 * exp((17.27 * d2m) / (d2m + 237.3))
- relative humidity (%) from 2m temperature (t2m) and dewpoint temperature (d2m), both in °C: 100 * (exp((17.27 * d2m) / (237.3 + d2m)) / exp((17.27 * t2m) / (237.3 + t2m)))
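The conversions and derived variables listed above can be written down as a short sketch. The function names are hypothetical; the formulas follow the list, with the wind components squared and temperatures converted from Kelvin to degrees Celsius before they enter the exponential expressions (the constants 17.27 and 237.3 apply to °C).

```python
import numpy as np

def kelvin_to_celsius(temp_k):
    return temp_k - 273.15

def wind_speed(u10, v10):
    """10 m wind speed (m s-1) from the u and v wind components."""
    return np.sqrt(u10 ** 2 + v10 ** 2)

def snow_depth_cm(sd_m_lwe, rsn):
    """Snow depth (cm) from liquid water equivalent sd (m) and snow density rsn (kg m-3)."""
    return (sd_m_lwe / rsn) * 1000.0 * 100.0

def vapour_pressure_hpa(dewpoint_c):
    """Partial water vapour pressure (hPa) from dewpoint temperature (deg C)."""
    return 10.0 * 0.6108 * np.exp((17.27 * dewpoint_c) / (dewpoint_c + 237.3))

def relative_humidity(t2m_c, d2m_c):
    """Relative humidity (%) from 2 m temperature and dewpoint (both deg C)."""
    return 100.0 * (np.exp((17.27 * d2m_c) / (237.3 + d2m_c))
                    / np.exp((17.27 * t2m_c) / (237.3 + t2m_c)))

# Unit conversions mentioned in the text
precipitation_mm = 0.0125 * 1000.0   # tp: m d-1 -> mm d-1
snow_lwe_cm = 0.02 * 100.0           # sd: m -> cm
```

For example, 1 cm of liquid water equivalent at a snow density of 100 kg m-3 corresponds to `snow_depth_cm(0.01, 100.0)`, i.e. a snow pack ten times deeper than its water equivalent.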
New in AgERA5 v2.0: In case of precipitation duration fraction, the aggregation counts the number of hourly time steps with precipitation > 0.1 mm. The resulting count is divided by 24 to obtain a fractional value.
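A minimal sketch of the precipitation duration fraction is given below. The function name is hypothetical. Note that the text above uses "> 0.1 mm" while Table 3-1 uses ">= 0.1 mm / h"; the sketch follows the table, which is an assumption.

```python
import numpy as np

def precipitation_duration_fraction(hourly_precip_mm, threshold_mm=0.1):
    """Fraction of the 24 local-day hours with precipitation at or above the threshold.

    hourly_precip_mm : array whose first axis holds the 24 hourly totals (mm)
                       of one local day; further axes (e.g. lat, lon) are kept.
    threshold_mm     : assumed inclusive threshold (>=), following Table 3-1.
    """
    hourly = np.asarray(hourly_precip_mm, dtype=float)
    if hourly.shape[0] != 24:
        raise ValueError("expected 24 hourly values along the first axis")
    wet_hours = np.count_nonzero(hourly >= threshold_mm, axis=0)
    return wet_hours / 24.0

# Six wet hours out of 24
hourly = np.zeros(24)
hourly[8:14] = 0.5
fraction = precipitation_duration_fraction(hourly)
```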
The temporal aggregation and calculation of additional variables lead to the final list of variables presented in Table 3-1.
The variables in the dataset answer the needs of the most common crop models (working at a daily time step) and their regional implementations.
4.4. Step 3: Bias correction of data at 0.1° grid
A location, variable and season specific bias correction towards the HRES operational model was applied. This way the finer topography, finer land use pattern and finer land-sea delineation of the HRES operational model is more or less included in the downscaled ERA5. In fact, the ERA5 data set is tuned to the detailed topography of the HRES operational model also leading to more consistent time series between ERA5 and the HRES operational model.
For each grid cell and all variables, except precipitation and snow related variables, a linear equation is applied:
\[ Y_{i,j}^{ERA-5,corr} = \alpha_{i,j}Y_{i,j}^{ERA-5} + \beta_{i,j} + [T_{i,j}] \]
in which \( Y_{i,j}^{ERA-5} \) is the ERA5 NN-interpolated variable (e.g. temperature, wind) for grid box [i,j], \( Y_{i,j}^{ERA-5,corr} \) is the ERA5 NN-interpolated and bias corrected variable for grid box [i,j], and \( \alpha_{i,j} \) and \( \beta_{i,j} \) are correction coefficients (hereinafter referred to as slope and intercept, respectively).
The parameter \( T_{i,j} \) accounts for an additional seasonal correction and reads:
\[ T_{i,j} = \gamma_{1,i,j}T_{1} + \gamma_{2,i,j}T_{2} + \gamma_{3,i,j}T_{3} + \gamma_{4,i,j}T_{4} \]
in which \( T_{1} \) to \( T_{4} \) are sinusoidal time functions with a period of one year (see chapter 5).
The correction towards the HRES operational model is very relevant for users that do near real time monitoring of growing conditions and agricultural production. Note that the final ERA5 product becomes available with a time lag of one week including the temporary ERA5 line. For monitoring systems like JRC’s Monitoring Agricultural ResourceS (MARS) such a time lag is too large and therefore data in such systems have to be completed with data from the HRES operational model. When combining data of two datasets, originating from different resolutions, biases might be introduced that negatively affect the monitoring performance. This can be avoided by correcting the ERA5 towards the HRES operational model. Similar reasoning applies to forecast products like the ENS forecasts (15/30 day ensemble forecasts). This product can also be downscaled and bias corrected towards the HRES operational model. This way more or less consistent time series are obtained linking reanalysis, HRES and ENS data all around a common ‘HRES’ reference.
Some remarks:
- To improve the timeliness of the foreseen service the preliminary ERA5 product, ERA5t, needs also to be processed. We hereby assume that the bias correction algorithms, which are based on ERA5 data, can also be applied on ERA5t data.
- Specifically for users that need to link ERA5 to HRES for NRT monitoring purposes the following issue is relevant. The merge with the HRES operational model would need an additional service relying on specific data contracts with ECMWF. In addition, the HRES operational model data must be processed in a similar way (daily aggregation, possibly elevation corrections etc.) as the ERA5 data.
- Note that the HRES model is constantly improving (improved model physics, increased spatial resolution etc.). Therefore, with each additional HRES model upgrade, the established statistical relationship between ERA5 and HRES will become less valid. Over time, this may lead to jumps in the time series as the bias correction is correcting for aspects that changed in the HRES model. In that case, users who link ERA5 to HRES need to be warned and eventually the bias correction needs to be updated.
During the processing only the 'land' locations at the surface level (topographical elevation) were maintained using the HRES land-sea mask. This mask includes the area fraction of land within each 0.1° grid cell. As threshold, the fraction 0.05 has been selected: above it is land, below it is sea (see Figure 4-2).
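The land selection can be sketched as a simple threshold on the land fraction. The function name is hypothetical, and two details are assumptions: sea cells are set to NaN, and a cell with a fraction of exactly 0.05 is treated as sea (the text only says above is land, below is sea).

```python
import numpy as np

def apply_land_mask(field, land_fraction, threshold=0.05):
    """Keep values on land cells and set sea cells to NaN.

    land_fraction : area fraction of land per 0.1 degree grid cell (0 - 1).
    threshold     : cells with a land fraction above this value count as land.
    """
    field = np.asarray(field, dtype=float)
    is_land = np.asarray(land_fraction) > threshold
    return np.where(is_land, field, np.nan)

land_fraction = np.array([[0.0, 0.05, 0.06, 1.0]])
temperature = np.full((1, 4), 285.0)
masked = apply_land_mask(temperature, land_fraction)
```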
Figure 4-2: Selection of land 0.1° grid cells: the area fraction of land within a 0.1° grid cell (top) and the selection of land grid cells after applying the threshold of 0.05 area fraction (bottom)
4.5. Step 4: Calculating additional variables
Four new variables were added to AgERA5 v2.0 which are computed from existing AgERA5 layers. Below is a description of how they are computed from the AgERA5 layers derived in step 3.
Penman-Monteith Reference Evapotranspiration
The Penman-Monteith reference evapotranspiration represents the potential evapotranspiration rate from a reference crop canopy (ET0) in mm/d. The routines for these calculations closely follow the FAO procedure laid down in the publication "Guidelines for computing crop water requirements - FAO Irrigation and drainage paper 56". The routines were derived from PCSE, although for AgERA5 a vectorized implementation was created.
The Vapour Pressure Deficit at Daily Maximum Temperature
The VPD at maximum temperature is computed in two steps:
- Estimate the saturated vapour pressure Ea [hPa] at maximum daily temperature Tm [°C] using:
Ea = 10 * 0.6108 * exp((17.27 * Tm) / (Tm + 237.3))
- Compute VPD from: Ea - Vapour_Pressure_Mean
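The two steps can be sketched as follows. The function name is hypothetical, and the sketch assumes that the maximum temperature layer is provided in Kelvin (as in Table 3-1) and therefore needs converting to degrees Celsius first.

```python
import numpy as np

def vpd_at_maximum_temperature(tmax_k, vapour_pressure_mean_hpa):
    """Vapour pressure deficit (hPa) at daily maximum temperature.

    tmax_k                   : daily maximum 2 m temperature (K).
    vapour_pressure_mean_hpa : 24h mean vapour pressure (hPa).
    """
    tm = np.asarray(tmax_k, dtype=float) - 273.15  # K -> deg C
    # Step 1: saturated vapour pressure at maximum temperature (hPa)
    ea = 10.0 * 0.6108 * np.exp((17.27 * tm) / (tm + 237.3))
    # Step 2: deficit relative to the mean vapour pressure
    return ea - vapour_pressure_mean_hpa

vpd = vpd_at_maximum_temperature(273.15, 4.0)
```

The function works element-wise, so whole grids of maximum temperature and mean vapour pressure can be passed in one call.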
The derived daily minimum relative humidity
The daily minimum relative humidity rh is computed using the dewpoint temperature Td [°C] and the maximum daily temperature T [°C] using:
rh = 100 * (exp((17.27 * Td) / (237.3 + Td)) / exp((17.27 * T) / (237.3 + T)))
Moreover, checks are carried out to ensure consistency with the relative humidity layers at specific time steps.
The derived daily maximum relative humidity
The daily maximum relative humidity rh is computed using the dewpoint temperature Td [°C] and the minimum daily temperature T [°C] using:
rh = 100 * (exp((17.27 * Td) / (237.3 + Td)) / exp((17.27 * T) / (237.3 + T)))
Moreover, checks are carried out to ensure consistency with the relative humidity layers at specific time steps.
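A minimal sketch of the two derived relative humidity layers is shown below. It follows the description in Table 3-1, where the layers are derived from the mean vapour pressure and the 24h maximum or minimum temperature; this is equivalent to the dewpoint form because the vapour pressure equals the saturated vapour pressure at the dewpoint. The function names are hypothetical, and capping at 100% is an assumed stand-in for the consistency checks, which are not specified here.

```python
import numpy as np

def saturated_vapour_pressure_hpa(temp_c):
    """Saturated vapour pressure (hPa) at a temperature in deg C."""
    return 10.0 * 0.6108 * np.exp((17.27 * temp_c) / (temp_c + 237.3))

def derived_relative_humidity(vapour_pressure_mean_hpa, tmin_k, tmax_k):
    """Return (rh_min, rh_max) in % from mean vapour pressure and Tmin/Tmax in K."""
    es_tmax = saturated_vapour_pressure_hpa(np.asarray(tmax_k, dtype=float) - 273.15)
    es_tmin = saturated_vapour_pressure_hpa(np.asarray(tmin_k, dtype=float) - 273.15)
    # Warmest moment of the day gives the lowest relative humidity
    rh_min = 100.0 * vapour_pressure_mean_hpa / es_tmax
    # Coldest moment of the day gives the highest relative humidity
    rh_max = 100.0 * vapour_pressure_mean_hpa / es_tmin
    # Assumption: cap at 100 % (the actual consistency checks are not documented here)
    return np.minimum(rh_min, 100.0), np.minimum(rh_max, 100.0)

rh_min, rh_max = derived_relative_humidity(6.108, tmin_k=273.15, tmax_k=283.15)
```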
5. Develop bias corrections
Step 3, as described in the previous chapter, covers the bias correction towards the HRES grid (0.1° grid). The grid and variable-specific regression equations are trained on operational ECMWF HRES model data.
The procedure for bias correction is exactly the same for all AgERA5 versions. However, in version 2.0 the daily minimum and maximum temperature variables are derived from a different ERA5 variable (see table 3-1) and therefore the statistical correction equations have been retrained in order to derive new coefficients for those variables.
The approach, to develop the equations, consists of the following main steps:
- Interpolate the data towards a 0.1° grid (see step 1)
- Aggregate hourly model data to daily variables (see step 2)
- Train statistical correction equations for each variable and grid point
- Apply the trained equations to the intermediate AgERA5 variables (output from step 2)
The development of the equations (using the HRES operational model as training set) is a one-off action and has been documented in a separate document named “C3S422Lot1.WEnR.DS2_Downscaling and bias correction v1.7.pdf”. This section provides a summary of this work.
The input data:
- ECMWF ERA5 reanalysis (grid1: 0.28125° x 0.28125°)
- ECMWF HRES (grid2: 0.10° x 0.10°)
Both data sets are covering the globe, including land and sea grid boxes.
Originally, ERA5 data is available as hourly fields, while HRES has a temporal resolution of 3 hours. For both models, a set of 12 base parameters (see Table 3-2) was retrieved from the ECMWF MARS archive covering a period of two years. These base parameters with 1-hourly/3-hourly resolution were then aggregated to 22 (derived) daily parameters over 8 different longitudinal bands (see section 4.3; note that the schemes given in Appendix II only apply to ERA5, the schemes for HRES data are available on request). Note that the ERA5 data was first interpolated towards the 0.1° grid using the NN-technique (see section 4.2) before applying the aggregation to days.
To train the regression equations, a data set of 2-3 years is desired. Both ERA5 and HRES need to be available for this period. Based on the recent HRES model upgrades outlined in the separate report, the period between 2016-04-01 and 2018-03-31 was chosen as the training period for the final bias correction equations. Most importantly, this period does not include any horizontal grid or resolution changes. Also, data of both models were available through ECMWF's MARS archive at the moment the bias correction analysis took place. Therefore, the generated equations correct ERA5 data towards a mixture of HRES model cycles (41r2, 43r1 and 43r3).
The equations were derived by means of multiple linear regression.
Not all daily aggregated elements (see Table 3-1) are suited to correction by this method. For instance, for most parts of the world the snow parameters have too few snow cases to build a robust correction statistic. Similar issues are expected with the precipitation parameters (sum and type) in arid regions.
The MOS (Model Output Statistics) routine was used to carry out a multiple linear regression between the ECMWF HRES data and the NN-interpolated ERA5 data for each grid cell. The outcome is a linear equation (in this case demonstrated for the ERA5 data set):
\[ Y_{i,j}^{ERA-5,corr} = \alpha_{i,j}Y_{i,j}^{ERA-5} + \beta_{i,j} + [T_{i,j}] \]
in which \( Y_{i,j}^{ERA-5} \) is the ERA5 NN-interpolated variable (e.g. temperature, wind) for grid box [i,j], \( Y_{i,j}^{ERA-5,corr} \) is the ERA5 NN-interpolated and bias corrected variable for grid box [i,j], and \( \alpha_{i,j} \) and \( \beta_{i,j} \) are correction coefficients (hereinafter referred to as slope and intercept, respectively).
The parameter \( T_{i,j} \) accounts for an additional seasonal correction and reads:
\[ T_{i,j} = \gamma_{1,i,j}T_{1} + \gamma_{2,i,j}T_{2} + \gamma_{3,i,j}T_{3} + \gamma_{4,i,j}T_{4} \]
in which \( T_{1} \) to \( T_{4} \) are sinusoidal time functions with a period of one year, and \( \gamma_{1,i,j} \) to \( \gamma_{4,i,j} \) are the respective coefficients. The sinusoidal time functions that were used read:
\[ T_{1} = 100\sin \left(2\pi \frac{day-21}{365} \right) \]
\[ T_{2} = 100\sin \left(2\pi \frac{day-81}{365} \right) \]
With the combination of the above sine functions and coefficients, any grid-specific time correction function can be constructed. To achieve this, it is enough to use only the 2 best sinusoidal time functions of the 4 available for each grid point in the final equation.
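Applying a trained equation to one grid cell can be sketched as below. The sketch is illustrative only: it assumes that the two selected time functions for the cell are T1 and T2 as given above (the only two spelled out in this section), and the function names are hypothetical.

```python
import numpy as np

def seasonal_terms(day_of_year):
    """Sinusoidal time functions T1 and T2 with a period of one year."""
    day = np.asarray(day_of_year, dtype=float)
    t1 = 100.0 * np.sin(2.0 * np.pi * (day - 21.0) / 365.0)
    t2 = 100.0 * np.sin(2.0 * np.pi * (day - 81.0) / 365.0)
    return t1, t2

def bias_correct(y_era5, slope, intercept, gamma1, gamma2, day_of_year):
    """Linear bias correction with a seasonal term for one grid cell.

    Implements Y_corr = slope * Y + intercept + gamma1 * T1 + gamma2 * T2,
    assuming T1 and T2 are the two time functions selected for this cell.
    """
    t1, t2 = seasonal_terms(day_of_year)
    return slope * y_era5 + intercept + gamma1 * t1 + gamma2 * t2

# Hypothetical coefficients for a single grid cell
corrected = bias_correct(280.0, slope=0.9, intercept=27.0,
                         gamma1=0.01, gamma2=0.0, day_of_year=112.25)
```

Because the time functions are scaled by 100, a seasonal coefficient of 0.01 corresponds to a seasonal amplitude of one unit of the corrected variable.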
The objects created by the bias correction application are twofold. The trained regression equation of a particular parameter was written to a NetCDF file, having the slope, the intercept and each of the seasonal cycle coefficients stored as a normal NetCDF parameter. The evaluation metrics were handled similarly. For analysis purposes the MAE, RMSE and R-squared were calculated and stored in a second NetCDF file.
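The regression and the evaluation metrics for a single grid cell can be illustrated with ordinary least squares. This is a sketch under stated assumptions, not the MOS routine itself: it uses `numpy.linalg.lstsq`, takes T1 and T2 as the two selected time functions, and the function names are hypothetical.

```python
import numpy as np

def design_matrix(y_era5, day_of_year):
    """Predictors: ERA5 value, constant, and the time functions T1 and T2."""
    y_era5 = np.asarray(y_era5, dtype=float)
    day = np.asarray(day_of_year, dtype=float)
    t1 = 100.0 * np.sin(2.0 * np.pi * (day - 21.0) / 365.0)
    t2 = 100.0 * np.sin(2.0 * np.pi * (day - 81.0) / 365.0)
    return np.column_stack([y_era5, np.ones_like(y_era5), t1, t2])

def fit_cell(y_era5, y_hres, day_of_year):
    """Least-squares fit; returns [slope, intercept, gamma1, gamma2]."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(y_era5, day_of_year),
                                 np.asarray(y_hres, dtype=float), rcond=None)
    return coeffs

def evaluation_metrics(y_hres, y_corrected):
    """MAE, RMSE and R-squared of the corrected series against HRES."""
    y_hres = np.asarray(y_hres, dtype=float)
    err = np.asarray(y_corrected, dtype=float) - y_hres
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_hres - y_hres.mean()) ** 2)
    return mae, rmse, r2

# Synthetic two-year training series for one grid cell
rng = np.random.default_rng(0)
days = np.arange(1, 731)
era5 = 280.0 + 5.0 * rng.standard_normal(days.size)
X = design_matrix(era5, days)
hres = X @ np.array([0.9, 27.0, 0.01, -0.02])
coeffs = fit_cell(era5, hres, days)
mae, rmse, r2 = evaluation_metrics(hres, X @ coeffs)
```

In the actual workflow the coefficients and the metrics are stored per grid cell in NetCDF files, as described above; the sketch only covers the numerical part.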
A detailed analysis of the significance of the bias correction can be found in the separate document on downscaling and bias correction mentioned above.
Table 5-1 summarizes how the ERA5 improves (in terms of MAE for the main elements) after applying the bias correction. The results were derived for AgERA5 v1.0 and v1.1. For AgERA5 v2.0 the ERA5 input variables for the daily min/max temperature statistics changed but we expect the improvement due to bias correction to be very similar.
Table 5-1: MAE (HRES-ERA5corrected) and MAE improvement of different bias corrected variables. The MAE improvements indicate the added value through the bias correction. All metrics were calculated for different regions and for subsets of grid points meeting certain conditions. E.g. “Land & above 800m” only uses grid points being located on land and above 800m. “Coasts & Lakes” subsets all grid points with a land fraction between 10% and 90%.
Variable | Region | Land: MAE | Land: MAE Impr | Land & below 800m: MAE | Land & below 800m: MAE Impr | Land & above 800m: MAE | Land & above 800m: MAE Impr | Coasts & Lakes: MAE | Coasts & Lakes: MAE Impr |
2t_davg [K] | Africa | 0.44 | 40% | 0.42 | 36% | 0.47 | 48% | 0.36 | 50% |
2t_davg | Asia | 0.72 | 36% | 0.67 | 27% | 0.86 | 48% | 0.66 | 32% |
2t_davg | Australia | 0.43 | 42% | 0.43 | 35% | 0.37 | 83% | 0.30 | 49% |
2t_davg | Europe | 0.51 | 36% | 0.47 | 30% | 0.75 | 55% | 0.45 | 38% |
2t_davg | N-America | 0.71 | 31% | 0.67 | 25% | 0.85 | 41% | 0.68 | 28% |
2t_davg | S-America | 0.45 | 50% | 0.42 | 41% | 0.61 | 65% | 0.38 | 48% |
2d_davg [K] | Africa | 0.76 | 38% | 0.77 | 38% | 0.76 | 39% | 0.55 | 46% |
2d_davg | Asia | 0.90 | 29% | 0.81 | 25% | 1.09 | 35% | 0.73 | 28% |
2d_davg | Australia | 0.57 | 34% | 0.57 | 28% | 0.43 | 78% | 0.36 | 43% |
2d_davg | Europe | 0.58 | 28% | 0.55 | 22% | 0.81 | 46% | 0.54 | 27% |
2d_davg | N-America | 0.80 | 23% | 0.73 | 18% | 0.97 | 32% | 0.70 | 21% |
2d_davg | S-America | 0.54 | 42% | 0.44 | 37% | 0.99 | 50% | 0.41 | 40% |
ff_davg [m/s] | Africa | 0.27 | 25% | 0.26 | 22% | 0.28 | 32% | 0.33 | 47% |
ff_davg | Asia | 0.29 | 28% | 0.27 | 24% | 0.34 | 35% | 0.36 | 35% |
ff_davg | Australia | 0.24 | 31% | 0.25 | 30% | 0.22 | 41% | 0.31 | 53% |
ff_davg | Europe | 0.25 | 31% | 0.24 | 31% | 0.32 | 33% | 0.33 | 48% |
ff_davg | N-America | 0.29 | 28% | 0.28 | 26% | 0.33 | 31% | 0.33 | 34% |
ff_davg | S-America | 0.23 | 30% | 0.22 | 26% | 0.27 | 42% | 0.32 | 51% |
tcc_davg [0-1] | Africa | 0.08 | 3% | 0.08 | 2% | 0.08 | 4% | 0.08 | 5% |
tcc_davg | Asia | 0.07 | 0% | 0.07 | -2% | 0.08 | 4% | 0.08 | -2% |
tcc_davg | Australia | 0.06 | -1% | 0.06 | -1% | 0.06 | 5% | 0.07 | 2% |
tcc_davg | Europe | 0.07 | -1% | 0.07 | -1% | 0.07 | 2% | 0.07 | -1% |
tcc_davg | N-America | 0.08 | 0% | 0.08 | -1% | 0.07 | 2% | 0.08 | -1% |
tcc_davg | S-America | 0.07 | 4% | 0.07 | 3% | 0.07 | 8% | 0.07 | 5% |
ssrd_dsumdiff [J/m2d] | Africa | 1055575 | 7% | 1030480 | 7% | 1118699 | 8% | 1151300 | 13% |
ssrd_dsumdiff | Asia | 872717 | 4% | 836249 | 3% | 958997 | 7% | 899084 | 5% |
ssrd_dsumdiff | Australia | 1205911 | 6% | 1177253 | 6% | 1772895 | 14% | 1497494 | 12% |
ssrd_dsumdiff | Europe | 832226 | 2% | 815116 | 2% | 951428 | 5% | 782759 | 4% |
ssrd_dsumdiff | N-America | 899054 | 4% | 902781 | 3% | 888809 | 6% | 916596 | 4% |
ssrd_dsumdiff | S-America | 1427243 | 9% | 1448626 | 9% | 1328043 | 13% | 1316248 | 11% |
The MAE indicates the error of the corrected data (HRES-ERA5corrected), while the MAE improvement compares the error of the corrected versus the uncorrected ERA5 data. All metrics were aggregated for different regions and certain subsets of grid points. Overall, the temperature, humidity and wind speed variables benefit most from the correction. The MAE is reduced by 30% to 60% in the majority of cases. Grid points located in mountainous areas or along coasts and lakes are improved most. This is not surprising as these are the areas where the largest systematic differences between ERA5 and HRES can be expected. Not only are the relative improvements quite large, the absolute MAE values after the correction are also small. The MAE for the 24h mean of the 2m temperatures (2t_davg) over land, for example, is at most 0.72 K for all continents, and at most 0.51 K for 4 of the 6 continents.
For the solar radiation flux (ssrd_dsumdiff) the MAE improvement is solid and ranges between 2% and 14%, depending on the region and subset. The results of element "24h mean cloud cover" (tcc_davg) are mixed. For most grid points the correction doesn't add any value. The MAE improvement of the majority of all grid points (land and below 800m) is between -2% and +4%, and therefore near zero. Only for grid points above 800m we can observe a small but clear improvement (2% - 8%).
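The MAE improvement reported in Table 5-1 can be read as the relative reduction of the MAE after correction. The sketch below makes that reading explicit; the exact formula is not given in this document, so the definition and the function names are assumptions.

```python
import numpy as np

def mean_absolute_error(reference, estimate):
    return np.mean(np.abs(np.asarray(estimate, dtype=float)
                          - np.asarray(reference, dtype=float)))

def mae_improvement_percent(hres, era5_raw, era5_corrected):
    """Relative MAE reduction (%) of corrected versus uncorrected ERA5.

    Positive values mean the correction brings ERA5 closer to HRES;
    negative values mean the correction makes the fit slightly worse.
    """
    mae_raw = mean_absolute_error(hres, era5_raw)
    mae_corr = mean_absolute_error(hres, era5_corrected)
    return 100.0 * (mae_raw - mae_corr) / mae_raw

hres = np.array([0.0, 0.0, 0.0, 0.0])
raw = np.array([1.0, -1.0, 1.0, -1.0])
corrected = np.array([0.5, -0.5, 0.5, -0.5])
improvement = mae_improvement_percent(hres, raw, corrected)
```

Under this definition the small negative values for cloud cover in Table 5-1 correspond to cases where the corrected series is marginally further from HRES than the uncorrected one.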
The following conclusions were drawn from the evaluation study:
- The selected bias correction method has its largest benefits in mountainous areas, at coast lines and at lakes.
- Seasonal correction on top of the simple bias correction further improves the accuracy of the derived correction equations.
- The approach works remarkably well for 3 out of the 4 groups of variables. The averaged relative reduction of MAE is between 30% and 60%. These are:
- Temperature parameters
- Humidity parameters
- Wind speed
- The correction models for solar radiation flux reach a MAE improvement of 2% to 14%.
- For cloud cover the correction has only a minor effect for most of the grid points. However, mountainous regions still benefit from the correction with a MAE improvement of 2%-8%.
6. Appendix I Longitudinal aggregation zones
Longitudinal aggregation zones are defined around central longitudes. The first zone is centered at zero longitude (London). This zone stretches from 22.5 west to 22.5 east. The next zone is centered around 45 east, stretching from 22.5 east to 67.5 east, and so on. This definition works very well with the local time zone configuration of Western and Eastern Europe and mostly with the American continent. For Asia there will be a shift of 2-3 hours between the real local time definition and our definition. The only extreme mismatch of the local time definitions will happen eastward of the dateline in zone E4. Fortunately, the affected areas (islands in the Pacific and the very western coast of Alaska) are not so interesting from an agricultural perspective.
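The zone definition can be sketched as a mapping from longitude to the central longitude of its zone. Only the zone names C and E4 appear in this document, so the sketch identifies zones by central longitude rather than by name. The 3-hour offset per zone follows from 45° of longitude at 15° per hour; the function name and the treatment of points exactly on a zone boundary are assumptions.

```python
import math

def aggregation_zone(lon_deg):
    """Return (central_longitude, utc_offset_hours) for a longitude in degrees east.

    Eight zones of 45 degrees width are centered on 0, 45, 90, ... degrees.
    The zone centered on 180 degrees spans the dateline (157.5 E to 157.5 W).
    """
    # Normalise longitude to the interval [-180, 180)
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    # Zone number relative to the central zone (assumed: boundary goes east)
    k = math.floor((lon + 22.5) / 45.0)
    if k == -4:
        # West of 157.5 W belongs to the dateline zone centered on 180 degrees
        k = 4
    return 45 * k, 3 * k

london = aggregation_zone(0.0)
western_alaska = aggregation_zone(-165.0)
```

The second call illustrates the mismatch mentioned above: a point just east of the dateline falls in the zone centered on 180°, so its aggregation window is shifted by 12 hours in the opposite direction to its real local time.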
7. Appendix II Aggregation schemes
The figures below illustrate the ERA5 UTC slices that are used to derive the AgERA5 variables for different longitudinal aggregation zones. For example the first aggregation scheme below is applied to the variables that consist of daily average values. For computing the daily average temperature for zone C (central) the ERA5 hourly slices from UTC 03:00 to UTC 03:00 (+1: next day) are selected. While for zone E4 the aggregation selects the ERA5 hourly slices from UTC 15:00 (-1: previous day) to UTC 15:00 of the current day. For different aggregation types (night time, day time) different temporal UTC slices are selected as demonstrated in the corresponding figures.
Some remarks:
- An "hour box" in the top row always represents the hour on the left border of the box.
- Variables t2m, d2m, ff, tcc, sd, rsn, vp, ptype and rh are all instantaneous values. To align with HRES (only available with a 3-hour time step) the period 03-00 has been selected: aggregate 8 values, i.e. 03, 06, 09, 12, 15, 18, 21 and 00.
- Variables mn2t, mx2t, ssrd, tp summarize the condition of 1 hour (sum, min, max, type).
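The selection of the 8 three-hourly slices for instantaneous variables can be sketched as follows. The sketch assumes a whole-hour UTC offset per zone (0 for zone C, 12 for zone E4, consistent with the examples above) and an hourly array that starts at 00 UTC of the previous day; both the function names and this array layout are hypothetical.

```python
import numpy as np

def instantaneous_utc_hours(utc_offset_hours):
    """UTC hours, relative to 00 UTC of day X, of the slots 03, 06, ..., 24 LT."""
    return [local_hour - utc_offset_hours for local_hour in range(3, 25, 3)]

def daily_mean_instantaneous(hourly_utc, utc_offset_hours, hours_before_day_x=24):
    """Daily mean from 8 three-hourly instantaneous values.

    hourly_utc         : hourly values starting at 00 UTC of day X - 1
                         (assumed layout covering day X - 1, X and X + 1).
    hours_before_day_x : number of hours in the array before 00 UTC of day X.
    """
    hourly_utc = np.asarray(hourly_utc, dtype=float)
    idx = np.array(instantaneous_utc_hours(utc_offset_hours)) + hours_before_day_x
    return hourly_utc[idx].mean()

zone_c_hours = instantaneous_utc_hours(0)
zone_e4_hours = instantaneous_utc_hours(12)
mean_zone_c = daily_mean_instantaneous(np.arange(72.0), 0)
```

For zone E4 the first selected slice is at -9 hours, i.e. UTC 15:00 of the previous day, which matches the aggregation scheme described above.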
8. References
Toreti, A., Maiorano, A., De Sanctis, G., Webber, H., Ruane, A.C., Fumagalli, D., Ceglar, A., Niemeyer, S., Zampieri, M., 2019. Using reanalysis in crop monitoring and forecasting systems. Agricultural Systems 168, pp. 144-15.
Glotter, M.J., Ruane, A.C., Moyer, E.J., Elliott, J.W., 2016. Evaluating the sensitivity of agricultural model performance to different climate inputs. J. Appl. Meteorol. Climatol. 55, pp. 579-594.
Wit, A.J.W. de, Baruth, B., Boogaard, H., Diepen, K. van, Kraalingen, D.W.G. van, Micale, F., Roller, J.A. te, Supit, I., Wijngaart, R. van der, 2010. Using ERA-INTERIM for regional crop yield forecasting in Europe. Climate Research 44(1), pp. 41-53. ISSN 0936-577X.
https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5
https://software.ecmwf.int/wiki/display/CKB/ERA5+data+documentation
http://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Meteorological_data_from_ECMWF_models
https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference