Contributors: Hendrik Boogaard (WAGENINGEN ENVIRONMENTAL RESEARCH), Gerald van der Grijn (METEOGROUP)
History of Modifications
Acronyms
1. Scope of the document
This document concisely summarizes the characteristics of the AgERA5 data set, with a focus on: space and time extent and resolution; data formats, metadata and flags; description of variables, strengths and limitations, usage dos and don'ts.
The AgERA5 dataset provides daily surface meteorological data for the period 1979 to present at a spatial resolution of 0.1°. The service is based on the fifth generation of ECMWF atmospheric re-analyses of the global climate, better known as ERA5. AgERA5 'connects' users in the agricultural domain to the new ERA5 data set. It includes daily aggregates of agronomically relevant elements, tuned to local day definitions and adapted to the finer topography, finer land use pattern and finer land-sea delineation of the ECMWF HRES operational model. The elements cover temperature, precipitation, snow depth, humidity, cloud cover and radiation.
2. Executive summary
The AgERA5 dataset provides daily surface meteorological data for the period 1979 to present at a spatial resolution of 0.1°. The service is based on the fifth generation of ECMWF atmospheric re-analyses of the global climate, better known as ERA5.
AgERA5 'connects' users in the agricultural domain to the new ERA5 data set. It includes daily aggregates of agronomically relevant variables, tuned to local day definitions and adapted to the finer topography, finer land use pattern and finer land-sea delineation of the ECMWF HRES operational model. The variables cover temperature, precipitation, snow depth, humidity, cloud cover and radiation.
3. Product description
The following text applies to AgERA5 version 1.0.
3.1. Introduction
Climate forcing data is used in analysis and agro-environmental modelling to study aspects of productivity and externalities of agriculture (e.g. Toreti et al, 2019; Glotter et al., 2016; De Wit et al., 2010). In this service we start from the hourly ECMWF ERA5 model data and convert the data into meaningful input for these analyses and modelling. It involves a large amount of data that needs to be processed. Acquisition and pre-processing of ERA5 data, both archive and near real-time (NRT) data, is a large and specialized job. It requires a heavy investment for users like technical policymakers, information agencies, NGOs, commodity traders, agri-businesses, insurance providers etcetera. The complex task and required effort may even be a barrier to start using the data.
This service is based on the original hourly deterministic ECMWF ERA5 data, at surface level and available at a spatial resolution of 30 km (~0.28125°). Data were aggregated to daily time steps and corrected towards a finer topography at a 0.1° spatial resolution. Aggregated data at daily time steps follow a local time zone definition and include a number of major agronomic parameters. The correction to the 0.1° grid was realized by applying grid- and variable-specific regression equations to an ERA5 data set interpolated to a 0.1° grid. The equations were trained on operational ECMWF HRES model data at a 0.1° resolution. The final data set is referred to as AgERA5. AgERA5 will save potential users money and stimulate businesses to use such a high-quality data set. It also avoids a possible proliferation of different data sets, all originating from the basic hourly ERA5 data set.
3.2. Geophysical product description
3.2.1. Generic bioclimatic indicators
The AgERA5 includes 22 agronomic relevant variables. See table 3.1.
Table 3.1: List of variables in the AgERA5 data set
Short name | Long name | Unit | Aggregation | AGROVOC URI |
Cloud_Cover_Mean | Total cloud cover (00-00LT) | (0 - 1) | Mean | |
Dew_Point_Temperature_2m_Mean | 2 meter dewpoint temperature (00-00LT) | K | Mean | |
Preciptation_Flux | Total precipitation (00-00LT) | mm d-1 | Sum | |
Preciptation_Rain_Duration_Fraction | Precipitation type duration - rain (00-00LT) | - | Count | |
Preciptation_Solid_Duration_Fraction | Precipitation type duration - solid fraction (no hail) composed of: precipitation types freezing rain (3), snow (5), wet snow (6), mixture of | - | Count | |
Relative_Humidity_2m_06h | Relative humidity at 06LT | % | - | |
Relative_Humidity_2m_09h | Relative humidity at 09LT | % | - | |
Relative_Humidity_2m_12h | Relative humidity at 12LT | % | - | |
Relative_Humidity_2m_15h | Relative humidity at 15LT | % | - | |
Relative_Humidity_2m_18h | Relative humidity at 18LT | % | - | |
Snow_Thickness_LWE_Mean | Snow liquid water equivalent (00-00LT) | cm of liquid water equivalent | Mean | |
Snow_Thickness_Mean | Snow depth (00-00LT) | cm snow | Mean | |
Solar_Radiation_Flux | Surface solar radiation downwards (00-00LT) | J m-2 d-1 | Sum | |
Temperature_Air_2m_Max_24h | Maximum air temperature at 2 meter (00-00LT) | K | Maximum | |
Temperature_Air_2m_Max_Day_Time | Maximum air temperature at 2 meter (06-18LT) | K | Maximum | |
Temperature_Air_2m_Mean_24h | 2 meter air temperature (00-00LT) | K | Mean | |
Temperature_Air_2m_Mean_Day_Time | | K | Mean | |
Temperature_Air_2m_Mean_Night_Time | | K | Mean | |
Temperature_Air_2m_Min_24h | Minimum air temperature at 2 meter (00-00LT) | K | Minimum | |
Temperature_Air_2m_Min_Night_Time | Minimum air temperature at 2 meter (18-06LT) | K | Minimum | |
Vapour_Pressure_Mean | Vapour pressure (00-00LT) | hPa | Mean | |
Wind_Speed_10m_Mean | 10 meter wind component (00-00LT) | m s-1 | Mean |
3.3. Product target requirements
DATA DESCRIPTION | |
Horizontal coverage | Global (on a regular latitude-longitude grid) |
Temporal Coverage | 1 January 1979 to present |
Temporal resolution | Daily |
File format | NetCDF 4, Climate and Forecast (CF) Metadata Convention v1.6 |
Data type | Grid |
Horizontal resolution | 0.1° x 0.1° |
Some remarks related to the quality of the AgERA5 data set:
- The spatial resolution of the downloaded ERA5 was selected such that it is as close as possible to the native resolution of ERA5. Therefore the original ERA5 model level data (reanalysis-era5-complete; 0.28125°) was downloaded from the CDS instead of the interpolated version (interpolated to a 0.25° grid).
- The applied aggregation zone definitions work very well with the local time zones of West- and East-Europe and mostly for the North-American continent. For Asia there is a shift of 2-3 hours between the actual local time definition and the definition in our study. The only extreme mismatch of the local time definitions will happen eastward of the dateline in zone E4. Fortunately, the affected areas (Pacific islands and the very western coast of Alaska) are, from an agricultural perspective, not particularly significant.
- A location, variable and season specific bias correction towards the HRES operational model was applied. This way the finer topography, finer land use pattern and finer land-sea delineation of the HRES operational model is more or less included in the downscaled ERA5. In fact, the ERA5 data set was tuned to the detailed topography of the HRES operational model also leading to more consistent time series between ERA5 and the HRES operational model.
- AgERA5 data are represented by the elevation model of the HRES operational model (period 2016-04-01 to 2018-03-31; HRES model cycles 41r2, 43r1 and 43r3) at a spatial resolution of 0.1 degree.
The quality of the bias correction has been documented in a separate document named: "C3S422Lot1.WEnR.DS2_Downscaling and bias correction v1.7.pdf". For each of the 22 variables there are netCDFs available describing the performance of the bias correction in terms of the following sample statistics: MAE, RMSE, R-squared and bias:
- models_Cloud_Cover_Mean_Model2SelSeasonals_eval.nc
- models_Dew_Point_Temperature_2m_Mean_Model2SelSeasonals_eval.nc
- models_Relative_Humidity_2m_06h_Model2SelSeasonals_eval.nc
- models_Relative_Humidity_2m_09h_Model2SelSeasonals_eval.nc
- models_Relative_Humidity_2m_12h_Model2SelSeasonals_eval.nc
- models_Relative_Humidity_2m_15h_Model2SelSeasonals_eval.nc
- models_Relative_Humidity_2m_18h_Model2SelSeasonals_eval.nc
- models_Solar_Radiation_Flux_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Max_24h_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Max_Day_Time_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Mean_24h_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Mean_Day_Time_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Mean_Night_Time_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Min_24h_Model2SelSeasonals_eval.nc
- models_Temperature_Air_2m_Min_Night_Time_Model2SelSeasonals_eval.nc
- models_Vapour_Pressure_Mean_Model2SelSeasonals_eval.nc
- models_Wind_Speed_10m_Mean_Model2SelSeasonals_eval.nc
Overall, the temperature, humidity and wind speed variables benefit most from the correction. The MAE is reduced by 30% to 60% in the majority of cases. Grid points located in mountainous areas or along coasts and lakes are improved most. This is not surprising, as these are the areas where the largest systematic differences between ERA5 and HRES can be expected. Not only are the relative improvements large; the absolute MAE values after the correction are also small. For example, the MAE for the 24h mean of the 2m temperature (2t_davg) is below 0.72 K for all continents, and below 0.51 K for 4 of 6 continents.
For the solar radiation flux (ssrd_dsumdiff) the MAE improvement is solid and ranges between 2% and 14%, depending on the region and subset. The results of element "24h mean cloud cover" (tcc_davg) are mixed. For most grid points the correction doesn't add any value. The MAE improvement of the majority of all grid points (land and below 800m) is between -2% and +4%, and therefore near zero. Only for grid points above 800m we can observe a small but clear improvement (2% - 8%).
The following conclusions were drawn from the evaluation study:
- The selected bias correction method has its largest benefits in mountainous areas, at coast lines and at lakes.
- Seasonal correction on top of the simple bias correction further improves the accuracy of the derived correction equations.
- The approach works remarkably well for 3 out of the 4 groups of variables. The averaged relative reduction of MAE is between 30% and 60%. These are:
- Temperature parameters
- Humidity parameters
- Wind speed
- The correction models for solar radiation flux reach a MAE improvement of 2% to 14%.
- For cloud cover the correction has only a minor effect for most of the grid points. However, mountainous regions still benefit from the correction with a MAE improvement of 2%-8%.
The correction towards the HRES operational model is very relevant for users that do near real time monitoring of growing conditions and agricultural production. Note that the final ERA5 product becomes available with a time lag of one week, including the temporary ERA5 line. For monitoring systems like JRC's Monitoring Agricultural ResourceS (MARS) such a time lag is too large, and therefore data in such systems have to be completed with data from the HRES operational model. When combining data from two datasets with different resolutions, biases might be introduced that negatively affect the monitoring performance. This can be avoided by correcting the ERA5 towards the HRES operational model. Similar reasoning applies to forecast products like the ENS forecasts (15/30 day ensemble forecasts). This product can also be downscaled and bias corrected towards the HRES operational model. This way more or less consistent time series are obtained, linking reanalysis, HRES and ENS data all around a common 'HRES' reference. Some remarks:
- To improve the timeliness of the foreseen service the preliminary ERA5 product, ERA5t, needs also to be processed. We hereby assume that the bias correction algorithms, which are based on ERA5 data, can also be applied on ERA5t data.
- Specifically for users that need to link ERA5 to HRES for NRT monitoring purposes the following issue is relevant. The merge with the HRES operational model would need an additional service relying on specific data contracts with ECMWF. And the HRES operational model data must be processed in a similar way (daily aggregation, possibly elevation corrections etc.) as the ERA5 data.
- Note that the HRES model is constantly improving (improved model physics, increased spatial resolution etc.). Therefore, with each additional HRES model upgrade, the established statistical relationship between ERA5 and HRES will become less valid. Over time, this may lead to jumps in the time series as the bias correction is correcting for aspects that changed in the HRES model. In that case, users who link ERA5 to HRES need to be warned and eventually the bias correction needs to be updated.
3.4. Product Gap analysis
Currently the AgERA5 is available for the years 1979 to present as the remaining ERA5 dataset from 1950 to 1978 was not available during the project.
The AgERA5 was bias corrected towards the HRES operational model. It is assumed that the HRES operational model reflects reality best as it is based on an advanced assimilation and spatialization scheme using many quality-controlled observations, satellite imagery, weather balloons etc. However, the AgERA5 data set does not yet represent the agricultural regions within a 0.1 degree grid cell. This requires an extra elevation correction for temperature (average, minimum and maximum air temperature using a lapse rate of 6.5 °C/km) and possibly for humidity. The elevation of agricultural regions could be defined as follows. For example if a 0.1 degree grid cell has more than 5% arable land the elevation could be calculated as the median of all DEM pixels within that grid cell that are under arable land. In case there is less than 5% arable land within the grid cell, the elevation could equal the lower quartile of all DEM pixels under the complete grid cell.
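The elevation correction suggested above can be sketched as follows. This is a minimal illustration and not part of the AgERA5 processing chain: the function names, the DEM pixel values and the arable-land mask are hypothetical, and only the lapse rate of 6.5 °C/km and the 5% arable-land rule are taken from the text.

```python
import statistics

LAPSE_RATE_K_PER_M = 6.5 / 1000.0  # 6.5 degrees C per km, as stated in the text


def reference_elevation(dem_pixels, arable_mask, min_arable_fraction=0.05):
    """Return a representative elevation (m) for one 0.1 degree grid cell.

    dem_pixels: elevations (m) of all DEM pixels within the grid cell.
    arable_mask: booleans, True where the DEM pixel is under arable land.
    """
    arable = [z for z, is_arable in zip(dem_pixels, arable_mask) if is_arable]
    if len(arable) / len(dem_pixels) > min_arable_fraction:
        # More than 5% arable land: median elevation of the arable pixels.
        return statistics.median(arable)
    # Otherwise: lower quartile of all DEM pixels in the grid cell.
    return statistics.quantiles(dem_pixels, n=4)[0]


def correct_temperature(temp_k, model_elevation_m, target_elevation_m):
    """Shift a temperature (K) from the model elevation to the target elevation."""
    return temp_k - LAPSE_RATE_K_PER_M * (target_elevation_m - model_elevation_m)


# Hypothetical grid cell: arable land lies in the lower part of the cell.
dem = [400, 450, 500, 800, 1200]
mask = [True, True, True, False, False]
target = reference_elevation(dem, mask)
corrected = correct_temperature(290.0, 750.0, target)
```

In this example the arable pixels lie below the assumed model elevation of 750 m, so the corrected temperature is higher than the grid-cell value.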
4. Data usage information
4.1. Practical usage considerations
The AgERA5 data set is well suited to study all aspects of productivity and externalities of agricultural production over the period 1979 to present. The available variables match the input needs of most crop growth models like CGMS-WOFOST, EPIC-BOKU, EPIC-IIASA, EPIC-TAMU, GEPIC, LPJ-GUESS, LPJmL, pAPSIM, pDSSAT, PEGASUS, PEPIC, PRYSBI2 etc.
As an example CIMMYT used AgERA5 data for the following objectives:
- Classification of environments. In breeding programs, environments (location-year-management combinations) are usually classified as Drought, Optimum or Random Drought, based essentially on water regime. However, in non-irrigated experiments, water availability depends on precipitation, which can differ from environment to environment even when all of them are called "Drought".
- New traits. Crop stages can now be described not only by duration in days, such as days to heading and days to maturity; instead, we can use the total radiation received in each period, degree days, and other traits that make use of daily data.
- Understanding environmental effect. Several new environmental variables can be obtained from daily data and their effects on response traits and genotypes effects can be studied. For example, we can use the maximum temperature at different crop stages, not only maximum temperature during the complete crop cycle.
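As an illustration of the degree-day trait mentioned above, the sketch below accumulates growing degree days from daily Temperature_Air_2m_Max_24h and Temperature_Air_2m_Min_24h values (in K). The simple averaging method, the base temperature of 10 °C and the sample values are assumptions chosen for illustration; they are not prescribed by AgERA5 or taken from the CIMMYT work.

```python
def growing_degree_days(tmax_k, tmin_k, base_c=10.0):
    """Accumulate daily (Tmean - base) where positive; inputs in K."""
    total = 0.0
    for tmax, tmin in zip(tmax_k, tmin_k):
        tmean_c = (tmax + tmin) / 2.0 - 273.15  # daily mean in degrees C
        total += max(0.0, tmean_c - base_c)     # days below the base add nothing
    return total


# Three hypothetical days; the third day is colder than the base temperature.
gdd = growing_degree_days([303.15, 298.15, 280.15], [293.15, 283.15, 276.15])
```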
JRC-MARS reckons AgERA5 as a candidate to replace their ERA-Interim data set in support of their crop yield forecasting and monitoring activities.
4.2. Known Limitations of product
See section 3.4.
5. Known issues
5.1. Rogue values in Tmin-24h layers
5.1.1. Background
Several users have reported erroneous temperature values in the Tmin-24h variable, where the value for selected grid cells reaches unrealistic values of around 220 K (roughly -50 °C) in locations with otherwise high temperatures. Analysis of the spatial distribution demonstrated that the cells with erroneous values can often be found in Western Australia, but are not limited to that region and can be found in other parts of the world as well (Figure 1).
Figure 1: Maps of AgERA5 Tmin-24h with rogue values (black cells) for several regions in the World
A further analysis of the occurrence of the rogue Tmin-24h values demonstrates that the problem occurs quite often. Figure 2 shows the number of files per year in which such rogue values occur in the Tmin-24h variable. Note that files with rogue Tmin-24h values cannot be found by looking at the temperature extremes, because the low Tmin-24h values are still within the valid range. For example, an erroneous value of 220 K in Western Australia in summer cannot be discriminated from a valid temperature value of 220 K that occurs in Eastern Siberia on the same day. Instead, figure 2 was generated by computing the first-order spatial differences and selecting on a threshold value.
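The detection approach based on first-order spatial differences can be sketched as below. This is an illustrative reconstruction, not the code used to produce figure 2: the function names, the threshold of 50 K and the small sample grids are assumptions.

```python
def max_spatial_difference(grid):
    """Largest absolute difference (K) between horizontally or vertically adjacent cells."""
    largest = 0.0
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # neighbour to the east
                largest = max(largest, abs(grid[r][c + 1] - grid[r][c]))
            if r + 1 < n_rows:  # neighbour to the south
                largest = max(largest, abs(grid[r + 1][c] - grid[r][c]))
    return largest


def has_rogue_values(grid, threshold_k=50.0):
    """Flag a Tmin-24h layer whose neighbouring cells differ by more than the threshold."""
    return max_spatial_difference(grid) > threshold_k


# Hypothetical Tmin-24h excerpts (K): one smooth field, one with a rogue cell.
clean = [[300.1, 300.4, 301.0], [299.8, 300.2, 300.9]]
rogue = [[300.1, 300.4, 301.0], [299.8, 219.2, 300.9]]
```

A global minimum check would not separate these two grids from a valid cold field, whereas the jump between neighbouring cells does.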
Surprisingly there are large differences between the different time-periods: the problem hardly occurs with the 1979-1999 time period, quite regularly in the period 2000-2020 and often since 2021. These time periods coincide with the batches in which the AgERA5 archive has been processed. The origin of the differences is not entirely clear but could be related to different encodings of the original ERA5 input data.
Figure 2: Number of Tmin-24h files per year where the problem of rogue values occurs
5.1.2. Problem analysis
To find the origin of the problem we need to look closely at the processing chain used for AgERA5 and at the structure of the ERA5 files used as input for AgERA5. First of all, a feature of the ERA5 input files is that each file does not contain the data for 00:00 to 24:00 UTC. For example, the ERA5 file containing air temperature data for 2024-01-01 contains data ranging from 2024-01-01T07:00:00 up to 2024-01-02T06:00:00. Therefore, the AgERA5 processing line first harmonizes all data files so they contain the time slices for the period 00:00 to 24:00 UTC. For example, to harmonize the data for 2024-01-02 the processing line takes the files for 2024-01-01 and 2024-01-02, opens them jointly with xarray and takes the slice out of the dataset covering 2024-01-02T00:00:00 <= time < 2024-01-03T00:00:00. Analysis of the AgERA5 processing line, looking specifically at what happens at those rogue Tmin-24h values, demonstrated that the problem is generated at the step where two ERA5 files are joined (see Figure 3).
Figure 3: Above: A timeseries of the original hourly ERA5 input data for variable MN2T (minimum temperature). Below: hourly ERA5 data clipped to the 0-24 UTC period. The grey area shows the process of taking 2 ERA5 files and combining them into one new file, during which the rogue Tmin-24h values are generated.
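The harmonization step can be illustrated with plain time stamps. The sketch below does not use xarray and is not the AgERA5 code; it only mimics the file layout described above (each input file running from 07:00 to 06:00 of the next day), and the function names and the "previous"/"current" labels are hypothetical.

```python
from datetime import datetime, timedelta


def era5_file_times(day):
    """Hourly time stamps held by the ERA5 input file for `day` (07:00 to 06:00 next day)."""
    start = datetime(day.year, day.month, day.day, 7)
    return [start + timedelta(hours=h) for h in range(24)]


def harmonized_day(day):
    """Join the files of the previous and current day and clip to 00:00 <= time < 24:00 UTC."""
    previous = day - timedelta(days=1)
    joined = [(t, "previous") for t in era5_file_times(previous)]
    joined += [(t, "current") for t in era5_file_times(day)]
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    return [(t, source) for t, source in joined if start <= t < end]


slices = harmonized_day(datetime(2024, 1, 2))
from_previous = sum(1 for _, source in slices if source == "previous")
```

For 2024-01-02 the first 7 hourly slices (00:00 to 06:00) come from the 2024-01-01 file and the remaining 17 from the 2024-01-02 file, so the join between the two files sits at 07:00 UTC.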
A second point that should be understood is that figure 3 shows the ERA5 data as floating point values in kelvin. However, that is not how the data is stored on disk. The raw data coming from the ERA5 processing chain is not stored as floating point values (32-bit single-precision floats) but as a C short datatype (a signed 16-bit integer), with an offset and a scaling factor associated with the variable as attributes. This can be seen when inspecting the data in Panoply. The AgERA5 Tmin-24h is derived from the ERA5 variable "mn2t", and Panoply shows its scale_factor and offset values (figure 4).
Figure 4: Encoding of the variable mn2t in ERA5 input files.
Tools like Panoply and xarray handle this transparently in the background: they recognize the offset and scale_factor and convert back and forth. Moreover, the scale_factor and offset are highly optimized values: each ERA5 file has its own scale_factor and offset in order to maximize the precision for the given data range.
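The packing scheme can be sketched as follows. The relation unpacked = packed * scale_factor + offset is the standard NetCDF/CF packing convention; the recipe used here to derive a scale_factor and offset from the data range of a file is a commonly used one and is an assumption, since the text does not state how the ERA5 files compute them. Function names and the temperature range are hypothetical.

```python
def compute_packing(data_min, data_max, n_bits=16):
    """Derive scale_factor and offset so [data_min, data_max] spans the signed integer range."""
    scale_factor = (data_max - data_min) / (2 ** n_bits - 1)
    add_offset = data_min + 2 ** (n_bits - 1) * scale_factor
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Convert a floating point value to its integer representation."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Convert a stored integer back to a floating point value."""
    return packed * scale_factor + add_offset


# Hypothetical file whose temperatures range from 250 K to 310 K.
scale, offset = compute_packing(250.0, 310.0)
packed = pack(290.0, scale, offset)
restored = unpack(packed, scale, offset)
```

Because the parameters are fitted to the range of one file, every value in that file fits in 16 bits with a precision of about one scale_factor; values from another file with a different range are not guaranteed to fit.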
The tricky part is when the newly sliced dataset has to be saved into a new NetCDF file which combines data from 2024-01-01 and 2024-01-02 (figure 3). Under the hood, xarray still knows that this data is represented by a C short with a scale_factor and offset. The question is which scale_factor and offset to apply: the one for 2024-01-01 or the one for 2024-01-02? Xarray applies the scale_factor and offset from the first file it opens, so 2024-01-01 in this case.
The location in time where things go wrong with the variable "mn2t" is marked with the diamond on the red curve in figure 5 below. It is the first slice of the second input file (red line), and xarray applies the scale_factor and offset of the first input file (green line) when saving the new NetCDF file.
Figure 5: As the top panel of figure 3, but with the input value that turns rogue marked with a black diamond.
The temperature value that turns rogue is the first data point on the red curve (figure 5), whose actual value is 318.871307 K. We convert it to a 16-bit integer by inverting the scale_factor and offset of the NetCDF file represented by the green curve (figure 4):
>>> import math
>>> math.trunc((318.871307 - 268.68162880225003) / 0.0015210688706274414)
32996
This is what happens with the last data point on the green curve, whose value is 318.309113 K:
>>> math.trunc((318.309113 - 268.68162880225003) / 0.0015210688706274414)
32626
But the maximum value that can be stored in a signed 16-bit integer is 32767. So the first data point on the red line (marked with the diamond) is too large to fit in the range of a 16-bit integer, because the scale_factor and offset are not representative for it. The last data point on the green line just fits, as it remains below 32767.
Thus, the rogue Tmin-24h values come from an integer overflow case. Unfortunately, integer overflow does not generate any errors as it just rolls over towards the negative side of the signed integer (starting from -32768) so it is hard to detect. Fixing it in terms of software is relatively easy: we just have to force xarray to write NetCDF files with single precision floats instead of 16-bit integers. This takes twice as much disk space but these are only temporary files so that won't matter. Fixing it in terms of data is more tricky: a complete reprocessing of AgERA5 will be required.
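The roll-over can be reproduced with the scale_factor and offset quoted above. The sketch assumes that the out-of-range integer wraps around in two's complement, as described in the text; the helper names are hypothetical and the code is not taken from the AgERA5 processing line.

```python
SCALE_FACTOR = 0.0015210688706274414  # from the first input file (see text)
ADD_OFFSET = 268.68162880225003       # from the first input file (see text)


def to_int16(value):
    """Wrap an integer into the signed 16-bit range, like an unchecked cast."""
    return (value + 32768) % 65536 - 32768


def pack_and_restore(temp_k):
    """Return (packed integer, stored 16-bit integer, value read back in K)."""
    packed = int((temp_k - ADD_OFFSET) / SCALE_FACTOR)  # truncation, as in the text
    stored = to_int16(packed)
    return packed, stored, stored * SCALE_FACTOR + ADD_OFFSET


# First data point of the second file: does not fit and wraps around.
packed_bad, stored_bad, restored_bad = pack_and_restore(318.871307)
# Last data point of the first file: fits and survives the round trip.
packed_ok, stored_ok, restored_ok = pack_and_restore(318.309113)
```

With these numbers 32996 wraps to -32540, which reads back as roughly 219.2 K, in line with the rogue values of around 220 K reported by users.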
Since 10 March 2024, the processing line has been updated in order to avoid this problem. However, fixing the issue in the full AgERA5 dataset will require a reprocessing of the archive. We are currently investigating what the consequences are and if a full reprocessing is achievable.
The consequences of the erroneous values for the fitness for purpose of the AgERA5 dataset are small. The cells with erroneous values are mostly located in deserts and other extremely warm areas which are usually not used for agriculture. Nevertheless such errors are undesirable and should preferably be fixed.
5.1.3. Impact on AgERA5 variables
All the input variables that are taken from ERA5 are stored as 16-bit C short datatypes, and therefore the problem of integer overflow (and underflow!) could happen for any ERA5 input variable that is used for generating AgERA5. Nevertheless, the impact differs between AgERA5 variables. Below is an expert assessment of the impact on the different variables.
Temperature
- Temperature variables that take the min or max of a time slice can be affected directly, depending on whether the rogue value is within the selection window (24h or day/night time).
- Assuming a maximum difference of 90 K for a rogue temperature value, the temperature variables that are based on the mean are affected by a maximum of ~4 K (90 K / 24 timesteps = 3.75 K) for 24h mean values, or 7.5 K (90 K / 12 timesteps = 7.5 K) for means over 12 timesteps.
Precipitation
- Precipitation is affected slightly because the sum of all 24h values is taken. However, an overflow will turn a precipitation value for a single 1h time slice into a near-zero precipitation. The impact of this will not be noticeable due to the variable and erratic nature of precipitation
Global radiation
- Global radiation is affected slightly. However, an overflow will turn a radiation value for a single 1h time slice into a near-zero radiation. The impact of this will not be noticeable due to the natural variability of radiation.
Windspeed:
- Windspeed is hardly affected because an overflow or underflow will generate a windspeed in the opposite direction but at similar magnitude. The windspeed in AgERA5 is computed as the square root of the sum of the squared windspeeds in u and v direction. Therefore an over- or underflow will not cause a large difference in daily mean windspeed.
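The reasoning for wind speed can be checked with a few lines. The sketch assumes, as the text does, that an overflow mainly flips the sign of one wind component while leaving its magnitude similar; the function name and sample values are hypothetical.

```python
import math


def wind_speed(u, v):
    """Wind speed (m s-1) from the u and v components."""
    return math.hypot(u, v)  # square root of u squared plus v squared


normal = wind_speed(3.0, 4.0)
flipped = wind_speed(-3.0, 4.0)  # u component with its sign reversed
```

Both calls return the same speed, because the sign of a component is lost when it is squared.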
Humidity and vapour pressure:
- Individual humidity values could be affected when they coincide with a particular slice that is affected by rogue temperature values. Given that humidity is constrained between 0 and 100 % the impact is limited.
- Vapour pressure is computed as the mean of 24 timesteps and therefore the impact is limited.
Snow thickness and LWE:
- Snow variables are calculated as the mean of 24 time slices and therefore the impact will be limited.
Precipitation type:
- Precipitation type is based on a count of the different time steps and is therefore affected only to a limited degree.
5.2. Striping in the AgERA5 relative humidity layers
5.2.1. Background
Users have reported stripes (discontinuities) in the relative humidity product layers which are part of AgERA5. Figure 1 shows the humidity at 18:00 local time for 2024-03-16; the discontinuities in the product are marked by the red arrows at the top of the image.
Figure 1: Relative humidity around 18:00 local time for 2024-03-16
In fact, these discontinuities correspond to the processing windows that the AgERA5 processing software uses to convert the hourly ERA5 input data into a daily product. For each window, the software uses a different 24-hour slice of the hourly data which best corresponds to the local time zone. This is illustrated in figure 2, which shows the solar elevation for three locations (Beijing, London and Los Angeles) with time in UTC on the x-axis. It demonstrates that the course of the solar elevation (the daylight cycle) shifts according to the location on Earth: in Beijing it moves to earlier UTC time (daylight starts around 21:00 UTC of 5 June). The daylight cycle for London is exactly centred at 12:00 UTC, while for Los Angeles it moves to later UTC time given its location to the west. The coloured bars at the top of figure 2 indicate the 24-hour slice that is selected from the hourly ERA5 input data. Within the AgERA5 processing chain there are 8 windows defined, for each of which a dedicated 24-hour slice is selected, and when we move from one window to the next the slice shifts by 3 hours.
However, this also means that at the edges of each window we may get effects due to the difference in the slice of ERA5 inputs. For most AgERA5 layers this effect is small because we are computing an average, sum, maximum or minimum over the 24 ERA5 layers, e.g. 24h maximum temperature or daily solar radiation sum and we hardly see it in the resulting product. However, for the relative humidity products that provide an estimate of relative humidity at a particular local time, edge effects may occur because the time for which the value is computed shifts by three hours.
Figure 2: Solar elevation for three locations on Earth as a function of UTC time. The horizontal bars indicate the slice of the 24 hourly ERA5 inputs that are selected for generating the AgERA5 product for 2017-06-06.
5.2.2. The problem
Relative humidity is not a conservative property because it varies with temperature. For a given vapour pressure, the relative humidity decreases with increasing temperature simply because warm air can take up more moisture and therefore humidity decreases in relative terms.
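The temperature dependence can be made concrete with a short calculation. The sketch uses a Magnus-type approximation for the saturation vapour pressure over water; this formula and its coefficients are an assumption for illustration and are not necessarily what the AgERA5 processing chain uses. The vapour pressure of 21.5 hPa is chosen close to the values discussed in this section.

```python
import math


def saturation_vapour_pressure_hpa(temp_k):
    """Saturation vapour pressure over water (hPa), Magnus-type approximation."""
    t_c = temp_k - 273.15
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))


def relative_humidity(vapour_pressure_hpa, temp_k):
    """Relative humidity (%) for a given vapour pressure and air temperature."""
    return 100.0 * vapour_pressure_hpa / saturation_vapour_pressure_hpa(temp_k)


# Same moisture content, two different air temperatures.
rh_cool = relative_humidity(21.5, 293.15)  # 20 degrees C
rh_warm = relative_humidity(21.5, 303.15)  # 30 degrees C
```

With an unchanged vapour pressure, the relative humidity drops from roughly 90% to roughly 50% when the air warms by 10 K, which is why a 3-hour shift in the selected temperature slice shows up so clearly in the relative humidity layers.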
For diagnosing problems in AgERA5 it is therefore better to look at the vapour pressure (figure 3). Here it is obvious that the discontinuities are gone. The vapour pressure is a smooth field indicating that the underlying data about atmospheric moisture is fine. The discontinuities are therefore part of the dataset and are not a problem with the processing chain.
Figure 3: Daily average vapour pressure at 2024-03-16.
5.2.3. Looking into the actual values
In figure 4 we zoom into an area in Southern Africa. The map shows the discontinuity that runs through the scene, marked again by the red arrow. We can now look at the values for vapour pressure and humidity for a point left (blue dot) and right (red dot) of the discontinuity. The vapour pressure values for both points are nearly identical: 21.54 and 21.45 hPa (table 1). The humidity values for the corresponding local times (6:00, 9:00, 12:00, 15:00 and 18:00 local time) do show different values (table 1).
Figure 4: Relative humidity at 18:00 local time for a region in Southern Africa.
However, after looking closely it can be observed that the values are actually shifted by three hours: the 06h estimate for the blue point matches with the 09h estimate of the red point, while the 09h estimate matches with the 12h, etc. Figure 5 confirms the shift. This shift is caused by the impact of selecting a different slice of hourly temperature values from ERA5 between the red and the blue point.
Table 1: Values for vapour pressure and humidity at local time for the selected points.
Blue dot | Red dot | |||||
latitude | 1.14 | latitude | 1.14 | |||
longitude | 22.4 | longitude | 22.5 | |||
Vapour pressure | 21.54 | hPa | Vapour pressure | 21.45 | hPa | |
06h | 88.05 | % | 06h | 89.1 | % | |
09h | 61.3 | % | 09h | 87.3 | % | |
12h | 43.8 | % | 12h | 60.3 | % | |
15h | 56.9 | % | 15h | 43.1 | % | |
18h | 69.7 | % | 18h | 54.7 | % |
Figure 5: Relative humidity values throughout the day for the two points on opposite sides of the discontinuity.
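The three-hour shift can also be verified numerically from the values in table 1. The sketch below compares the two points at the same nominal local time and with the red point shifted by one 3-hour step; the variable and function names are hypothetical.

```python
# Relative humidity (%) at 06h, 09h, 12h, 15h and 18h local time, from table 1.
blue = [88.05, 61.3, 43.8, 56.9, 69.7]
red = [89.1, 87.3, 60.3, 43.1, 54.7]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Same nominal local time on both sides of the discontinuity.
same_time = mean_abs_diff(blue, red)
# Blue point at 06h..15h against red point at 09h..18h (one step later).
shifted = mean_abs_diff(blue[:-1], red[1:])
```

The mean absolute difference falls from roughly 14 percentage points at the same nominal time to roughly 1 percentage point after the shift, which supports the interpretation that both points see the same diurnal cycle, sampled three hours apart.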
5.2.4. The solution
Within the current setup of the AgERA5 processing chain there is no way to remove the discontinuities in the relative humidity. Moreover, such discontinuities are occasionally visible in other AgERA5 products as well, although not as dramatically as in the relative humidity. The only solution is to create a more fine-grained temporal subsetting. Currently, the AgERA5 processing works in 8 windows with a 3-hourly shift. A more fine-grained subsetting could use 12 windows with a 2-hourly shift, or even 24 windows that shift by one hour. The latter would make optimal use of the hourly ERA5 inputs.
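The trade-off between the number of windows and the size of the shift follows from simple arithmetic. The sketch below assumes that the windows are equally wide longitude bands covering the globe, which the text does not state explicitly; the function name is hypothetical.

```python
def window_properties(n_windows):
    """Return (time shift in hours, longitude width in degrees) per window."""
    return 24 / n_windows, 360 / n_windows


for n in (8, 12, 24):
    shift_h, width_deg = window_properties(n)
    print(f"{n:2d} windows: shift {shift_h:g} h, width {width_deg:g} degrees")
```

Under this assumption the current setup corresponds to bands of 45 degrees of longitude, and the jump at each window edge shrinks in proportion to the number of windows.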
However, one should also consider whether these discontinuities in the humidity layers of AgERA5 are actually problematic for applications. These humidity fields were added to AgERA5 for running pest and disease models. Fungal diseases in particular depend on the leaf wetness duration, for which relative humidity is a key parameter. If those humidity values are shifted by three hours, that will not matter much for those models. There may be other applications that are more critical, but currently we are not aware of any.
We are currently exploring if a reprocessing of AgERA5 could make use of hourly or 2-hourly windows, or whether this will be a recommendation for a future AgERA6.