The ERA5 hourly and monthly data are made available with a 3 month delay. This means that after a month has passed, another month's worth of ERA5 data is written to the dataset.
ERA5T (near real time) preliminary data are used to fill the gap between the end of the ERA5 data and 5 days before the present date. The oldest month of these is overwritten each month as new ERA5 data become available.
So as an example, say we have a current date of 15th February 2020:
- ERA5 data currently cover 1/1/1979 - 30/11/2019 (instantaneous variables) and 1/1/1979 - 1/12/2019 00-06 UTC (accumulated variables)
- ERA5T data (with a 5 day delay) cover 1/12/2019 - 10/2/2020 (instantaneous variables) and 1/12/2019 07-23 UTC - 10/2/2020 (accumulated variables)
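The date arithmetic in this example can be sketched in Python. This is only an illustration for instantaneous variables: the function name `era5_coverage` is hypothetical, and it assumes ERA5 always ends on the last day of the month three calendar months before the current one, as in the example above. The actual release day within a month is not modelled.

```python
from datetime import date, timedelta

def era5_coverage(today):
    """Return (last_era5_day, first_era5t_day, last_era5t_day) for a given date.

    Assumption: ERA5 ends with the month three calendar months before the
    current one, and ERA5T runs up to 5 days before the current date.
    """
    # Count months from year 0 so that stepping back across a year boundary is easy.
    last_era5_month = today.year * 12 + (today.month - 1) - 3
    # The month after the last ERA5 month is the first ERA5T month.
    year, month0 = divmod(last_era5_month + 1, 12)
    first_era5t = date(year, month0 + 1, 1)
    last_era5 = first_era5t - timedelta(days=1)
    last_era5t = today - timedelta(days=5)
    return last_era5, first_era5t, last_era5t

last_era5, first_era5t, last_era5t = era5_coverage(date(2020, 2, 15))
print(last_era5, first_era5t, last_era5t)
```

For 15 February 2020 this gives 30/11/2019, 1/12/2019 and 10/2/2020, which matches the dates listed above.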
For requests which return a mixture of ERA5 and ERA5T data (such as for data from the 1st of the month), instantaneous variables (e.g. temperature) come from ERA5T (which has an 'experiment version', expver, of 5), while accumulated variables (fluxes, precipitation) come from both datasets with the following structure:
- 00-06 UTC on the 1st day of the month from ERA5 (expver 1)
- 07-23 UTC on the 1st day of the month (and the following dates up to 5 days before the present) from ERA5T (expver 5)
When these data are converted to netCDF a new dimension is created called expver containing 1 and 5. Moreover, a single time coordinate is used which covers the entire requested period.
dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    expver = 2 ;
    time = 24 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int expver(expver) ;
        expver:long_name = "expver" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short tp(time, expver, latitude, longitude) ;
        tp:scale_factor = 9.06276558810304e-07 ;
        tp:add_offset = 0.0296950577259784 ;
        tp:_FillValue = -32767s ;
        tp:missing_value = -32767s ;
        tp:units = "m" ;
        tp:long_name = "Total precipitation" ;
data:
    expver = 5, 1 ;
    ...
}
Both expver dimensions use the full time extent of time coordinate but the expver 1 data only covers the first 7 timesteps, the remaining timesteps are 'padded' with empty fields.
For the expver 5 data, the first 7 timesteps are padded with empty fields, with the remaining timesteps coming from the ERA5T data.
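The padded layout described above can be mimicked with a small NumPy array. This is a sketch on synthetic values for a single grid point, not real ERA5 output. The array names and the ordering of the two expver slots are chosen for illustration only.

```python
import numpy as np

n_time = 24   # hourly steps for one day, as in the header above
n_era5 = 7    # 00-06 UTC come from ERA5 (expver 1)

# Synthetic values standing in for total precipitation at one grid point.
values = np.arange(n_time, dtype=float)

# Axis 0 is time, axis 1 is expver; both slots span the full time axis.
expver = np.array([1, 5])
tp = np.full((n_time, expver.size), np.nan)
tp[:n_era5, 0] = values[:n_era5]   # expver 1: first 7 steps, the rest padded
tp[n_era5:, 1] = values[n_era5:]   # expver 5: first 7 steps padded

# Count how many expver slots hold data at each timestep.
valid_per_step = np.sum(~np.isnan(tp), axis=1)

# Prefer expver 1 where it has data, otherwise fall back to expver 5.
merged = np.where(np.isnan(tp[:, 0]), tp[:, 1], tp[:, 0])
```

In this layout exactly one slot is filled per timestep, so `merged` recovers a single gap-free series.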
When the latest ERA5 data are released, they overwrite the ERA5T data for the entire month and, for accumulated variables, for 00-06 UTC on the 1st of the next month. This process is repeated each month.
Note that, for the time being, if you download only ERA5 or only ERA5T data, the 'expver' dimension mentioned above will not appear. This makes it difficult to tell the difference between ERA5 and ERA5T.
It seems that if one requests hourly total precipitation ERA5 data for 1 January 2020, the file contains both expver versions (1 and 5) and the file size is doubled (around 99 MB). For other days, the expver dimension does not appear.
Thank you for reporting this, Julia. We are looking into a long term solution now. Unfortunately it will take some time.
As pointed out above, only mixed ERA5/ERA5T data have 'expver'. For accumulated variables, the file has the structure described above: 00-06 UTC on the 1st day of the month from ERA5 (expver 1), and 07-23 UTC on the 1st day of the month onwards from ERA5T (expver 5).
So for your case, data for 00-06 UTC of 1 January 2020 are ERA5 while the rest of the data are ERA5T. Data for 2 January 2020 are only ERA5T, so 'expver' does not appear. Moreover, please note that "Both expver dimensions use the full time extent of time coordinate but the expver 1 data only covers the first 7 timesteps, the remaining timesteps are 'padded' with empty fields." This means that the empty fields contain NaN values.
Hello Michela and Xiaobo
I can see why you want to keep two expver values, but I think it's making things more complicated than needed for users. Moreover, the introduction of the two experiments is breaking code, with consequent loss of time spent first identifying the issue and then finding a (not-so-straightforward) solution. My suggestion would be to get rid of the two expver values and just communicate when ERA5T data are changed as they become ERA5, as is in any case already indicated in Release of ERA5T.
Could this solution – i.e. merging expver 1 with 5, so no expver dimension/parameter appears in the retrieval – be implemented please? I think it'd be much cleaner if this was done at your end.
Thank you very much
FYI Alberto, we are thinking about having 'expver' as a dimension for all ERA5 and ERA5T data.
I have the same issue as Julia Wagemann when downloading SurfaceSolarRadiation for all available 2020 timesteps: the first six hours of 01/01/2020 are expver = 1, but the rest of the 2020 timesteps are expver = 5.
Yes, this is because Surface Solar Radiation is an accumulated parameter and January is a month with mixed ERA5 and ERA5T data. For these reasons, the file has the structure described above, with 00-06 UTC on the 1st day of the month from ERA5 (expver 1) and the remaining hours from ERA5T (expver 5).
So also in your case, data for 00-06 UTC of 1 January 2020 are ERA5 while the rest of the data are ERA5T. Data for 2 January 2020 are only ERA5T, so 'expver' does not appear. Moreover, please pay attention to the empty fields, which contain NaN values. This happens because "Both expver dimensions use the full time extent of time coordinate but the expver 1 data only covers the first 7 timesteps, the remaining timesteps are 'padded' with empty fields. For the expver 5 data, the first 7 timesteps are padded with empty fields, with the remaining timesteps coming from the ERA5T data."
Semi-related to this topic... Is there any documentation for why the most recent data for ERA5T instantaneous variables are available only from 0 - 21Z for the most recent available day, and the accumulated variables are available through 06Z the following day?
I understand this is the best we can do for the time being.
Our technical team commented: "the accumulated fields are forecast fields from forecast starting at 18h , while the instantaneous fields are analysis fields from the 9-21h assimilation window."
I see, thank you for the quick response. And are the accumulated and instantaneous data released through CDS at 18Z and 21Z, respectively, or is there some specific lag (computational) time for each?
Luiz Angelo Steffenel
Has anyone found a simple way to get rid of the expver dimension, or at least to filter out ERA5 and ERA5T data in MARS scripts? As Alberto said, this is breaking a lot of code. In my case, I download Ozone Total Column on a "monthly" basis to automatically generate maps with NCL. However, the presence of expver adds a dimension that NCL can't understand, and I can't get rid of it (I could not find an easy way: I can remove the expver variable but the dimension remains).
Hi, is there a solution for removing the expver dimension? I have the same situation with the November 2020 data and it is messing up all my code, which works perfectly on data from 2003 to date. I would really appreciate the help if anyone knows a way to achieve this. Thanks
At the moment I think the easiest way is to retrieve the ERA5 and ERA5T data in separate requests, by careful selection of the dates. In this way you would get 2 netCDF files without the 'expver' dimension, which you can then merge if required.
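As a rough illustration of splitting one retrieval into two, the sketch below builds two request dictionaries for the first day of a mixed month, one for 00-06 UTC and one for 07-23 UTC. The helper name `split_by_expver` and every value in `base_request` are placeholders chosen for illustration, and the 06/07 UTC boundary applies to accumulated variables only, as described at the top of this page. No retrieval is performed here; each dictionary would be passed to its own download call.

```python
import copy

# Hypothetical base request; the values are placeholders for illustration.
base_request = {
    "product_type": "reanalysis",
    "variable": "total_precipitation",
    "year": "2020",
    "month": "01",
    "day": "01",
    "format": "netcdf",
}

def split_by_expver(request, last_era5_hour=6):
    """Return (era5_request, era5t_request) for the first day of a mixed month."""
    hours = [f"{h:02d}:00" for h in range(24)]
    era5 = copy.deepcopy(request)
    era5["time"] = hours[: last_era5_hour + 1]    # 00:00-06:00, expected expver 1
    era5t = copy.deepcopy(request)
    era5t["time"] = hours[last_era5_hour + 1 :]   # 07:00-23:00, expected expver 5
    return era5, era5t

era5_req, era5t_req = split_by_expver(base_request)
```

Deep copies are used so that the two requests can be modified independently of the base dictionary.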
If you are looking for a Python workaround, you can use the xarray method reduce(np.nansum, 'expver'). This collapses the dimension by summing the two expver arrays, which complement each other exactly (one is NaN wherever the other has a value). I know that it isn't politically correct, but with one line you keep tons of code from breaking.
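A minimal sketch of this workaround on synthetic data is shown below. The array is a stand-in for one grid point of a mixed file, not a real download. One caveat to keep in mind is that np.nansum returns 0 where every input is NaN, so a point that is missing in both expver slots would come out as 0 rather than NaN.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a mixed ERA5/ERA5T file: 24 hourly steps, 2 expver slots.
data = np.full((24, 2), np.nan)
data[:7, 0] = 1.0   # expver 1 (ERA5) fills the first 7 steps
data[7:, 1] = 2.0   # expver 5 (ERA5T) fills the rest
tp = xr.DataArray(
    data,
    dims=("time", "expver"),
    coords={"time": np.arange(24), "expver": [1, 5]},
    name="tp",
)

# Collapse expver by summing while ignoring NaN, as suggested above.
collapsed = tp.reduce(np.nansum, dim="expver")

# Caveat: an all-NaN input sums to 0, not NaN.
all_missing = np.nansum([np.nan, np.nan])
```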
Thank you all for the helpful responses. I appreciate the suggestion, marco venturini. That is a solution I can work with.
cdo --reduce_dim -copy in.nc out.nc
worked well for me and removed the expver dimension.
This will do the trick:
import xarray as xr
ERA5 = xr.open_mfdataset('era5.tp.20200801.nc',combine='by_coords')
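The snippet above only opens the file. One way it might continue, assuming the goal is to keep expver 1 wherever it has data and fill the gaps from expver 5, is xarray's combine_first. The sketch below uses a small synthetic dataset in place of the poster's file, so the variable names and values are illustrative only.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with the same layout as a mixed ERA5/ERA5T file.
data = np.full((24, 2), np.nan)
data[:7, 0] = 1.0   # expver 1 (ERA5)
data[7:, 1] = 2.0   # expver 5 (ERA5T)
ds = xr.Dataset(
    {"tp": (("time", "expver"), data)},
    coords={"time": np.arange(24), "expver": [1, 5]},
)

# Select each expver slice, dropping the scalar expver coordinate,
# then take ERA5 where available and fill the gaps with ERA5T.
era5 = ds.sel(expver=1, drop=True)
era5t = ds.sel(expver=5, drop=True)
combined = era5.combine_first(era5t)
```

Unlike a NaN-ignoring sum, this leaves a value as NaN when it is missing in both slices.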
Thank you for sharing, Jian Tang. It worked smoothly and seamlessly. You just saved me from days of sleepless nights.
Just downloaded ERA5 data (ssrd) for March 2022 and the "expver" dimension seems to be gone. Is this going to be gone for good? The problem is that when the number of dimensions varies over time, it is difficult to find a coherent way to process the data.
March 2022 has no "expver" because there are only ERA5T data.
How come? I thought there was only a 5-day lag, or am I misunderstanding something here?
I'm looking at a 10m u component of wind .nc I just downloaded for 2022. It has the expver dimension all right. But the values aren't all NaN. The expver 5 values start as NaN. Then at 2022-02-13 02:00 the values all become -1.7. They stay that way until they transition to sensible values at 2022-05-01 00:00. Then the expver 1 values become -1.7.
I guess my question is how can I tell when to use the value from expver 1 and when to use expver 5? Detecting NaN seems to not be sufficient.
Hi John, can you share the request you used to retrieve the data, please?
Hi, try this. The transition from expver 1 to 5 in the resulting netcdf file was at time index 3625. I used Panoply to view it.
'month': ['1', '2', '3', '4', '5', '6', '7'],
'day': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31'],
Thanks for reporting this; I think this may be an issue with the GRIB to netCDF converter due to the size of the request. In this case, if requesting netCDF, it may be better to request just 1 month at a time (although months containing a mix of ERA5/ERA5T will need careful handling, as only these will have the 'expver' dimension).
Hope that helps,
expver is used to tell the difference between the initial release (expver=5, called ERA5T) and validated ERA5 data (expver=1). See the link below for details.
ERA5: data documentation#Dataupdatefrequency
In most cases, ERA5 is identical to ERA5T. Therefore, if you spot any unusual behaviour, please let us know.