Please be aware that the generation of this dataset is using an alternative source for the ERA5-land data and may be subject to changes over time (e.g. file format, data file structure, deprecation etc). This dataset should therefore be regarded as “experimental” and is not recommended for use in a production environment.

Notification of changes via the catalogue entry warning banner and/or in the Forum will be provided on best efforts.

Introduction

ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. It covers the period from January 1950 to the present with hourly sampling and continues to be extended forward with 5 days behind real time.

This dataset presented here is a subset of some parameters of the full CDS ERA5-Land single levels hourly dataset on 0.1 degrees resolution and is stored in Analysis Ready Cloud Optimised (ARCO) format, which has been implemented for retrieving long time-series for a single point in an efficient way.

Methodology

The process to generate this dataset involves:

Fetching ERA5-Land data in GRIB format.
Apply homogenization conventions, as applicable:
- Rename spatial coordinates to latitude, longitude.
- Ensure latitude, longitude ranges are between [-90, +90], [-180, +180].
- If the number of hourly samples is incomplete (less than 24) the date is skipped (e.g. 1950-01-01).
- Time is expressed in one single dimension called time, which in ERA5-Land netCDFs is usually called valid_time.
- Drop "number", "step", "surface", "valid_time" and "depthBelowLandLayer".
- The accumulation variables (e.g. Total precipitation) are de-accumulated to hourly resolution. Please see the Jupyter notebook below.
Written to an ARCO Zarr archive (which is similar in structure to netCDF).
The CDS requests the data from the Zarr archive using xarray and writes the data to netCDF or CSV (as requested).

Jupyter notebook demonstrating the procedure to de-accumulate the data

De-accumulation is the process of substracting to each hourly time sample its previous sample so that the final value expresses only the magnitude for that hour, and not for that hour plus the previous ones. The process takes into account the accumulation period of 24 hours, so that the first sample of every period is left as it is.

ARCO ERA5-Land de-accumulation

In this notebook, we show the procedure to de-accumulate the data of an accumulated ERA5-Land variable, then show how to accumulate it back and later prove the procedure by reconstructing the data and comparing it against the original one.

from copy import deepcopy

import xarray as xr
import numpy as np
import cdsapi

Definitions

def flatten_dataset(ds, var, time_selection=[]):
    """Rearange the data so we flatten time samples.
    E.g. from
    {'time': 2, 'step': 24, 'latitude': 1801, 'longitude': 3600}
    to
    {'time': 48, 'latitude': 1801, 'longitude': 3600}
    """
    total_samples = ds.valid_time.shape[0]*ds.valid_time.shape[1]
    
    reshaped_var = ds[var].values.reshape((total_samples, ds[var].shape[2], ds[var].shape[3]))

    ds_new = xr.Dataset(
        data_vars={var: (["time", "latitude", "longitude"], reshaped_var)},
        coords={
            "time": ("time", ds.valid_time.values.reshape((total_samples))),
            "latitude": ("latitude", ds.latitude.values),
            "longitude": ("longitude", ds.longitude.values),
        },
    )

    ds_new.attrs = ds.attrs
    ds_new[var].attrs = ds[var].attrs
    ds_new.longitude.attrs = ds.longitude.attrs
    ds_new.latitude.attrs = ds.latitude.attrs
    ds_new.time.attrs = ds.valid_time.attrs
    if time_selection is not None:
        ds_new = ds_new.isel(time=slice(*time_selection))

    # Remove additional variables, same as in ARCO hourly ERA5 on single levels
    for drop_var in ["number", "surface"]:
        if drop_var in ds_new:
            ds_new = ds_new.drop_vars([drop_var])

    return ds_new

def apply_lon_lat_conventions(ds):
    # Renames
    if "lon" in ds.dims:
        ds = ds.rename({"lon": "longitude"})
    if "lat" in ds.dims:
        ds = ds.rename({"lat": "latitude"})

    # Flip latitudes (ensure they are monotonic increasing)
    if "latitude" in ds.dims:
        lats = ds["latitude"]
        if len(lats) > 1 and lats[0] > lats[-1]:
            ds = ds.reindex(latitude=ds.latitude[::-1])

    # Convert longitude to [-180, 180[
    if "longitude" in ds.dims and ds["longitude"].max() > 180:
        lons = ds["longitude"]
        lons_attrs = lons.attrs
        new_lons = np.concatenate([lons[lons >= 180], lons[lons < 180]])
        ds = ds.reindex(longitude=new_lons)
        ds = ds.assign_coords(longitude=(((ds["longitude"] + 180) % 360) - 180))
        ds["longitude"].attrs = lons_attrs

    return ds

Data

variable = "tp"
# CDS API

token = "" # Your CDS API Token

dataset = "reanalysis-era5-land"
request = {
    "variable": ["total_precipitation"],
    "year": "1970",
    "month": "01",
    "day": ["01"],
    "time": [
        "00:00", "01:00", "02:00",
        "03:00", "04:00", "05:00",
        "06:00", "07:00", "08:00",
        "09:00", "10:00", "11:00",
        "12:00", "13:00", "14:00",
        "15:00", "16:00", "17:00",
        "18:00", "19:00", "20:00",
        "21:00", "22:00", "23:00"
    ],
    "data_format": "grib",
    "download_format": "unarchived"
}

client = cdsapi.Client(url="https://cds.climate.copernicus.eu/api", key=token)
client.retrieve(dataset, request).download("cdsapi_era5-land-date-1.grib")

request["day"] = ["02"]
client.retrieve(dataset, request).download("cdsapi_era5-land-date0.grib")

ds0 = xr.open_dataset("cdsapi_era5-land-date0.grib", filter_by_keys={'stepType': 'accum'})
ds_1 = xr.open_dataset("cdsapi_era5-land-date-1.grib", filter_by_keys={'stepType': 'accum'})

ds0 = apply_lon_lat_conventions(ds0)
ds_1 = apply_lon_lat_conventions(ds_1)

De-accumulation

# Deaccumulate
# We want to finally obtain a Dataset that contains 24 samples, them being:
# [
#    00:00    <date-1, step24> - <date-1, step23>,
#    01:00    <date0, step1>,
#    02:00    <date0, step2> - <date0, step1>,
#    03:00    <date0, step3> - <date0, step2>,
# ...
#    23:00    <date0, step23> - <date0, step22>
# ]
def deaccumulate(ds_1, ds0, variable):
    """ De-accumulate the variable of one Dataset
    
    Args:
        ds_1: Date -1 (e.g. 1970-01-01) dataset.
        ds0: Date 0 (e.g. 1970-01-02) dataset.
        variable: The dataset variable
    
    Returns:
        A de-accumulated Dataset.
    """
    ds_1_flat = flatten_dataset(ds_1, variable, None)
    ds0_flat = flatten_dataset(ds0, variable, None)
    ds_deaccumulated = deepcopy(ds0_flat)
    
    # Array indexes
    # ds-1, date-1, step23
    i_ds_1_d_1_s23 = 46
    # ds0, date-1, step24
    i_ds0_d_1_s24 = 23
    # ds0, date0, step1
    i_ds0_d0_s1 = 24
    
    # 00:00    <date-1, step24> - <date-1, step23>
    ds_deaccumulated[variable][i_ds0_d_1_s24] -= ds_1_flat[variable][i_ds_1_d_1_s23]
    
    # Nothing to do for 01:00    <date0, step1>,
    
    # From
    #    02:00    <date0, step2> - <date0, step1>,
    # to
    #    23:00    <date0, step23> - <date0, step22>
    for i, t in enumerate(range(ds_deaccumulated.time.shape[0])):
        if i > i_ds0_d0_s1:
            ds_deaccumulated[variable][t] -= ds0_flat[variable][t - 1]
    
    # Slice to the times of interest (24 samples)
    ds_deaccumulated = ds_deaccumulated.isel(time=slice(i_ds0_d_1_s24, i_ds_1_d_1_s23 + 1))
    
    return ds_deaccumulated

ds_deaccumulated = deaccumulate(ds_1, ds0, variable)

Accumulation

# We revert de de-accumulation to obtain the original values

def accumulate(ds0_deaccumulated, variable, ds_1_accumulated=None, ds_1_deaccumulated=None):
    """Accumulate a variable over each time sample.
    
    Args:
        ds0_deaccumulated: Date 0 (e.g. 1970-01-02) de-accumulated
        variable: Dataset variable
        ds_1_accumulated: Date -1 (e.g. 1970-01-01) accumulated.  Expected time and steps dimensions.
                          Needed either this or ds_1_deaccumulated
        ds_1_deaccumulated: Date -1 (e.g. 1970-01-01) de-accumulated. Expected time dimension. 
                            Needed either this or ds_1_accumulated
        
    Returns:
        Date 0 xarray.Dataset
    """
    ds0_accumulated = deepcopy(ds0_deaccumulated)
    
    if ds_1_accumulated:
        ds0_accumulated[variable][0] += ds_1_accumulated[variable].values[1][22]
    elif ds_1_deaccumulated:
        for i in range(1, 24):
            ds0_accumulated[variable][0] += ds_1_deaccumulated[variable].values[i]

    for step in range(2, 24):
        tmp = deepcopy(ds0_deaccumulated[variable].values[1])
        for acc_step in range(2, step+1):
            tmp += ds0_deaccumulated[variable].values[acc_step]
        ds0_accumulated[variable][step] = tmp
    
    return ds0_accumulated

ds_accumulated = accumulate(ds_deaccumulated, variable, ds_1_accumulated=ds_1)

Check correctness

#    00:00 
np.allclose(ds_accumulated.tp.values[0], ds0.tp.values[0][23], equal_nan=True)

# From
#    01:00
# to
#    23:00
for step in range(1, 24):
    print("Time", f"{step}:00")
    print(np.allclose(ds_accumulated.tp[step], ds0.tp.values[1][step-1], equal_nan=True))
    print()

Data organization and access

The ARCO data is stored as Zarr datacubes.

The data is now available from the Climate Data Store (CDS), either interactively through its download web form or programmatically using the CDS API service:

ERA5-Land hourly time-series data on single levels from 1950 to present.

Spatial grid

The atmospheric parameters have a grid resolution of 0.1 degrees.

Please be aware when you select a "Location" on the form, the latitude and longitude values are rounded to the closest neighbourhood point of the 0.1 degrees grid.

Temporal frequency

Temporal frequency of the data is hourly and every day has 24 hourly steps.

Data format

In the CDS, there are the options of retrieving the data in NetCDF format or CSV format.

Parameters are stored in groups to ensure a proper access performance. The number of NetCDF files downloaded will be equal to total number of groups of all the parameters selected. Users can check the groups in the Parameters listings section below.

The file naming convention is:

reanalysis-era5-land-timeseries-sfc-groupnamerandomcharacters.[nc|csv]

Data update frequency

ERA5-Land is updated on a daily basis. New daily data in ERA5-Land becomes available in ERA5-Land hourly time-series in a few hours.

The availability of the data is of 5 days behind the actual date, this means that on the 24/04 users can download data for that day - 5 days = 19/04.

Variations in delivery times may occur due to the non-operational nature of this CDS service, as issues may arise which cause delays.

Parameters listings

Table 1 below lists the surface and single level parameters (levtype=sfc).

Table 1: surface and single level parameters (instantaneous and accumulations)

Count	Name	Units	Variable name	Notes	Group
1	2 metre temperature	K	2m_temperature		2m temperature
2	2 metre dewpoint temperature	K	2m_dewpoint_temperature		2m temperature
3	Total precipitation	m	total_precipitation	De-accumulated	Pressure and precipitation
4	Surface pressure	Pa	surface_pressure		Pressure and precipitation
5	10 metre V wind component	m s**-1	10m_v_component_of_wind		Wind
6	10 metre U wind component	m s**-1	10m_u_component_of_wind		Wind
7	Surface solar radiation downwards	J m**-2	surface_solar_radiation_downwards	De-accumulated	Radiation and heat
8	Volumetric soil water layer 1	m3 m-3	volumetric_soil_water_layer_1		Soil water
9	Volumetric soil water layer 2	m3 m-3	volumetric_soil_water_layer_2		Soil water
10	Volumetric soil water layer 3	m3 m-3	volumetric_soil_water_layer_3		Soil water
11	Volumetric soil water layer 4	m3 m-3	volumetric_soil_water_layer_4		Soil water
12	Soil temperature level 1	K	soil_temperature_level_1		Soil temperature
13	Soil temperature level 2	K	soil_temperature_level_2		Soil temperature
14	Soil temperature level 3	K	soil_temperature_level_3		Soil temperature
15	Soil temperature level 4	K	soil_temperature_level_4		Soil temperature
16	Snow cover	%	snow_cover		Snow

Known issues

Please refer to the ERA5-Land documentation.

References

Further ERA5 references are available from the ECMWF website.

_{This document has been produced in the context of the Copernicus Climate Change Service (C3S).}

_{The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.}

_{The users thereof use the information at their sole risk and liability. For the avoidance of all doubt , the European Commission and the European Centre for Medium - Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.}