Important information on CEMS Data Access via CDS

ECMWF has implemented a new state-of-the-art data access infrastructure to host the Climate and Atmospheric Data Stores (CDS and ADS, respectively). All layers of the infrastructure are being modernised: the front-end web interface, the back-end software engine, and the underlying cloud infrastructure hosting the service and core data repositories.

As part of this development, a new data store for the Copernicus Emergency Management Service (CEMS) has been created. The CEMS Early Warning Data Store (EWDS) will host all historical and forecast information for floods and forest fires at the European and global levels. Users are encouraged to migrate their accounts and scripts to the new EWDS Beta before 26 September 2024, when the system becomes operational.

For more information, please read: CEMS Early Warning Data Store (EWDS) coming soon!


Warning

Updates to the EWDS documentation are ongoing while the implementation takes place. The scripts and examples on this page are under review and may not yet be fully functional on the EWDS, so please be patient.

Please send your feedback and/or report any issue or bug encountered while using the new CDS/ADS/EWDS infrastructure by raising an enquiry ticket through our dedicated Support Portal (ECMWF login required). Make sure you select "Yes" to the question "Is this request related to CDS-beta/ADS-beta/CEMS-EW-beta?" on the Support Portal request form; this helps the Support team triage enquiries more efficiently.


There are many situations in which a user is interested in only a subset of a dataset's spatial domain.

For example, when comparing modelled river flow against observations, it is preferable to extract the time series at the station coordinates rather than handle many gigabytes of data. Similarly, when focusing on a specific catchment, you will likely want to work with only that part of the spatial domain.

In summary, two data-reduction operations are very common with CEMS-Flood datasets: area cropping and time-series extraction.

There are two ways to perform those operations:

  • Remotely - Using the CDS API to perform the operation on the CDS compute nodes and retrieve only the reduced data.
  • Locally - Using the CDS API to retrieve the full dataset and perform the operation on your own machine.

This section provides scripts for both cases and for both CEMS-Flood products, GloFAS and EFAS.

Set up a Python environment

If you have not done so already, create a Python (conda) virtual environment.

Activate the conda environment and install the additional Python package rioxarray (https://corteva.github.io/rioxarray):

conda install rioxarray
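
A quick way to confirm that the environment is ready is to import the packages used by the scripts on this page and print their versions. This is a minimal check; it assumes that cdsapi, cfgrib, pandas and xarray are also installed in the active environment, since the retrieval and local-processing scripts below rely on them.

# Minimal environment check: import the packages used on this page and report their versions
from importlib.metadata import version

import cdsapi
import cfgrib
import pandas
import rioxarray
import xarray

for pkg in ("cdsapi", "cfgrib", "pandas", "rioxarray", "xarray"):
    print(f"{pkg}: {version(pkg)}")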


Prepare and retrieve data (for local processing)

For the following exercises on extracting time series on the local machine, we are going to use the latitude and longitude coordinates from a small subset of the GRDC (Global Runoff Data Centre) dataset.

Copy the content of the code block below into an empty file named "grdc.csv" (the file name used by the scripts on this page); the file should reside in your working folder.

GRDC dataset
grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude
6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98
6118015,6,18,YVEL,LOYAT (PONT D129),FR,47.993815,-2.368849,315.0,88.26
6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0
6118025,6,18,"CANUT, RUISSEAU DE",SAINT-JUST (LA RIVIERE COLOMBEL),FR,47.775309,-1.979609,37.0,80.22
6118030,6,18,AFF,PAIMPONT (PONT DU SECRET),FR,47.981631,-2.143762,30.2,119.01
6118050,6,18,COET-ORGAN,QUISTINIC (KERDEC),FR,47.904164,-3.201265,47.7,94.42
6118060,6,18,EVEL,GUENIN,FR,47.899928,-2.975167,316.0,95.16
6118070,6,18,STER-GOZ,BANNALEC (PONT MEYA),FR,47.906833,-3.752172,69.7,85.08
6118080,6,18,MOROS,CONCARNEAU (PONT D22),FR,47.882934,-3.875375,20
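
Optionally, load the file with pandas to confirm that it was saved correctly and to preview the station coordinates used by the scripts below. This is a minimal sketch; it assumes the file was saved as "grdc.csv" in the working folder.

import pandas as pd

# Load the GRDC station list and show the point coordinates used for extraction
df = pd.read_csv("grdc.csv")
print(df[["grdc_no", "station", "lat", "long"]])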

Then, retrieve the following datasets into the same working folder.

Get the GloFAS Historical data
import cdsapi

dataset = "cems-glofas-historical"
request = {
    "system_version": ["version_4_0"],
    "hydrological_model": ["lisflood"],
    "product_type": ["intermediate"],
    "variable": ["river_discharge_in_the_last_24_hours"],
    "hyear": ["2023"],
    "hmonth": ["06"],
    "hday": [
        "01","02","03",
        "04","05","06",
        "07","08","09",
        "10","11","12",
        "13","14","15",
        "16","17","18",
        "19","20","21",
        "22","23","24",
        "25","26","27",
        "28","29","30"
    ],
    "data_format": "grib2",
    "download_format": "unarchived"
}

client = cdsapi.Client()
client.retrieve(dataset, request).download('glofas_historical.grib')


Get the EFAS reforecast data
import cdsapi

dataset = "efas-reforecast"
request = {
    "system_version": ["version_5_0"],
    "product_type": ["ensemble_perturbed_forecasts"],
    "variable": ["river_discharge_in_the_last_6_hours"],
    "model_levels": "surface_level",
    "hyear": ["2007"],
    "hmonth": ["03"],
    "hday": [
        "27","30"
    ],
    "leadtime_hour": [
        "6","12","18","24"
    ],
    "data_format": "grib",
    "download_format": "unarchived"
}

client = cdsapi.Client()
client.retrieve(dataset, request).download('efas_reforecast.grib')



EFAS

Removal of subsetting for EFAS

An issue has been identified with the EFAS sub-region extraction tool, whereby it serves data that is not correctly located on the river network. The sub-region extraction tool has therefore been removed from the EFAS CDS entries, and any area specified in cdsapi requests will return the entire domain.

Data previously downloaded using this tool should be disregarded.

For more information, please see EFAS-Known Issues.

Coordinates precision

When transforming from lat/lon (source coordinates) to projected LAEA (target coordinates), bear in mind that the number of decimal places of the source coordinates affects the precision of the target coordinates:

An interval of 0.001 degrees corresponds to about 100 metres in LAEA.

An interval of 0.00001 degrees corresponds to about 1 metre in LAEA.

(Projected LAEA coordinates are not needed for EFAS v5.0.)
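
The effect can be checked with pyproj. This is a minimal sketch; it assumes EPSG:3035 (the ETRS89-LAEA Europe projection) as the target coordinate reference system and an arbitrary test point in central Europe.

from pyproj import Transformer

# Transform from geographic lat/lon (EPSG:4326) to the European LAEA projection (EPSG:3035)
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3035", always_xy=True)

lon, lat = 8.5, 47.4  # arbitrary test point
for delta in (0.001, 0.00001):
    x0, y0 = transformer.transform(lon, lat)
    x1, y1 = transformer.transform(lon, lat + delta)
    print(f"{delta} degrees of latitude ~ {abs(y1 - y0):.2f} m in LAEA northing")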

Remote processing

Time series extraction:

Extract timeseries: EFAS medium-range historical

 

Area cropping:

Area cropping: EFAS Seasonal Forecast
## === retrieve EFAS Seasonal Forecast ===

## === subset Switzerland region ===

import cdsapi

if __name__ == '__main__':
    c = cdsapi.Client()

    DATASET = "efas-seasonal"

    # List of years from 2020 to 2021 (inclusive)
    YEARS = ['%d' % (y) for y in range(2020, 2022)]

    # List of months (formatted as two digits)
    MONTHS = ['%02d' % (m) for m in range(1, 13)]

    # Lead times in hours (24-hour steps, up to 2952 hours)
    LEADTIMES = ['%d' % (l) for l in range(24, 2976, 24)]

    # Iterate over years and months
    for year in YEARS:
        for month in MONTHS:
            print(f'Year={year}')
            print(f'Month={month}')
            # Modify the request based on the year
            REQUEST = {
                "variable": ["river_discharge_in_the_last_24_hours"],
                "model_levels": "surface_level",
                "year": year,
                # Only use December for 2020, all months for other years
                "month": "12" if year == "2020" else month,
                "leadtime_hour": LEADTIMES,
                "data_format": "grib",
                "download_format": "unarchived",  # Use "zip" if unarchived data is intended
                # Switzerland bounding box: [north, west, south, east]
				"area": [47.9, 5.8, 45.7, 10.6]
            }
            # Download the data and save as a .grib file
            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.grib')
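
Because EFAS requests currently return the entire domain regardless of the area keyword (see the note above), it is worth checking the extent of the data you actually received. The sketch below assumes one of the files written by the loop above (for example efas-seasonal_2021_01.grib) and the EFAS v5.0 latitude/longitude grid; cfgrib.open_datasets splits a heterogeneous GRIB file into a list of xarray datasets.

import cfgrib

# Open one of the downloaded files and print its dimensions and spatial extent
datasets = cfgrib.open_datasets("efas-seasonal_2021_01.grib")
for ds in datasets:
    print(dict(ds.sizes))
    print("lat:", float(ds.latitude.min()), "to", float(ds.latitude.max()))
    print("lon:", float(ds.longitude.min()), "to", float(ds.longitude.max()))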

Local processing

Time series extraction:


Extract timeseries: EFAS medium-range Reforecast
import xarray as xr
import pandas as pd

parameter = "dis06"

# The prepared file contains ensemble perturbed reforecast members (GRIB dataType 'pf');
# use 'cf' here instead if you retrieved the control run.
ds = xr.open_dataset(
    "efas_reforecast.grib",
    engine="cfgrib",
    filter_by_keys={"dataType": "pf"},
)
df = pd.read_csv("grdc.csv", sep=",")

total = len(df)
count = 0
for lon1, lat1, station_id in zip(df["long"], df["lat"], df["grdc_no"]):
    # select the grid cell nearest to the station coordinates
    extracted = ds.sel(longitude=lon1, latitude=lat1, method="nearest")[parameter]
    df_temp = extracted.drop_vars(["surface"]).to_dataframe().reset_index()
    df_temp["grdc"] = str(station_id)
    df_temp = df_temp.drop("step", axis=1)
    df_temp = df_temp.set_index(["grdc", "time"])
    count += 1
    # write one CSV per station
    df_temp.to_csv(f"station_{station_id}.csv")
    print(f"progress: {count / total * 100:.0f} %")

Area cropping:

Area cropping: EFAS medium-range Reforecast
import xarray as xr
import rioxarray as rio

# Rhine's basin bounding box coordinates in WGS84 (no reprojection needed)
coords = [5.450796, 11.871059, 46.296530, 50.972204]  # W, E, S, N

# read the EFAS reforecast dataset (ensemble perturbed members, GRIB dataType 'pf';
# use 'cf' instead if you retrieved the control run)
ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib", filter_by_keys={"dataType": "pf"})
# add the reference system to the dataset
ds = ds.rio.write_crs("EPSG:4326")

# Function to convert 4 coordinates into a Polygon (W, E, S, N)
def bbox_from_wesn(coords):
    w, e, s, n = coords
    bbox = [{
        'type': 'Polygon',
        'coordinates': [[
            (w, n), (w, s), (e, s), (e, n), (w, n)
        ]]
    }]
    return bbox

# Create the bounding box polygon
bbox = bbox_from_wesn(coords)
# Clip the dataset using the bounding box
ds_clipped = ds.rio.clip(bbox)

# Save the clipped dataset as a new NetCDF file
ds_clipped.to_netcdf("efas_reforecast_rhine.nc")
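
As a quick sanity check, you can reopen the clipped file and, for example, collapse the ensemble dimension into an ensemble mean. This sketch assumes the perturbed members were read as above, so the clipped data keeps a number (ensemble member) dimension.

import xarray as xr

# Reopen the clipped Rhine subset written above
ds = xr.open_dataset("efas_reforecast_rhine.nc")
print(dict(ds.sizes))

# Ensemble mean of the 6-hourly discharge over the clipped domain
ens_mean = ds["dis06"].mean(dim="number")
print(ens_mean)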

GloFAS

Remote processing

Time series extraction:

Extract timeseries: GloFAS Medium-range Reforecast
import cdsapi
from datetime import datetime, timedelta

def get_monthsdays():
    start, end = datetime(2024, 1, 1), datetime(2024,1, 31) # reference year 2024
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    monthday = [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]
    return monthday

if __name__ == '__main__':
    c = cdsapi.Client()
    
    # station coordinates (lat,lon)
    COORDS = {
    "Thames":[51.35,-0.45]
    }
    adjustment = 0.05
    bbox = [
        COORDS["Thames"][0] + adjustment,  # North (Latitude + adjustment)
        COORDS["Thames"][1] - adjustment,  # West (Longitude - adjustment)
        COORDS["Thames"][0] - adjustment,  # South (Latitude - adjustment)
        COORDS["Thames"][1] + adjustment   # East (Longitude + adjustment)
    ]
    DATASET='cems-glofas-reforecast'
    # reforecast initialisation dates (Mondays and Thursdays)
    MONTHSDAYS = get_monthsdays()
    
    YEAR = '2022' 
    
    LEADTIMES = ['%d'%(l) for l in range(24,1104,24)]
    
    # loop over the reforecast dates
    for md in MONTHSDAYS:

        month = md[0]
        day = md[1]

        # loop over station coordinates
        for station in COORDS:
            REQUEST = {
                'system_version': ["version_4_0"],
                'variable': 'river_discharge_in_the_last_24_hours',
                'hydrological_model': 'lisflood',
                'product_type': 'control_reforecast',
                'area': bbox,
                'hyear': YEAR,
                'hmonth': month,
                'hday': day,
                'leadtime_hour': LEADTIMES,
                'data_format': "grib2",
                'download_format': "zip"
            }
            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.zip')
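
The archives downloaded above contain a small box around the station, so the point time series is still obtained by selecting the nearest grid cell locally. The sketch below assumes the first archive produced by the loop (1 January 2024 is a Monday, so cems-glofas-reforecast_01_01.zip should exist) and that it contains a single GRIB file.

import glob
import zipfile

import xarray as xr

# Unpack one of the downloaded archives
with zipfile.ZipFile("cems-glofas-reforecast_01_01.zip") as zf:
    zf.extractall("thames_reforecast")

# Open the extracted GRIB and select the grid cell nearest to the Thames station
grib_file = glob.glob("thames_reforecast/*.grib*")[0]
ds = xr.open_dataset(grib_file, engine="cfgrib")
point = ds["dis24"].sel(latitude=51.35, longitude=-0.45, method="nearest")
print(point)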

Area cropping:

Area cropping: GloFAS Medium-range reforecast
## === retrieve GloFAS Medium-Range Reforecast ===

## === subset India, Pakistan, Nepal and Bangladesh region === 


import cdsapi
from datetime import datetime, timedelta


def get_monthsdays():
    start, end = datetime(2024, 1, 1), datetime(2024,1, 31) # reference year 2024
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    monthday = [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]
    return monthday

MONTHSDAYS = get_monthsdays()

if __name__ == '__main__':
    c = cdsapi.Client()

    DATASET='cems-glofas-reforecast'
    # user inputs
    BBOX = [40.05,59.95,4.95,95.05] # North West South East
    YEARS  = ['%d'%(y) for y in range(2004,2023)]
    LEADTIMES = ['%d'%(l) for l in range(24,1128,24)]
    
    # submit request
    for md in MONTHSDAYS:

        month = md[0]
        day = md[1]
        REQUEST = {
            'system_version': ["version_4_0"],
            'variable': 'river_discharge_in_the_last_24_hours',
            'hydrological_model': 'lisflood',
            'product_type': 'control_reforecast',
            'area': BBOX,
            'hyear': YEARS,
            'hmonth': month,
            'hday': day,
            'leadtime_hour': LEADTIMES,
            'data_format': "grib2",
            'download_format': "zip"
        }
        c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.zip')
Area cropping: GloFAS Seasonal Forecast
## === retrieve GloFAS Seasonal Forecast ===

## === subset South America/Amazon region === 

import cdsapi


if __name__ == '__main__':
    c = cdsapi.Client()
    DATASET='cems-glofas-seasonal'
    YEARS  = ['%d'%(y) for y in range(2020,2022)]
    MONTHS = ['%02d'%(m) for m in range(1,13)]

    LEADTIMES = ['%d'%(l) for l in range(24,2976,24)]
    
    for year in YEARS:
        for month in MONTHS:
            
            REQUEST = {
                'variable': 'river_discharge_in_the_last_24_hours',
                'year': year,
                'month': '12' if year == '2020' else month,
                'leadtime_hour': LEADTIMES,
                'area': [10.95, -90.95, -30.95, -29.95],
                'data_format': "grib2",
                'download_format': "zip"
            }
            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.zip')
Area cropping: GloFAS Seasonal Reforecast
## === retrieve GloFAS Seasonal Reforecast ===

## === subset South America/Amazon region === 

import cdsapi

if __name__ == '__main__':


    c = cdsapi.Client()

    YEARS  = ['%d'%(y) for y in range(1981,2021)]

    MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']

    LEADTIMES = ['%d'%(l) for l in range(24,2976,24)]
    
    for year in YEARS:
        for month in MONTHS:

            c.retrieve(
                'cems-glofas-seasonal-reforecast',
                {
                    'system_version': 'version_2_2',
                    'variable':'river_discharge_in_the_last_24_hours',
                    'format':'grib',
                    'hydrological_model':'htessel_lisflood',
                    'hyear': year,
                    'hmonth': month,
                    'leadtime_hour': LEADTIMES,
                    'area': [ 10.95, -90.95, -30.95, -29.95 ]
                }, 
                f'glofas_seasonal_reforecast_{year}_{month}.grib')

Local processing

Time series extraction:

Extract timeseries: GloFAS historical
import xarray as xr
import pandas as pd

parameter = "dis24"

ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib",backend_kwargs={'time_dims':['time']})
df = pd.read_csv("grdc.csv")

total = len(df)

rows = []
count = 0
for lon, lat, id in zip(df.long, df.lat, df.grdc_no):
    extracted = ds.sel(longitude=lon, latitude=lat, method="nearest")[parameter]
    df_temp = extracted.drop_vars(["surface"]).to_dataframe().reset_index()
    df_temp["grdc"] = str(id)
    df_temp = df_temp.set_index(["grdc", "time"])
    rows.append(df_temp)
    count += 1
    print(f"progress: {count/total*100} %")

# combine all stations into a single table indexed by (grdc, time)
out = pd.concat(rows)
out.to_csv("extracted.csv")
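
The combined CSV can then be read back for further analysis, for example to summarise the discharge of each GRDC station. This sketch assumes the extracted.csv file produced by the script above.

import pandas as pd

# Read the combined time series back in and summarise per station
df = pd.read_csv("extracted.csv", parse_dates=["time"])
print(df.groupby("grdc")["dis24"].describe())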

Area cropping:

Area cropping: GloFAS historical
import xarray as xr
 
# Rhine's basin bounding box
bbox = [50.972204, 46.296530, 5.450796, 11.871059]  # N,S,W,E
 
ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib")
 
# GloFAS latitudes are ordered from north to south, so the latitude slice runs N -> S
ds_cropped = ds.sel(
    longitude=slice(bbox[2], bbox[3]), latitude=slice(bbox[0], bbox[1])
)
 
ds_cropped.to_netcdf("glofas_historical_cropped.nc")
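
As a simple follow-up, the cropped file can be reduced further, for example to an unweighted spatial-mean discharge series over the Rhine box. This is only a sketch; an area-weighted (or upstream-area-aware) aggregation would be more rigorous for hydrological analysis.

import xarray as xr

# Reopen the cropped file written above
ds = xr.open_dataset("glofas_historical_cropped.nc")

# Unweighted spatial mean of the daily discharge over the cropped domain
mean_discharge = ds["dis24"].mean(dim=["latitude", "longitude"])
print(mean_discharge.to_series())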