ECMWF has implemented a new state-of-the-art data access infrastructure to host the Climate and Atmosphere Data Stores (CDS and ADS, respectively). All layers of the infrastructure are being modernised: the front-end web interface, the back-end software engine, and the underlying cloud infrastructure hosting the service and the core data repositories. As part of this development, a new data store for the Copernicus Emergency Management Service (CEMS) has been created. The CEMS Early Warning Data Store (EWDS) will host all the historical and forecast information for floods and forest fires at European and global levels. Users are encouraged to migrate their accounts and scripts to the new EWDS Beta before 26 September 2024, when the system will become operational. For more information, please read: CEMS Early Warning Data Store (EWDS) coming soon!
Updates to the EWDS documentation are ongoing as the implementation takes place. The scripts and examples on this page are under review and may not yet be fully functional for the EWDS, so please be patient. Please send feedback and report any issue or bug you encounter while using the new CDS/ADS/EWDS infrastructure by raising an enquiry ticket through our dedicated Support Portal (ECMWF login required). Make sure you select "Yes" to the question "Is this request related to CDS-beta/ADS-beta/CEMS-EW-beta?" on the Support Portal request form; this will help the Support team triage enquiries more efficiently.
There are many situations in which a user is interested in only a subset of a dataset's spatial domain.
For example, when comparing modelled river flow against observations, it is far more practical to extract the time series at the observation point coordinates than to deal with many gigabytes of data. Similarly, when focusing on a specific catchment, you will likely want to work with only that part of the spatial domain.
In summary, two data-reduction operations are very common on CEMS-Flood datasets: area cropping and time-series extraction.
There are two ways to perform these operations: on the server side, by passing an area selection in the CDS API request, or locally, on files that have already been downloaded.
This section provides scripts for both approaches and for both CEMS-Flood products, GloFAS and EFAS.
If you have not done so already, create a Python virtual environment.
Activate the conda environment and install the additional Python package rioxarray (https://corteva.github.io/rioxarray):
```shell
conda install -c conda-forge rioxarray
```
For the following exercises on extracting time series on the local machine, we are going to use latitude and longitude coordinates from a small subset of the GRDC (Global Runoff Data Centre) dataset.
Copy the content of the code block into an empty file named "grdc.csv"; the file should reside in your working folder (the extraction scripts below read it by this name).
```
grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude
6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98
6118015,6,18,YVEL,LOYAT (PONT D129),FR,47.993815,-2.368849,315.0,88.26
6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0
6118025,6,18,"CANUT, RUISSEAU DE",SAINT-JUST (LA RIVIERE COLOMBEL),FR,47.775309,-1.979609,37.0,80.22
6118030,6,18,AFF,PAIMPONT (PONT DU SECRET),FR,47.981631,-2.143762,30.2,119.01
6118050,6,18,COET-ORGAN,QUISTINIC (KERDEC),FR,47.904164,-3.201265,47.7,94.42
6118060,6,18,EVEL,GUENIN,FR,47.899928,-2.975167,316.0,95.16
6118070,6,18,STER-GOZ,BANNALEC (PONT MEYA),FR,47.906833,-3.752172,69.7,85.08
6118080,6,18,MOROS,CONCARNEAU (PONT D22),FR,47.882934,-3.875375,20
```
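As a quick sanity check, the file can be read back with pandas, which handles the quoted station names containing embedded commas. A minimal sketch, with two of the rows inlined so it runs standalone (in practice you would call `pd.read_csv("grdc.csv")`):

```python
import io

import pandas as pd

# two rows from grdc.csv inlined for illustration
csv_text = """grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude
6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98
6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df[["grdc_no", "river", "lat", "long"]])
```

Note that the quoted field "ARON, RUISSEAU D'" stays in a single column despite the comma it contains.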
Then, retrieve the following datasets into the same working folder.
```python
import cdsapi

dataset = "cems-glofas-historical"
request = {
    "system_version": ["version_4_0"],
    "hydrological_model": ["lisflood"],
    "product_type": ["intermediate"],
    "variable": ["river_discharge_in_the_last_24_hours"],
    "hyear": ["2023"],
    "hmonth": ["06"],
    "hday": [
        "01", "02", "03", "04", "05", "06",
        "07", "08", "09", "10", "11", "12",
        "13", "14", "15", "16", "17", "18",
        "19", "20", "21", "22", "23", "24",
        "25", "26", "27", "28", "29", "30"
    ],
    "data_format": "grib2",
    "download_format": "unarchived"
}

client = cdsapi.Client()
client.retrieve(dataset, request).download('glofas_historical.grib')
```
```python
import cdsapi

dataset = "efas-reforecast"
request = {
    "system_version": ["version_5_0"],
    "product_type": ["ensemble_perturbed_forecasts"],
    "variable": ["river_discharge_in_the_last_6_hours"],
    "model_levels": "surface_level",
    "hyear": ["2007"],
    "hmonth": ["03"],
    "hday": ["27", "30"],
    "leadtime_hour": ["6", "12", "18", "24"],
    "data_format": "grib",
    "download_format": "unarchived"
}

client = cdsapi.Client()
client.retrieve(dataset, request).download('efas_reforecast.grib')
```
An issue has been identified with the EFAS sub-region extraction tool, whereby it serves data that is not correctly located on the river network. The sub-region extraction tool has therefore been removed from the EFAS CDS entries, and any area specified in cdsapi requests will return the entire domain. Data previously downloaded using this tool should be disregarded. For more information please see EFAS-Known Issues.
When transforming from lat/lon (source coordinates) to the projected LAEA grid (target coordinates), keep in mind that the number of decimal places of the source coordinates affects the precision of the target coordinates: an interval of 0.001 degrees corresponds to about 100 metres in LAEA, and an interval of 0.00001 degrees corresponds to about 1 metre. (Projected LAEA coordinates are not needed for EFAS v5.0.)
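The rule of thumb above can be verified with a back-of-the-envelope conversion along a meridian. This is a sketch using a spherical-Earth approximation; true distances on the LAEA grid differ slightly:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def degrees_to_metres(interval_deg: float) -> float:
    """Approximate ground distance spanned by a latitude interval, in metres."""
    return math.radians(interval_deg) * EARTH_RADIUS_M

print(degrees_to_metres(0.001))    # roughly 111 m, i.e. on the order of 100 m
print(degrees_to_metres(0.00001))  # roughly 1.1 m, i.e. on the order of 1 m
```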
```python
## === retrieve EFAS Seasonal Forecast ===
## === subset Switzerland region ===
import cdsapi

if __name__ == '__main__':
    c = cdsapi.Client()
    DATASET = "efas-seasonal"

    # List of years from 2020 to 2021 (inclusive)
    YEARS = ['%d' % y for y in range(2020, 2022)]
    # List of months (formatted as two digits)
    MONTHS = ['%02d' % m for m in range(1, 13)]
    # Lead times in hours (steps of 24, up to 2952)
    LEADTIMES = ['%d' % l for l in range(24, 2976, 24)]

    # Iterate over years and months
    for year in YEARS:
        for month in MONTHS:
            print(f'Year={year}')
            print(f'Month={month}')

            # Modify the request based on the year
            REQUEST = {
                "variable": ["river_discharge_in_the_last_24_hours"],
                "model_levels": "surface_level",
                "year": year,
                # Only use December for 2020, all months for other years
                "month": "12" if year == "2020" else month,
                "leadtime_hour": LEADTIMES,
                "data_format": "grib",
                "download_format": "unarchived",  # use "zip" to download a zipped archive instead
                # Switzerland bounding box: [north, west, south, east]
                "area": [47.9, 5.8, 45.7, 10.6]
            }

            # Download the data and save as a .grib file
            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.grib')
```
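Note that Python's range excludes its end point, so the LEADTIMES list stops at 2952 hours even though 2976 appears in the call. The request lists can be checked in isolation:

```python
# the same list comprehensions as in the retrieval script above
YEARS = ['%d' % y for y in range(2020, 2022)]
MONTHS = ['%02d' % m for m in range(1, 13)]
LEADTIMES = ['%d' % l for l in range(24, 2976, 24)]

print(YEARS)          # ['2020', '2021']
print(MONTHS[:3])     # ['01', '02', '03']
print(LEADTIMES[-1])  # '2952' (range excludes the 2976 end point)
```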
```python
import xarray as xr
import pandas as pd

parameter = "dis06"

ds = xr.open_dataset("efas-seasonal.grib", engine="cfgrib",
                     filter_by_keys={'dataType': 'cf'})
df = pd.read_csv("grdc.csv", sep=",")

total = len(df)
rows = []
count = 0
for lon, lat, station_id in zip(df['long'], df['lat'], df['grdc_no']):
    # nearest-neighbour selection of the grid cell closest to the station
    extracted = ds.sel(longitude=lon, latitude=lat, method="nearest")[parameter]
    df_temp = extracted.drop_vars(["surface", "number"]).to_dataframe().reset_index()
    df_temp["grdc"] = str(station_id)
    df_temp = df_temp.drop("step", axis=1)
    df_temp = df_temp.set_index(["grdc", "time"])
    rows.append(df_temp)
    count += 1
    df_temp.to_csv(f"station_{station_id}.csv")
    print(f"progress: {count / total * 100:.1f} %")
```
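The key step in this script is xarray's `method="nearest"` selection. The same idea reduces to a nearest-index search over a 1-D coordinate array, sketched here in plain Python with a hypothetical latitude axis:

```python
def nearest_index(coords, value):
    """Return the index of the coordinate closest to value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))

# hypothetical latitude axis and one GRDC station latitude
latitudes = [48.0, 47.9, 47.8, 47.7]
idx = nearest_index(latitudes, 47.824831)
print(idx, latitudes[idx])  # 2 47.8
```

xarray performs this lookup independently on each dimension passed to `sel`, so the selected grid cell is the one whose latitude and longitude are each closest to the station coordinates.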
```python
import xarray as xr
import rioxarray as rio  # registers the .rio accessor on xarray objects

# Rhine basin bounding box coordinates in WGS84 (no reprojection needed)
coords = [5.450796, 11.871059, 46.296530, 50.972204]  # W, E, S, N

# read the EFAS reforecast dataset
ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib",
                     filter_by_keys={'dataType': 'cf'})

# add the reference system to the dataset
ds = ds.rio.write_crs("EPSG:4326")

# Function to convert 4 coordinates (W, E, S, N) into a Polygon
def bbox_from_wesn(coords):
    w, e, s, n = coords
    bbox = [{
        'type': 'Polygon',
        'coordinates': [[
            (w, n), (w, s), (e, s), (e, n), (w, n)
        ]]
    }]
    return bbox

# Create the bounding box polygon
bbox = bbox_from_wesn(coords)

# Clip the dataset using the bounding box
ds_clipped = ds.rio.clip(bbox)

# Save the clipped dataset as a new NetCDF file
ds_clipped.to_netcdf("efas_reforecast_rhine.nc")
```
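The polygon builder can be checked in isolation: `rio.clip` accepts GeoJSON-like geometry dicts, and a valid polygon ring must be closed (first vertex equal to the last):

```python
def bbox_from_wesn(coords):
    """Convert (W, E, S, N) bounds into a GeoJSON-like Polygon geometry."""
    w, e, s, n = coords
    return [{
        'type': 'Polygon',
        'coordinates': [[(w, n), (w, s), (e, s), (e, n), (w, n)]]
    }]

ring = bbox_from_wesn([5.450796, 11.871059, 46.296530, 50.972204])[0]['coordinates'][0]
print(len(ring))            # 5 vertices
print(ring[0] == ring[-1])  # True: the ring is closed
```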
```python
import cdsapi
from datetime import datetime, timedelta

def get_monthsdays():
    # Mondays and Thursdays of a reference year (2024)
    start, end = datetime(2024, 1, 1), datetime(2024, 1, 31)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]

if __name__ == '__main__':
    c = cdsapi.Client()

    # station coordinates (lat, lon)
    COORDS = {
        "Thames": [51.35, -0.45]
    }

    # half-width of the bounding box drawn around each station, in degrees
    adjustment = 0.05

    DATASET = 'cems-glofas-reforecast'

    # select the date index corresponding to the event
    MONTHSDAYS = get_monthsdays()
    YEAR = '2022'
    LEADTIMES = ['%d' % l for l in range(24, 1104, 24)]

    # loop over the date index
    for md in MONTHSDAYS:
        month, day = md

        # loop over the station coordinates
        for station, (lat, lon) in COORDS.items():
            # small bounding box around the station: [North, West, South, East]
            bbox = [lat + adjustment, lon - adjustment,
                    lat - adjustment, lon + adjustment]

            REQUEST = {
                'system_version': ["version_4_0"],
                'variable': 'river_discharge_in_the_last_24_hours',
                'hydrological_model': 'lisflood',
                'product_type': 'control_reforecast',
                'area': bbox,
                'hyear': YEAR,
                'hmonth': month,
                'hday': day,
                'leadtime_hour': LEADTIMES,
                'data_format': "grib2",
                'download_format': "zip"
            }

            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.zip')
```
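The bounding-box construction around a station can be factored into a small helper. `point_to_bbox` is a hypothetical name that mirrors the adjustment logic used in the script above:

```python
def point_to_bbox(lat, lon, pad=0.05):
    """Return [North, West, South, East] bounds padded around a (lat, lon) point."""
    return [lat + pad, lon - pad, lat - pad, lon + pad]

# Thames station from the script above
bbox = point_to_bbox(51.35, -0.45)
print(bbox)  # approximately [51.4, -0.5, 51.3, -0.4]
```

The padding keeps the request small while guaranteeing the station's grid cell falls inside the returned area.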
```python
## === retrieve GloFAS Medium-Range Reforecast ===
## === subset India, Pakistan, Nepal and Bangladesh region ===
import cdsapi
from datetime import datetime, timedelta

def get_monthsdays():
    # Mondays and Thursdays of a reference year (2024)
    start, end = datetime(2024, 1, 1), datetime(2024, 1, 31)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]

MONTHSDAYS = get_monthsdays()

if __name__ == '__main__':
    c = cdsapi.Client()

    DATASET = 'cems-glofas-reforecast'

    # user inputs
    BBOX = [40.05, 59.95, 4.95, 95.05]  # North, West, South, East
    YEARS = ['%d' % y for y in range(2004, 2023)]
    LEADTIMES = ['%d' % l for l in range(24, 1128, 24)]

    # submit one request per reforecast run date
    for md in MONTHSDAYS:
        month, day = md

        REQUEST = {
            'system_version': ["version_4_0"],
            'variable': 'river_discharge_in_the_last_24_hours',
            'hydrological_model': 'lisflood',
            'product_type': 'control_reforecast',
            'area': BBOX,
            'hyear': YEARS,
            'hmonth': month,
            'hday': day,
            'leadtime_hour': LEADTIMES,
            'data_format': "grib2",
            'download_format': "zip"
        }

        c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.zip')
```
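The `get_monthsdays` helper keeps only Mondays and Thursdays (`weekday()` values 0 and 3), the days on which the GloFAS reforecast runs are initialised. For January 2024 it yields nine dates, which can be checked in isolation:

```python
from datetime import datetime, timedelta

def get_monthsdays():
    # same helper as in the scripts above: Mondays and Thursdays of January 2024
    start, end = datetime(2024, 1, 1), datetime(2024, 1, 31)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]

md = get_monthsdays()
print(len(md))  # 9 dates in January 2024
print(md[0])    # ['01', '01'] -- 1 January 2024 was a Monday
```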
```python
## === retrieve GloFAS Seasonal Forecast ===
## === subset South America/Amazon region ===
import cdsapi

if __name__ == '__main__':
    c = cdsapi.Client()

    DATASET = 'cems-glofas-seasonal'
    YEARS = ['%d' % y for y in range(2020, 2022)]
    MONTHS = ['%02d' % m for m in range(1, 13)]
    LEADTIMES = ['%d' % l for l in range(24, 2976, 24)]

    for year in YEARS:
        for month in MONTHS:
            REQUEST = {
                'variable': 'river_discharge_in_the_last_24_hours',
                'year': year,
                # Only use December for 2020, all months for other years
                'month': '12' if year == '2020' else month,
                'leadtime_hour': LEADTIMES,
                # Amazon bounding box: [North, West, South, East]
                'area': [10.95, -90.95, -30.95, -29.95],
                'data_format': "grib2",
                'download_format': "zip"
            }

            c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.zip')
```
```python
## === retrieve GloFAS Seasonal Reforecast ===
## === subset South America/Amazon region ===
import cdsapi

if __name__ == '__main__':
    c = cdsapi.Client()

    YEARS = ['%d' % y for y in range(1981, 2021)]
    MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
              'july', 'august', 'september', 'october', 'november', 'december']
    LEADTIMES = ['%d' % l for l in range(24, 2976, 24)]

    for year in YEARS:
        for month in MONTHS:
            c.retrieve(
                'cems-glofas-seasonal-reforecast',
                {
                    'system_version': 'version_2_2',
                    'variable': 'river_discharge_in_the_last_24_hours',
                    'format': 'grib',
                    'hydrological_model': 'htessel_lisflood',
                    'hyear': year,
                    'hmonth': month,
                    'leadtime_hour': LEADTIMES,
                    # Amazon bounding box: [North, West, South, East]
                    'area': [10.95, -90.95, -30.95, -29.95]
                },
                f'glofas_seasonal_reforecast_{year}_{month}.grib')
```
```python
import xarray as xr
import pandas as pd

parameter = "dis24"

ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib",
                     backend_kwargs={'time_dims': ['time']})
df = pd.read_csv("grdc.csv")

total = len(df)
rows = []
count = 0
for lon, lat, station_id in zip(df.long, df.lat, df.grdc_no):
    # nearest-neighbour selection of the grid cell closest to the station
    extracted = ds.sel(longitude=lon, latitude=lat, method="nearest")[parameter]
    df_temp = extracted.drop_vars(["surface"]).to_dataframe().reset_index()
    df_temp["grdc"] = str(station_id)
    df_temp = df_temp.set_index(["grdc", "time"])
    rows.append(df_temp)
    count += 1
    print(f"progress: {count / total * 100:.1f} %")

out = pd.concat(rows)
out.to_csv("extracted.csv")
```
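The concatenation step stacks the per-station tables into one long table indexed by (grdc, time). A minimal sketch with synthetic discharge values and a hypothetical `station_frame` helper:

```python
import pandas as pd

def station_frame(station_id, values):
    """Build a per-station table shaped like df_temp above (synthetic data)."""
    times = pd.to_datetime(["2023-06-01", "2023-06-02"])
    idx = pd.MultiIndex.from_product([[station_id], times], names=["grdc", "time"])
    return pd.DataFrame({"dis24": values}, index=idx)

rows = [station_frame("6118010", [12.5, 13.1]),
        station_frame("6118015", [8.2, 8.0])]
out = pd.concat(rows)
print(out)
```

Because every table shares the same (grdc, time) MultiIndex, `pd.concat` simply appends the rows, giving one row per station per time step in the output CSV.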
```python
import xarray as xr

# Rhine basin bounding box
bbox = [50.972204, 46.296530, 5.450796, 11.871059]  # N, S, W, E

ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib")

# GloFAS latitudes are stored in descending order, so slice from North to South
ds_cropped = ds.sel(
    longitude=slice(bbox[2], bbox[3]),
    latitude=slice(bbox[0], bbox[1])
)

ds_cropped.to_netcdf("glofas_historical_cropped.nc")
```
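One caveat: passing `slice(north, south)` to `sel` only works because the GloFAS latitude axis is stored in descending order; label-based slicing follows the stored order, so with an ascending axis the same slice would select nothing. A sketch of the underlying logic on a coarse, hypothetical latitude axis:

```python
import numpy as np

# a coarse descending latitude axis, mimicking how latitudes are stored in the GRIB
latitudes = np.arange(51.0, 45.0, -1.0)  # [51. 50. 49. 48. 47. 46.]
north, south = 50.972204, 46.296530

# slice(north, south) keeps the values between the two bounds, in stored order
mask = (latitudes <= north) & (latitudes >= south)
print(latitudes[mask])  # [50. 49. 48. 47.]
```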