Message Box | ||||
---|---|---|---|---|
| ||||
ECMWF has implemented a new state-of-the-art data access infrastructure to host the Climate and Atmospheric Data Stores (CDS and ADS, respectively). All layers of the infrastructure are being modernised: the front-end web interface, the back-end software engine, and the underlying cloud infrastructure hosting the service and core data repositories. As part of this development, a new data store for the Copernicus Emergency Management Service (CEMS) has been created. The CEMS Early Warning Data Store (EWDS) will host all the historical and forecast information for floods and forest fires at European and Global levels. Users are encouraged to migrate their accounts and scripts to the new EWDS Beta before 26 September 2024, when the system will become operational. For more information, Please read: CEMS Early Warning Data Store (EWDS) coming soon! |
Message Box | ||||
---|---|---|---|---|
| ||||
Updates to the EWDS documentation are ongoing as the implementation takes place. Scripts and examples on this page are under review and may not be fully functional yet for the EWDS, so please be patient. Please send your feedback and/or report any issue/bug you may have encountered while using the new CDS/ADS/EWDS infrastructure by raising an enquiry ticket through our dedicated Support Portal (ECMWF login required) - make sure you select "Yes" to the question Is this request related to CDS-beta/ADS-beta/CEMS-EW-beta? on the Support Portal request form - this will help the Support team with triaging enquiries more efficiently. |
There are many situations where a user is only interested in a subset of the dataset spatial domain.
For example, when comparing modelled river flow against observations, it is reasonable to be able to extract the time-series at those point coordinates rather than dealing with many GB of data. Similarly, when focusing on a specific catchment it is likely that you want to deal with only that part of the spatial domain.
In summary, there are two operations of data size reduction that are very popular on CEMS-Flood datasets, area cropping and time-series extraction.
There are two ways to perform those operations:
- Remotely - Using the CDS API to perform the operation remotely on the CDS compute nodes and retrieve only the reduced data.
- Locally - Using the CDS API to retrieve the entire data and perform the operation locally.
This section provides scripts for both cases and for both CEMS-Flood products, GloFAS and EFAS.
Set up a Python environment
If you have not done it yet, create a Python virtual environment.
Activate the conda environment and install the additional Python package https://corteva.github.io/rioxarray
Code Block | ||||
---|---|---|---|---|
| ||||
conda install rioxarray |
Prepare and retrieve data (for local processing)
For the following exercises on extracting time series on the local machine, we are going to use the latitude and longitude coordinates from a tiny subset of the GRDC dataset.
Copy the content of the code block into an empty file named "GRDC.csv", the file should reside in your working folder.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude 6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98 6118015,6,18,YVEL,LOYAT (PONT D129),FR,47.993815,-2.368849,315.0,88.26 6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0 6118025,6,18,"CANUT, RUISSEAU DE",SAINT-JUST (LA RIVIERE COLOMBEL),FR,47.775309,-1.979609,37.0,80.22 6118030,6,18,AFF,PAIMPONT (PONT DU SECRET),FR,47.981631,-2.143762,30.2,119.01 6118050,6,18,COET-ORGAN,QUISTINIC (KERDEC),FR,47.904164,-3.201265,47.7,94.42 6118060,6,18,EVEL,GUENIN,FR,47.899928,-2.975167,316.0,95.16 6118070,6,18,STER-GOZ,BANNALEC (PONT MEYA),FR,47.906833,-3.752172,69.7,85.08 6118080,6,18,MOROS,CONCARNEAU (PONT D22),FR,47.882934,-3.875375,20 |
Then, retrieve the following datasets into the same working folder.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import cdsapi dataset = "cems-glofas-historical" request = { "system_version": ["version_4_0"], "hydrological_model": ["lisflood"], "product_type": ["intermediate"], "variable": ["river_discharge_in_the_last_24_hours"], "hyear": ["2023"], "hmonth": ["06"], "hday": [ "01","02","03", "04","05","06", "07","08","09", "10","11","12", "13","14","15", "16","17","18", "19","20","21", "22","23","24", "25","26","27", "28","29","30" ], "data_format": "grib2", "download_format": "unarchived" } client = cdsapi.Client() client.retrieve(dataset, request).download('glofas_historical.grib') |
Code Block | ||||
---|---|---|---|---|
| ||||
import cdsapi dataset = "efas-reforecast" request = { "system_version": ["version_5_0"], "product_type": ["ensemble_perturbed_forecasts"], "variable": ["river_discharge_in_the_last_6_hours"], "model_levels": "surface_level", "hyear": ["2007"], "hmonth": ["03"], "hday": [ "27","30" ], "leadtime_hour": [ "6","12","18","24" ], "data_format": "grib", "download_format": "unarchived" } client = cdsapi.Client() client.retrieve(dataset, request).download('efas_reforecast.grib') |
EFAS
Warning | ||
---|---|---|
| ||
An issue has been identified with the EFAS sub-region extraction tool, whereby it serves data that is not correctly located on the river network. The sub-region extraction tool has therefore been removed from the EFAS CDS entries, and any area specified in cdsapi requests will return the entire domain . Data previously downloaded using this tool should be disregarded. For more information please see EFAS-Known Issues |
Warning | ||
---|---|---|
| ||
When transforming from lat/lon (source coordinates) to projected LAEA (target coordinates), you need to consider that the number of decimal places of the source coordinates affects the target coordinates precision: An interval of 0.001 degrees corresponds to about 100 metres in LAEA. An interval of 0.00001 degrees corresponds to about 1 metre in LAEA. (We don't need projected (LAEA) for EFAS v5.0) |
Remote processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve EFAS Seasonal Forecast === ## === subset Switzerland region === import cdsapi if __name__ == '__main__': c = cdsapi.Client() DATASET = "efas-seasonal" # List of years from 2020 to 2021 (inclusive) YEARS = ['%d' % (y) for y in range(2020, 2022)] # List of months (formatted as two digits) MONTHS = ['%02d' % (m) for m in range(1, 13)] # Lead times in hours (steps of 24, up to 5160) LEADTIMES = ['%d' % (l) for l in range(24, 2976, 24)] # Iterate over years and months for year in YEARS: for month in MONTHS: print(f'Year={year}') print(f'Month={month}') # Modify the request based on the year REQUEST = { "variable": ["river_discharge_in_the_last_24_hours"], "model_levels": "surface_level", "year": year, # Only use December for 2020, all months for other years "month": "12" if year == "2020" else month, "leadtime_hour": LEADTIMES, "data_format": "grib", "download_format": "unarchived", # Use "zip" if unarchived data is intended # Switzerland bounding box: [north, west, south, east] "area": [47.9, 5.8, 45.7, 10.6] } # Download the data and save as a .grib file c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.grib') |
Local processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr import pandas as pd from pyproj import Transformer,CRS parameter = "dis06" ds = xr.open_dataset("1ed7098facfbe0a2ccc1c274bcadb4afefas-seasonal.grib", engine="cfgrib",filter_by_keys={'dataType': 'cf'}) df = pd.read_csv("grdc.csv", sep=",") total = len(df) rows = [] count = 0 for lon1, lat1, id in zip(df['long'], df['lat'], df['grdc_no']): extracted = ds.sel(longitude=lon1, latitude=lat1,method="nearest")[parameter] df_temp = extracted.drop_vars(["surface", "number"]).to_dataframe().reset_index() df_temp["grdc"] = str(id) df_temp = df_temp.drop("step", axis=1) df_temp = df_temp.set_index(["grdc","time"]) rows.append(df_temp) count += 1 df_temp.to_csv(f"station_{id}.csv") print(f"progress: {count/total*100} %") |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr import rioxarray as rio # Rhine's basin bounding box coordinates in WGS84 (no reprojection needed) coords = [5.450796, 11.871059, 46.296530, 50.972204] # W, E, S, N # read EFAS reforecast dataset ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib",filter_by_keys={'dataType': 'cf'}) #ds = xr.open_dataset("1ed7098facfbe0a2ccc1c274bcadb4af.grib", engine="cfgrib",filter_by_keys={'dataType': 'cf'}) # add reference system to thw dataset ds = ds.rio.write_crs("EPSG:4326") # Function to convert 4 coordinates into a Polygon (W, E, S, N) def bbox_from_wesn(coords): w, e, s, n = coords bbox = [{ 'type': 'Polygon', 'coordinates': [[ (w, n), (w, s), (e, s), (e, n), (w, n) ]] }] return bbox # Create the bounding box polygon bbox = bbox_from_wesn(coords) # Clip the dataset using the bounding box ds_clipped = ds.rio.clip(bbox) # Save the clipped dataset as a new NetCDF file ds_clipped.to_netcdf("efas_reforecast_rhine.nc") |
GloFAS
Remote processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import cdsapi from datetime import datetime, timedelta def get_monthsdays(): start, =[2019,end = datetime(2024, 1, 1],end=[2019,12,31]): # reforecast time index start, end = datetime(*start),datetime(*end) ), datetime(2024,1, 31) # reference year 2024 days = [start + timedelta(days=i) for i in range((end - start).days + 1)] monthday = [d.strftime("%B%m-%d").split("-") for d in days if d.weekday() in [0, 3]] ] return monthday if __name__ == '__main__': c = cdsapi.Client() # station coordinates (lat,lon) COORDS = { "Thames":[51.35,-0.45] } # select date index correspondingadjustment to the event MONTHSDAYS = get_monthsdays(start =[2019,7,11],end=[2019,7,11]) YEAR = '2007' LEADTIMES = ['%d'%(l) for l in range(24,1128,24)] # loop over date index (just 1 in this case) for md in MONTHSDAYS: month = md[0].lower() day = md[1] # loop over station coordinates for station in COORDS: station_point_coord = COORDS[station]*2 # coordinates input for the area keyword c.retrieve( = 0.05 bbox = [ COORDS["Thames"][0] + adjustment, # North (Latitude + adjustment) COORDS["Thames"][1] - adjustment, # West (Longitude - adjustment) COORDS["Thames"][0] - adjustment, # South (Latitude - adjustment) COORDS["Thames"][1] + adjustment # East (Longitude + adjustment) ] DATASET='cems-glofas-reforecast', { 'system_version': 'version_2_2', 'variable': 'river_discharge_in_the_last_24_hours', 'format': 'grib', 'hydrological_model': 'htessel_lisflood', 'product_type': ['control_reforecast','ensemble_perturbed_reforecasts'], 'area':station_point_coord, 'hyear': YEAR, 'hmonth': month , 'hday': day , 'leadtime_hour': LEADTIMES, }, f'glofas_reforecast_{station}_{month}_{day}.grib') |
Area cropping:
# select date index corresponding to the event
MONTHSDAYS = get_monthsdays()
YEAR = '2022'
LEADTIMES = ['%d'%(l) for l in range(24,1104,24)]
# loop over date index (just 1 in this case)
for md in MONTHSDAYS:
month = md[0].lower()
day = md[1]
# loop over station coordinates
for station in COORDS:
station_point_coord = COORDS[station]*2 # coordinates input for the area keyword
REQUEST= {
'system_version': ["version_4_0"],
'variable': 'river_discharge_in_the_last_24_hours',
'hydrological_model': 'lisflood',
'product_type': 'control_reforecast',
'area': bbox,
'hyear': YEAR,
'hmonth': month,
'hday': day,
'leadtime_hour': LEADTIMES,
'data_format': "grib2",
'download_format': "zip"
}
c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.zip') |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Medium-Range Reforecast ===
## === subset India, Pakistan, Nepal and Bangladesh region ===
import cdsapi
from datetime import datetime, timedelta
def get_monthsdays():
start, end = datetime(2024, 1, 1), datetime(2024,1, 31) # reference year 2024
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
monthday = [d.strftime("%m-%d").split("-") for d in days if d.weekday() in [0, 3]]
return monthday
MONTHSDAYS = get_monthsdays()
if __name__ == '__main__':
c = cdsapi.Client()
DATASET='cems-glofas-reforecast'
# user inputs
BBOX = [40.05,59.95,4.95,95.05] # North West South East
YEARS = ['%d'%(y) for y in range(2004,2023)]
LEADTIMES = ['%d'%(l) for l in range(24,1128,24)]
# submit request
for md in MONTHSDAYS:
month = md[0].lower()
day = md[1]
REQUEST= {
'system_version': ["version_4_0"],
'variable': 'river_discharge_in_the_last_24_hours',
'hydrological_model': 'lisflood',
'product_type': 'control_reforecast',
'area': BBOX,
| ||||||
Code Block | ||||||
| ||||||
## === retrieve GloFAS Medium-Range Reforecast === ## === subset India, Pakistan, Nepal and Bangladesh region === import cdsapi from datetime import datetime, timedelta def get_monthsdays(): start, end = datetime(2019, 1, 1), datetime(2019, 12, 31) days = [start + timedelta(days=i) for i in range((end - start).days + 1)] monthday = [d.strftime("%B-%d").split("-") for d in days if d.weekday() in [0,3] ] return monthday MONTHSDAYS = get_monthsdays() if __name__ == '__main__': c = cdsapi.Client() # user inputs BBOX = [40.05 ,59.95, 4.95, 95.05] # North West South East YEARS = ['%d'%(y) for y in range(1999,2019)] LEADTIMES = ['%d'%(l) for l in range(24,1128,24)] # submit request for md in MONTHSDAYS: month = md[0].lower() day = md[1] c.retrieve( 'cems-glofas-reforecast'hyear': YEARS, { 'system_versionhmonth': 'version_2_2'month, 'variable': 'river_discharge_in_the_last_24_hours', 'formathday': 'grib'day, 'hydrological_model': 'htessel_lisflood', 'productleadtime_typehour': 'control_reforecast', 'area': BBOX,# < - subset LEADTIMES, 'hyear': YEARS, 'data_format': "grib2", 'hmonth': month , 'hdaydownload_format': day ,"zip" 'leadtime_hour': LEADTIMES, }, f'glofas_reforecast_c.retrieve(DATASET, REQUEST).download(f'{DATASET}_{month}_{day}.gribzip') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Seasonal Forecast === ## === subset South America/Amazon region === import cdsapi if __name__ == '__main__': c = cdsapi.Client() DATASET='cems-glofas-seasonal' YEARS = ['%d'%(y) for y in range(2020,2022)] MONTHS = ['%02d'%(m) for m in range(1,13)] LEADTIMES = ['%d'%(l) for l in range(24,2976,24)] for year in YEARS: for month in MONTHS: c.retrieve( REQUEST={ 'cems-glofas-seasonal', { 'variable': 'river_discharge_in_the_last_24_hours', 'variable': 'river_discharge_in_the_last_24_hours', 'year': year, 'formatmonth': 'grib'12' if year == '2020' else month, 'year 'leadtime_hour': yearLEADTIMES, 'area': [ 10.95, -90.95, -30.95, -29.95 ], 'month': '12' if year == '2020' else month 'data_format': "grib2", 'leadtimedownload_hourformat': LEADTIMES, 'area': [ 10.95, -90.95, -30.95, -29.95 ] "zip" }, f'glofas_seasonalc.retrieve(DATASET, REQUEST).download(f'{DATASET}_{year}_{month}.gribzip') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Seasonal Reforecast === ## === subset South America/Amazon region === import cdsapi if __name__ == '__main__': c = cdsapi.Client() YEARS = ['%d'%(y) for y in range(1981,2021)] MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december'] LEADTIMES = ['%d'%(l) for l in range(24,2976,24)] for year in YEARS: for month in MONTHS: c.retrieve( 'cems-glofas-seasonal-reforecast', { 'system_version': 'version_2_2', 'variable':'river_discharge_in_the_last_24_hours', 'format':'grib', 'hydrological_model':'htessel_lisflood', 'hyear': year, 'hmonth': month, 'leadtime_hour': LEADTIMES, 'area': [ 10.95, -90.95, -30.95, -29.95 ] }, f'glofas_seasonal_reforecast_{year}_{month}.grib') |
Local processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr import pandas as pd parameter = "dis24" ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib",backend_kwargs={'time_dims':['time']}) df = pd.read_csv("GRDCgrdc.csv") total = len(df) rows = [] count = 0 for lon, lat, id in zip(df.long, df.lat, df.grdc_no): extracted = ds.sel(longitude=lon, latitude=lat, method="nearest")[parameter] df_temp = extracted.drop_vars(["surface"]).to_dataframe().reset_index() df_temp["grdc"] = str(id) df_temp = df_temp.set_index(["grdc", "time"]) rows.append(df_temp) count += 1 print(f"progress: {count/total*100} %") out = pd.concat(rows) out.to_csv("extracted.csv", index="grdc") |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr # Rhine's basin bounding box bbox = [50.972204, 46.296530, 5.450796, 11.871059] # N,S,W,E ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib") ds_cropped = ds.sel( longitude=slice(bbox[2], bbox[3]), latitude=slice(bbox[0], bbox[1]) ) ds_cropped.to_netcdf("glofas_historical_cropped.nc") |