...

Dataset	CDS Catalogue Form	Overall Size	Days on Disk
GloFAS climatology	https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview	86GB	30
GloFAS forecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-forecast	7.6TB	15
GloFAS reforecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-reforecast	6TB	0 (unless recently requested)
GloFAS seasonal forecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal	0.5TB	10 days for most recent forecast
GloFAS seasonal reforecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal-reforecast	8.75TB	0 (unless recently requested)
EFAS climatology	https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-historical	817GB	30
EFAS forecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-forecast	26.49TB	10
EFAS reforecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-reforecast	25.95TB	0 (unless recently requested)
EFAS seasonal forecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal	0.5TB	10 days for most recent forecast
EFAS seasonal reforecast	https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal-reforecast	13.25TB	0 (unless recently requested)

Request strategy: best practices to maximise efficiency and minimise waiting time

The CDS limits the number of fields that can be retrieved for each dataset, in order to keep the system responsive for as many users as possible. As a consequence, you cannot download a whole dataset in one go; you need to devise a retrieval strategy that, for example, loops over certain fields and retrieves the dataset in chunks.
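The chunking idea can be sketched as follows. This is a minimal illustration, not an official recipe: `chunked` and `split_request` are hypothetical helpers, the chunk size of 5 is arbitrary, and the request keys simply mirror the GloFAS reforecast example on this page. No request is actually sent.

```python
from itertools import islice


def chunked(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def split_request(base_request, key, values, size):
    """Build one request dict per chunk of `values`, stored under `key`."""
    return [{**base_request, key: chunk} for chunk in chunked(values, size)]


# Hypothetical base request; keys follow the reforecast example on this page.
base = {"variable": "river_discharge_in_the_last_24_hours", "format": "grib"}
years = [str(y) for y in range(1999, 2019)]

# Assumed chunk size of 5 years per request, chosen only for illustration.
sub_requests = split_request(base, "hyear", years, size=5)
for sub_request in sub_requests:
    print(sub_request["hyear"][0], "to", sub_request["hyear"][-1])
```

Each resulting dictionary could then be passed to its own `retrieve` call, so that no single request exceeds the field limit of the dataset.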

It is also very important that, when only part of the geographical domain is needed, the user crops the data to a region of interest (ROI). This helps keep the downloaded data size as small as possible.
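As a sketch of how a crop might be added to a request: the snippet below assumes that the dataset accepts an `area` key ordered as north, west, south, east, and that longitudes are expressed between -180 and 180. Both points are assumptions and should be checked against the catalogue form of the dataset. `roi_to_area` is a hypothetical helper and the coordinates are arbitrary.

```python
def roi_to_area(north, west, south, east):
    """Return a bounding box as [north, west, south, east] after basic checks."""
    if not -90 <= south < north <= 90:
        raise ValueError("need -90 <= south < north <= 90")
    if not -180 <= west < east <= 180:
        raise ValueError("need -180 <= west < east <= 180")
    return [north, west, south, east]


# Hypothetical request; only the "area" entry matters for this sketch.
request = {"variable": "river_discharge_in_the_last_24_hours", "format": "grib"}

# Assumption: the dataset supports an "area" key in N, W, S, E order.
request["area"] = roi_to_area(north=50.0, west=5.0, south=44.0, east=12.0)
print(request["area"])
```

Validating the box before submitting avoids waiting in the queue for a request that would fail because two coordinates were swapped.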

Table 2 lists the maximum number of fields that you can retrieve for each dataset and the corresponding downloaded data size, assuming the GRIB2 file format. It also gives a short description of the request strategy and a link to a script that you can use to perform the request.

Table 2 - Summary of request strategies

Dataset	API field limit	Downloaded data size	Request strategy	Link to example script
GloFAS climatology	500	2 GB	Loop over years	script
GloFAS forecast	60	8.1 GB	Loop over years, months, days	script
GloFAS reforecast	950	32 GB	Loop over months, days; subset; crop to ROI
GloFAS seasonal forecast	125	31.5 GB	Loop over years, months; subset; crop to ROI
GloFAS seasonal reforecast	125	31.5 GB	Loop over years, months; subset; crop to ROI
EFAS climatology	1000	to be confirmed	to be confirmed	script
EFAS forecast	1000	to be confirmed	to be confirmed	script
EFAS reforecast	200	to be confirmed	to be confirmed	to be confirmed
EFAS seasonal forecast	220	to be confirmed	to be confirmed	to be confirmed
EFAS seasonal reforecast	220	to be confirmed	to be confirmed	to be confirmed
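Before submitting a request, it can help to estimate how many fields it asks for and compare that with the limit in Table 2. The sketch below assumes that one field corresponds to one combination of the selected values (variable, year, month, day, lead time); this counting rule is an assumption, and `count_fields` is a hypothetical helper. The request mirrors the GloFAS reforecast example on this page for a single date.

```python
from math import prod


def count_fields(request, keys):
    """Multiply the number of values selected under each of `keys`."""
    sizes = []
    for key in keys:
        value = request[key]
        # A plain string counts as a single selected value.
        sizes.append(len(value) if isinstance(value, (list, tuple)) else 1)
    return prod(sizes)


request = {
    "variable": "river_discharge_in_the_last_24_hours",
    "hyear": [str(y) for y in range(1999, 2019)],            # 20 years
    "hmonth": "january",
    "hday": "03",
    "leadtime_hour": [str(h) for h in range(24, 1128, 24)],  # 46 lead times
}

keys = ["variable", "hyear", "hmonth", "hday", "leadtime_hour"]
n_fields = count_fields(request, keys)
limit = 950  # GloFAS reforecast limit from Table 2

print(n_fields, n_fields <= limit)  # prints: 920 True
```

Under this counting rule, 20 years times 46 lead times gives 920 fields for one month and day, which stays below the limit of 950 and is consistent with looping over months and days.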

Whilst submitting multiple requests can improve download time, overloading the system with too many requests will eventually slow down the overall system performance.

Indeed, the CDS penalises users who submit too many requests by decreasing the priority of their requests. In short: too many parallel requests will eventually result in a slower overall download time.

...

Example code: download 20 years of GloFAS reforecasts using 10 threads.

import cdsapi
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings("ignore")

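# Lead times in hours: 24, 48, ..., 1104 (46 values)
LEADTIMES = [str(hour) for hour in range(24, 1128, 24)]
# Reforecast years: 1999 to 2018 (20 values)
YEARS = [str(year) for year in range(1999, 2019)]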


def get_dates(start=(2019, 1, 1), end=(2019, 12, 31)):
    """Return [month_name, day] pairs for every Monday and Thursday in the range."""
    start, end = datetime(*start), datetime(*end)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    dates = [
        list(map(str.lower, d.strftime("%B-%d").split("-")))
        for d in days
        if d.weekday() in [0, 3]  # 0 = Monday, 3 = Thursday
    ]
    return dates


DATES = get_dates()

def retrieve(client, request, date):
    """Retrieve all years and lead times for one (month, day) pair into one GRIB file."""
    month, day = date
    print(f"requesting month: {month}, day: {day}")
    request.update({"hmonth": month, "hday": day})
    client.retrieve(
        "cems-glofas-reforecast", request, f"glofas_reforecast_{month}_{day}.grib"
    )
    return f"retrieved month: {month}, day: {day}"


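def main(request):
    """Submit one request per date, running at most 10 of them concurrently."""
    client = cdsapi.Client()
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Each task gets its own copy of the request so threads do not share state.
        futures = [
            executor.submit(retrieve, client, request.copy(), date) for date in DATES
        ]
        for f in as_completed(futures):
            try:
                print(f.result())
            except Exception as exc:
                # Report the failure and carry on with the remaining dates.
                print(f"could not retrieve: {exc}")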


if __name__ == "__main__":

    request = {
        "system_version": "version_2_2",
        "variable": "river_discharge_in_the_last_24_hours",
        "format": "grib",
        "hydrological_model": "htessel_lisflood",
        "product_type": "control_reforecast",
        "hyear": YEARS,
        "hmonth": "",
        "hday": "",
        "leadtime_hour": LEADTIMES,
    }

    main(request)

...
