...

Dataset | CDS Catalogue Form | Overall Size | Days on Disk
GloFAS climatology | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview | 86 GB | 30
GloFAS forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-forecast | 7.6 TB | 15
GloFAS reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-reforecast | 6 TB | 0 (unless recently requested)
GloFAS seasonal forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal | 0.5 TB | 10 days for most recent forecast
GloFAS seasonal reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal-reforecast | 8.75 TB | 0 (unless recently requested)
EFAS climatology | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-historical | 817 GB | 30
EFAS forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-forecast | 26.49 TB | 10
EFAS reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-reforecast | 25.95 TB | 0 (unless recently requested)
EFAS seasonal forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal | 0.5 TB | 10 days for most recent forecast
EFAS seasonal reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal-reforecast | 13.25 TB | 0 (unless recently requested)


Request strategy: best practices to maximise efficiency and minimise waiting time

The CDS enforces limits on the number of fields that can be retrieved from each dataset in a single request. The reason is to keep the system responsive for as many users as possible. The consequence is that you cannot download a whole dataset in one go; instead you need to devise a retrieval strategy, for example looping over certain fields and retrieving the dataset in chunks, as sketched below.
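
For example, the GloFAS climatology can be retrieved one year at a time (see the request strategy in Table 2 below). The sketch below illustrates such a loop; the keyword names and values are assumptions for illustration only, so always copy the exact request generated by the CDS catalogue form for the dataset and version you need.

Code Block
languagepy
collapsetrue
import calendar

import cdsapi

client = cdsapi.Client()

# Request keywords below are illustrative assumptions; copy the exact request
# from the "Show API request" button on the CDS catalogue form.
MONTHS = [calendar.month_name[m].lower() for m in range(1, 13)]
DAYS = ["%02d" % d for d in range(1, 32)]

# One request per year keeps each retrieval within the CDS field limits.
for year in range(1999, 2019):
    request = {
        "system_version": "version_3_1",   # assumed value, check the catalogue form
        "variable": "river_discharge_in_the_last_24_hours",
        "format": "grib",
        "hydrological_model": "lisflood",  # assumed value, check the catalogue form
        "hyear": str(year),
        "hmonth": MONTHS,
        "hday": DAYS,
    }
    client.retrieve(
        "cems-glofas-historical", request, f"glofas_climatology_{year}.grib"
    )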

It is also very important that, when only part of the geographical domain is needed, you crop the data to a region of interest (ROI). This helps keep the downloaded data size as small as possible.
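
For example, many CDS datasets accept an area keyword of the form [North, West, South, East] (degrees latitude/longitude). The sketch below restricts a GloFAS climatology request to an illustrative European ROI; the area keyword and the other request values are assumptions, so check on the catalogue form that sub-region extraction is available for the dataset you need.

Code Block
languagepy
collapsetrue
import cdsapi

client = cdsapi.Client()

# Illustrative request: all values are assumptions, copy the exact request
# from the CDS catalogue form for the dataset you need.
request = {
    "system_version": "version_3_1",   # assumed value, check the catalogue form
    "variable": "river_discharge_in_the_last_24_hours",
    "format": "grib",
    "hydrological_model": "lisflood",  # assumed value, check the catalogue form
    "hyear": "2019",
    "hmonth": "january",
    "hday": ["%02d" % d for d in range(1, 32)],
    # Crop to a region of interest: [North, West, South, East] in degrees.
    # This ROI roughly covers Europe and is only an example.
    "area": [72, -25, 30, 45],
}
client.retrieve("cems-glofas-historical", request, "glofas_january_2019_roi.grib")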

In Table 2, we list the maximum number of fields that you can retrieve for each dataset and the corresponding downloaded data size, assuming GRIB2 file format. We also provide a short description of the request strategy and a link to an example script that you can use to perform the request.


Table 2 - Request strategy summary

Dataset | API field limits | Downloaded data size | Request strategy | Link to example script
GloFAS climatology | 500 | 2 GB | Loop over years | script
GloFAS forecast | 60 | 8.1 GB | Loop over years, months, days | script
GloFAS reforecast | 950 | 32 GB | Loop over months, days; crop to ROI | -
GloFAS seasonal forecast | 125 | 31.5 GB | Loop over years, months; crop to ROI | -
GloFAS seasonal reforecast | 125 | 31.5 GB | Loop over years, months; crop to ROI | -
EFAS climatology | 1000 | to be confirmed | to be confirmed | script
EFAS forecast | 1000 | to be confirmed | to be confirmed | script
EFAS reforecast | 200 | to be confirmed | to be confirmed | to be confirmed
EFAS seasonal forecast | 220 | to be confirmed | to be confirmed | to be confirmed
EFAS seasonal reforecast | 220 | to be confirmed | to be confirmed | to be confirmed


Whilst submitting multiple requests can improve download time, overloading the system with too many requests will eventually slow down the overall system performance.

Indeed, the CDS system penalises users who submit too many requests by decreasing the priority of their requests. In short: too many parallel requests will eventually result in a slower overall download time.

...

Example code: download 20 years of GloFAS reforecasts using 10 threads.

Code Block
languagepy
collapsetrue
import cdsapi
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings("ignore")

# Lead times in hours: 24 to 1104 (46 days), in 24-hour steps
LEADTIMES = ["%d" % (l) for l in range(24, 1128, 24)]
# Reforecast years 1999-2018
YEARS = ["%d" % (y) for y in range(1999, 2019)]


def get_dates(start=[2019, 1, 1], end=[2019, 12, 31]):
    """Return [month, day] pairs for all Mondays and Thursdays in the period,
    i.e. the days on which the GloFAS reforecast is initialised."""
    start, end = datetime(*start), datetime(*end)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    dates = [
        list(map(str.lower, d.strftime("%B-%d").split("-")))
        for d in days
        if d.weekday() in [0, 3]
    ]
    return dates


DATES = get_dates()

def retrieve(client, request, date):

    month = date[0]
    day = date[1]
    print(f"requesting month: {month}, day: {day} /n")
    request.update({"hmonth": month, "hday": day})
    client.retrieve(
        "cems-glofas-reforecast", request, f"glofas_reforecast_{month}_{day}.grib"
    )
    return f"retrieved month: {month}, day: {day}"


def main(request):
    "concurrent request using 10 threads"
    client = cdsapi.Client()
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [
            executor.submit(retrieve, client, request.copy(), date) for date in DATES
        ]
        for f in as_completed(futures):
            try:
                print(f.result())
            except Exception as exc:
                print(f"could not retrieve: {exc}")


if __name__ == "__main__":

    request = {
        "system_version": "version_2_2",
        "variable": "river_discharge_in_the_last_24_hours",
        "format": "grib",
        "hydrological_model": "htessel_lisflood",
        "product_type": "control_reforecast",
        "hyear": YEARS,
        "hmonth": "",
        "hday": "",
        "leadtime_hour": LEADTIMES,
    }

    main(request)

...