A large volume of data (100s of TB) is downloaded every day from the CDS.

On this page we summarise:

Table of Contents

CDS Request times: the reasons underpinning the CDS queuing time

CDS queuing can be monitored from the 'Your Requests' page or using the 'CDS Live' page.

The CDS retrieval times can vary significantly depending on the number of requests that the CDS has at any one time and also based on the following factors that affect EFAS and GloFAS:

  • The priority of the dataset in question
  • The size of the request
  • The number of requests submitted by a user
  • The number of requests to retrieve data from ECMWF Archive
  • The number of requests requesting a specific dataset
  • The number of active slots
  • The size of the overall queue.

The CDS strives to deliver data as fast as possible. However, it is not an operational service and should not be relied upon to deliver data in real time as it is produced.


Here we will try to give some context on why requests can take time:

Data for the CEMS-Floods (EFAS and GloFAS) datasets are held within MARS at ECMWF. The MARS service is a system designed for the request of GRIB files based on a disk cache and tape storage architecture. The most recent data is held on disk cache, while all available data is stored on tape. When a user requests data, the CDS places that request in a queue. Requests are prioritized by the CDS based on the factors listed above.

Once the request becomes eligible, it is passed to the MARS service at ECMWF for extraction of the relevant fields. It is only at this point that you will see your request as 'Running'.

Selecting a sub-area does not avoid retrieving the entire global dataset. Each timestep of each date of each variable is classed as an individual GRIB field. MARS extracts sub-areas of data by retrieving the entire global grid, cropping the selected area, then returning the requested area to the user.
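
Because each combination of variable, date and timestep is one GRIB field, the size of a request can be estimated before it is submitted. The sketch below is only an illustration: `count_fields` is a hypothetical helper (not part of the CDS API), and it assumes that the number of fields is simply the product of the number of values chosen for each request key.

```python
from math import prod


def count_fields(request, field_keys):
    """Estimate the number of GRIB fields in a request.

    Assumption: every combination of the values under `field_keys`
    corresponds to exactly one GRIB field.
    """
    sizes = []
    for key in field_keys:
        value = request[key]
        # A single string counts as one value; a list counts its items.
        sizes.append(len(value) if isinstance(value, (list, tuple)) else 1)
    return prod(sizes)


request = {
    "variable": "river_discharge_in_the_last_24_hours",
    "hyear": [str(y) for y in range(1999, 2019)],            # 20 years
    "hmonth": "january",
    "hday": "03",
    "leadtime_hour": [str(h) for h in range(24, 1128, 24)],  # 46 lead times
}

n = count_fields(request, ["variable", "hyear", "hmonth", "hday", "leadtime_hour"])
print(n)  # 920
```

Cropping to a sub-area does not change this number, since the full global field is read for each entry before it is cropped.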

MARS is a separate service to the CDS and has its own constraints on its workload. MARS applies its own QoS limits to data requests, as it is shared across operational services (e.g. those producing ERA5 and GloFAS) and non-operational services such as the CDS.

The CDS service, from time to time, can experience periods of high user activity and increasing queue time for even small requests. During these times we ask you to kindly wait for the queue to be processed, as there are fixed slots available that cannot be increased.

Figure 1 shows a period of high user activity. GloFAS and EFAS products are served by the adaptor.mars.external service. The number of active users (blue line) is well above the 50 slots allocated to GloFAS and EFAS requests (green line). When the blue line falls below the green line again, the total number of queued users starts decreasing until eventually there is no queuing time for any user request.

Figure 1
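
The behaviour described for Figure 1 can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of the actual CDS scheduler: it assumes a fixed pool of 50 slots, that every running request finishes within one time step, and that the arrival numbers are made up for illustration. `simulate_queue` is a hypothetical name.

```python
def simulate_queue(arrivals, slots=50):
    """Return the queue length after each time step.

    arrivals: number of new requests arriving at each step (illustrative).
    slots:    fixed number of requests that can run per step (assumption:
              each running request completes within one step).
    """
    queue = 0
    history = []
    for new in arrivals:
        waiting = queue + new          # everything that wants a slot
        running = min(waiting, slots)  # cannot exceed the fixed slots
        queue = waiting - running      # the rest stays queued
        history.append(queue)
    return history


# Demand rises above 50, then falls back below it.
print(simulate_queue([40, 80, 90, 60, 30, 20, 10]))
# [0, 30, 70, 80, 60, 30, 0]
```

In this toy run the queue keeps growing while demand exceeds the slots, and only drains once demand drops below them.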

CEMS-Flood data on MARS: the size of the CEMS-Flood datasets stored on MARS and accessible through the CDS

Table 1

| Dataset | CDS Catalogue Form | Overall Size | Days on Disk |
| --- | --- | --- | --- |
| GloFAS climatology | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview | 86 GB | 30 |
| GloFAS forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-forecast | 7.6 TB | 15 |
| GloFAS reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-reforecast | 6 TB | 0 (unless recently requested) |
| GloFAS seasonal forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal | 0.5 TB | 10 days for most recent forecast |
| GloFAS seasonal reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-seasonal-reforecast | 8.75 TB | 0 (unless recently requested) |
| EFAS climatology | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-historical | 817 GB | 30 |
| EFAS forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-forecast | 26.49 TB | 10 |
| EFAS reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-reforecast | 25.95 TB | 0 (unless recently requested) |
| EFAS seasonal forecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal | 0.5 TB | 10 days for most recent forecast |
| EFAS seasonal reforecast | https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal-reforecast | 13.25 TB | 0 (unless recently requested) |


Request strategy: the best practices to maximise efficiency and minimise wait time

The CDS enforces constraints on the number of fields per request that are retrievable for each dataset. The reason is to keep the system responsive for as many users as possible. The consequence is that you cannot download the whole dataset in one go. Instead, you will need to devise a retrieval strategy to, for example, loop over certain fields and retrieve the dataset in chunks.
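
As an illustration of retrieving a dataset in chunks, the sketch below splits one large request into several smaller ones along a single key. The helper names `chunk` and `build_requests` are hypothetical, and the chunk size of 5 years is an arbitrary choice for the example. Each resulting dictionary could then be submitted as its own CDS request.

```python
def chunk(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_requests(base_request, key, values, size):
    """Return one request dict per chunk; `base_request` is left unchanged."""
    return [{**base_request, key: part} for part in chunk(values, size)]


base = {"variable": "river_discharge_in_the_last_24_hours", "format": "grib"}
years = [str(y) for y in range(1999, 2019)]  # 20 years

requests = build_requests(base, "hyear", years, size=5)
print(len(requests))         # 4
print(requests[0]["hyear"])  # ['1999', '2000', '2001', '2002', '2003']
```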

It is also very important that, when only a part of the geographical domain is needed, the user crops the data to a region of interest (ROI). This will help keep the downloaded data size as small as possible.
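
A minimal sketch of adding a ROI to a request follows. It assumes that the dataset accepts an `area` key given as [north, west, south, east] in degrees; please check the CDS catalogue form of the dataset you are using to confirm the exact key and ordering. `with_roi` is a hypothetical helper, and the coordinates are arbitrary example values.

```python
def with_roi(request, north, west, south, east):
    """Return a copy of `request` cropped to a bounding box.

    Assumption: the dataset accepts an "area" key ordered as
    [north, west, south, east]. Boxes crossing the antimeridian
    are not handled in this simplified version.
    """
    if not -90 <= south < north <= 90:
        raise ValueError("expected -90 <= south < north <= 90")
    if not -180 <= west < east <= 180:
        raise ValueError("expected -180 <= west < east <= 180")
    cropped = dict(request)  # do not modify the caller's request
    cropped["area"] = [north, west, south, east]
    return cropped


base = {"variable": "river_discharge_in_the_last_24_hours", "format": "grib"}
cropped = with_roi(base, north=50.0, west=5.0, south=45.0, east=15.0)
print(cropped["area"])  # [50.0, 5.0, 45.0, 15.0]
```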

In Table 2, we list the maximum fields per request that you can retrieve for each dataset and the corresponding downloaded data size, assuming that you are:

  • requesting GRIB2 file format
  • not cropping
  • requesting the shorter time steps (6 hours vs 24 hours), when available
  • requesting ensemble perturbed forecasts

We also provide a short description of the request strategy and a link to a script that you can use to perform the request.


Table 2 - Request strategy

| Dataset | N. of fields per request | Request strategy | Corresponding file size per request (loop) | Link to example script |
| --- | --- | --- | --- | --- |
| GloFAS climatology | 500 | Loop over years | 2 GB | script |
| GloFAS forecast | 60 | Loop over years, months, days | 8.1 GB | script |
| GloFAS reforecast | 950 | Loop over months, days. Consider cropping to ROI | 32 GB | |
| GloFAS seasonal forecast | 125 | Loop over years, months. Consider cropping to ROI | 31.5 GB | |
| GloFAS seasonal reforecast | 125 | Loop over years, months. Consider cropping to ROI | 31.5 GB | |
| EFAS climatology | 1000 | Loop over years | 450 MB | script |
| EFAS forecast | 1000 | Loop over years, months, days | 3.7 GB | script |
| EFAS reforecast | 200 | Loop over years, months, days | 2.3 GB | |
| EFAS seasonal forecast | 220 | Loop over months, days | 13.1 GB | |
| EFAS seasonal reforecast | 220 | Loop over months, days | 13.1 GB | |
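
The limits in Table 2 can be used to work out how many requests a given retrieval needs. The sketch below copies the field limits from Table 2 into a dictionary; the dictionary keys and the helper `requests_needed` are hypothetical names chosen for this example, and the total of 920 fields is an arbitrary illustrative value.

```python
import math

# Maximum number of fields per request, copied from Table 2.
MAX_FIELDS = {
    "GloFAS climatology": 500,
    "GloFAS forecast": 60,
    "GloFAS reforecast": 950,
    "GloFAS seasonal forecast": 125,
    "GloFAS seasonal reforecast": 125,
    "EFAS climatology": 1000,
    "EFAS forecast": 1000,
    "EFAS reforecast": 200,
    "EFAS seasonal forecast": 220,
    "EFAS seasonal reforecast": 220,
}


def requests_needed(dataset, total_fields):
    """Smallest number of requests that keeps each one within the limit."""
    return math.ceil(total_fields / MAX_FIELDS[dataset])


print(requests_needed("GloFAS reforecast", 920))  # 1
print(requests_needed("EFAS reforecast", 920))    # 5
```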

Whilst submitting multiple requests can improve download time, overloading the system with too many requests will eventually slow down the overall system performance.

Indeed, the CDS system penalises users that submit too many requests by decreasing the priority of their requests. In short: too many parallel requests will eventually result in a slower overall download time.

For this reason, we suggest limiting to a maximum of 10 parallel requests.


Example code: download 20 years of GloFAS reforecasts using 10 threads

import cdsapi
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings("ignore")

LEADTIMES = ["%d" % (l) for l in range(24, 1128, 24)]
YEARS = ["%d" % (y) for y in range(1999, 2019)]


def get_dates(start=[2019, 1, 1], end=[2019, 12, 31]):
    start, end = datetime(*start), datetime(*end)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    dates = [
        list(map(str.lower, d.strftime("%B-%d").split("-")))
        for d in days
        if d.weekday() in [0, 3]
    ]
    return dates


DATES = get_dates()

def retrieve(client, request, date):

    month = date[0]
    day = date[1]
    print(f"requesting month: {month}, day: {day}")
    request.update({"hmonth": month, "hday": day})
    client.retrieve(
        "cems-glofas-reforecast", request, f"glofas_reforecast_{month}_{day}.grib"
    )
    return f"retrieved month: {month}, day: {day}"


def main(request):
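    """Submit one request per date, using at most 10 concurrent threads."""
    client = cdsapi.Client()
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Each task gets its own copy of the request so threads do not
        # overwrite each other's month and day.
        futures = [
            executor.submit(retrieve, client, request.copy(), date) for date in DATES
        ]
        for f in as_completed(futures):
            try:
                print(f.result())
            except Exception as exc:
                # Report the failure and carry on with the remaining requests.
                print(f"could not retrieve: {exc}")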


if __name__ == "__main__":

    request = {
        "system_version": "version_2_2",
        "variable": "river_discharge_in_the_last_24_hours",
        "format": "grib",
        "hydrological_model": "htessel_lisflood",
        "product_type": "control_reforecast",
        "hyear": YEARS,
        "hmonth": "",
        "hday": "",
        "leadtime_hour": LEADTIMES,
    }

    main(request)