The objective:

A good understanding of MARS efficiency issues is essential, especially for users who want to download large amounts of data.

The aim of this page is to help users improve the performance of their MARS requests, focusing on CMA reforecast data retrieved through the ECMWF Web API.

How is the S2S data organised in general?

First you need to understand how the S2S data is organised in MARS.

In general, it is organised as a huge tree; the indentation below shows the different levels down that tree:

  • centre (ECMWF, NCEP, JMA, ...)
    • realtime or reforecast
      • type of data (control forecast or perturbed forecast)
        • type of level (single level or pressure level or potential temperature)
          • dates (2015-01-01 or 2015-01-05 or 2015-01-08, ...)
            • time-steps
              • members (for perturbed forecast)
                • levels (for pl or pt)
                  • parameters


(lightbulb) The idea is that all time-steps, all members and all parameters for a given type of level, type and date are kept in the same tape file.

...

The main idea in brief:

Taking into consideration what has been explained above, if you need to loop over several CMA MARS requests, the best approach is to follow the hierarchy below:

  • date
    • hindcast date
      • number (EPS only)
        • level
          • parameter (inner loop)

 

...

What would be a pseudo algorithm to loop over several dates for a CMA request?

The main idea in brief:
for date in dates-list
     for hindcast-date in hindcast-dates-list
          your-request (includes the levels, parameters, steps etc)
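
The nested loop above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `loop_requests`, the `target` file naming and the contents of `base_request` are hypothetical, and `retrieve` stands for whatever callable actually submits a request.

```python
from typing import Callable, Dict, List


def loop_requests(
    dates: List[str],
    hindcast_dates: List[str],
    base_request: Dict[str, str],
    retrieve: Callable[[Dict[str, str]], object],
) -> List[Dict[str, str]]:
    """Submit one request per (date, hindcast date) pair, dates outermost."""
    submitted = []
    for date in dates:                      # outer loop: date
        for hdate in hindcast_dates:        # inner loop: hindcast date
            # Levels, parameters, steps etc. stay inside base_request, so
            # each request asks for all of them in one go.
            request = dict(
                base_request,
                date=date,
                hdate=hdate,
                target=f"cma_{date}_{hdate}.grib",  # hypothetical file name
            )
            retrieve(request)
            submitted.append(request)
    return submitted
```

With the ECMWF Web API Python client, `retrieve` would typically be the `retrieve` method of an `ECMWFDataServer` instance. For a dry run, `print` can be passed instead to inspect the requests before anything is submitted.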

A pseudo algorithm to request control forecast data on pressure levels for the years 2010-2014 and 2 months (e.g. April and June)

The main idea in brief:
for each year from 2010 to 2014
    for months April, June
        for each hindcast date
            your-request
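
As a rough Python sketch of the year/month/hindcast-date loop above: the code below assumes, purely for illustration, that a hindcast exists for every calendar day of the chosen months, so check the actual data availability before relying on it. The function names and the `YYYY-MM-DD` date format are also assumptions.

```python
import calendar
from typing import Dict, Iterable, List


def hindcast_dates(years: Iterable[int], months: Iterable[int]) -> List[str]:
    """List hindcast dates year by year, month by month.

    Assumption: one hindcast per calendar day. Replace the inner loop
    with the real list of available hindcast dates.
    """
    months = list(months)
    dates = []
    for year in years:
        for month in months:
            days_in_month = calendar.monthrange(year, month)[1]
            for day in range(1, days_in_month + 1):
                dates.append(f"{year:04d}-{month:02d}-{day:02d}")
    return dates


def control_pl_requests(
    years: Iterable[int] = range(2010, 2015),   # 2010 to 2014 inclusive
    months: Iterable[int] = (4, 6),             # April and June
) -> List[Dict[str, str]]:
    """Build one control forecast, pressure level request per hindcast date."""
    return [
        {"type": "cf", "levtype": "pl", "hdate": hdate}
        for hdate in hindcast_dates(years, months)
    ]
```

Each dictionary holds only the keys that vary or that define the category. The remaining keys (parameters, levels, steps and so on) would be merged in before the request is submitted.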


...

old below:

---------------

...

The main idea in brief:
  1.  4 categories of requests:
    1. control plevels
    2. control sfc
    3. ensemble plevels
    4. ensemble sfc
  2. For each category above:
    1. For each year from 1994 to 2014
      1. For each month from January to December
        1. retrieve hindcast dates 1-15 using requests according to data availability*
          1. API request 1
          2. API request 2
          3. API request 3
        2. retrieve hindcast dates 15-end of month using requests according to data availability
          1. API request 1
          2. API request 2
          3. API request 3

...

  • For plevels, different parameters are available on different levels, so Ben has created 3 pl requests.

...

 


The objective:


cf and pf are stored separately. 

For pf, it is not efficient to request a single member over many dates:
        HDATE = 20040101/20040106/20040111/20040116/20040121/20040126/20040201/20040206/20040211/20040216/20040221/20040226/20040301/20040306/20040311/20040316/20040321/20040326/20040401/20040406/20040411/20040416/20040421/20040426/20040501/20040506/20040511/20040516/20040521/20040526/20040601/20040606/20040611/20040616/20040621/20040626/20040701/20040706/20040711/20040716/20040721/20040726/20040801/20040806/20040811/20040816/20040821/20040826/20040901/20040906/20040911/20040916/20040921/20040926/20041001/20041006/20041011/20041016/20041021/20041026/20041101/20041106/20041111/20041116/20041121/20041126/20041201/20041206/20041211/20041216/20041221/20041226,
        NUMBER = 1,

but it is more efficient to group all members in one go for fewer dates:
        HDATE = 20040101/20040106/20040111/20040116/20040121/20040126,
        NUMBER = 1/to/50
This implies less positioning and more contiguous reads.
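
The grouping described above can be sketched in Python. The helper names are hypothetical; the dates, the group size of 6 and the `1/to/50` member range are taken from the example above.

```python
from typing import Dict, List


def hdates_2004() -> List[str]:
    """The 72 hindcast dates of the example: days 1, 6, ..., 26 of each month."""
    return [
        f"2004{month:02d}{day:02d}"
        for month in range(1, 13)
        for day in (1, 6, 11, 16, 21, 26)
    ]


def grouped_pf_requests(
    hdates: List[str],
    group_size: int = 6,
    members: str = "1/to/50",
) -> List[Dict[str, str]]:
    """Split the dates into small groups and ask for all members per group."""
    requests = []
    for start in range(0, len(hdates), group_size):
        group = hdates[start:start + group_size]
        requests.append(
            {
                "type": "pf",
                "hdate": "/".join(group),   # few dates per request
                "number": members,          # all members in one go
            }
        )
    return requests
```

Calling `grouped_pf_requests(hdates_2004())` gives 12 requests, one per month, each covering 6 dates and all members, instead of 50 requests that each cover 72 dates for one member.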

We could look at increasing the 10GB limit. We have more hardware and are in a better position to handle bigger chunks.


For the 3 allowed streams, I would extract different type/levtype combinations in different streams. For instance:
a) 1 stream: pf/sfc
b) 1 stream: pf/pl
c) 1 stream: cf/sfc and, when it finishes, cf/pl
and I would put 2 requests in each stream above. Once a request finishes retrieving and starts downloading, the next request will kick in, and in many cases it will find that the tape volume is still in the tape drive, which saves on average 2 minutes for a tape mount.
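
One possible way to organise the three streams on the client side is sketched below with Python's standard `concurrent.futures` module. It is a sketch under assumptions, not a description of how the service schedules work: `retrieve` is a placeholder for the callable that submits a request, the function names are hypothetical, and whether two requests per stream are accepted at once depends on the limits applied to your account.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_stream(requests: List[dict], retrieve: Callable, in_flight: int = 2) -> list:
    """Work through one stream's requests, keeping at most `in_flight` active.

    Requests are started in list order, so a stream that should do cf/sfc
    before cf/pl simply lists its cf/sfc requests first.
    """
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        # map() returns results in the same order as the input requests
        return list(pool.map(retrieve, requests))


def run_all_streams(
    streams: Dict[str, List[dict]],
    retrieve: Callable,
    in_flight: int = 2,
) -> Dict[str, list]:
    """Run every stream in parallel, one supervising thread per stream."""
    with ThreadPoolExecutor(max_workers=len(streams)) as outer:
        futures = {
            name: outer.submit(run_stream, requests, retrieve, in_flight)
            for name, requests in streams.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Here `streams` would map a label such as `"pf/sfc"` to that stream's list of requests, so that requests of the same type/levtype follow each other and have a better chance of finding the tape volume still mounted.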


...