From test to production archiving in 3 steps

Each data provider archives their UERRA data in MARS themselves, with full support from the ECMWF team.

An ECMWF account is needed to archive data in MARS and to access the database online or via the web-API interface.

Data checking

As for any good data archive, it is crucial to have working data quality checks in place. Every data provider should double-check that their parameters fully comply with the required and agreed definitions linked on the main parameter page. Before any archiving activity, special attention must be paid to, for example, correct units and flux sign conventions, as specified in the GRIB2 encoding page. The additional checking tools provided by ECMWF and described below cannot, in principle, discover all types of fundamental errors (such as incorrect units, or a reversed sign of fluxes whose value range spans both positive and negative values).

Here are three steps to follow to archive data in MARS:

  1. MARSSCRATCH:
    MARSSCRATCH is a temporary development version of the MARS database. Anybody can use it for first preliminary tests, but the data can disappear at any time (even though this rarely happens).
    Example of COSMO sample data stored in MARSSCRATCH: http://apps.ecmwf.int/mars-servers/marsscratch/?origin=edzw&stream=oper&class=ur&expver=test
  2. MARS (expver=test)
    Before the full production archiving (step 3), a few days of full UERRA data from the given model must be archived in MARS with expver=test. It must be thoroughly checked that all fields were archived as expected (no missing or additional fields; no missing or additional steps, levels, etc.). Data archived as expver=test may be overwritten during this phase.
  3. MARS (expver=prod)
    Once all providers' processing and archiving scripts have been verified as fully working during step 2, production archiving can start. Production data in MARS must not be overwritten later, since users may download them at any time, which is why step 2 is of crucial importance. It is also very difficult to delete unwanted data (e.g. archived by mistake) from production MARS, so this must be avoided by all means. Only one user ID per provider is expected to be granted archiving permissions in MARS, to prevent e.g. double archiving of the same data at the same time.
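The three-step progression above can be sketched as a small shell helper that picks the MARS target keywords per phase. This is a minimal sketch only: the function name, the phase labels and the expver=prod-style keyword strings are illustrative, not provided ECMWF tooling.

```shell
# Minimal sketch: map an archiving phase to the MARS target keywords.
# Phase 1 goes to MARSSCRATCH, phase 2 to MARS with expver=test,
# phase 3 to production MARS with expver=prod.
mars_target() {
  case "$1" in
    scratch) echo "database=marsscratch,expver=test" ;;  # step 1
    test)    echo "expver=test" ;;                       # step 2
    prod)    echo "expver=prod" ;;                       # step 3
    *)       echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# e.g. prepended to an archive request:
echo "archive,class=ur,$(mars_target scratch),stream=oper,..."
```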

How to archive effectively

It is important to archive UERRA data efficiently, following the MARS design specific to this type of dataset. Suboptimal archiving (e.g. one parameter per archiving request instead of all parameters together) would cause avoidable MARS performance problems.

Using the archiving scripts which will be provided (see below), each provider will always archive all data from one day at once for a given origin, stream and levtype (i.e. all parameters, times, steps and levels together). If any MARS performance issue occurs during test or production archiving with that approach, it may need to be changed.

As a consequence of the MARS design for the UERRA datasets, up to one full month of model data could in theory be archived at once (again for a given origin, stream and levtype, as usual).
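Under those constraints a simple daily driver loop suffices. A minimal sketch, in which the request file names and the echo placeholder for the real mars call are assumptions, and GNU date is assumed for the date arithmetic:

```shell
# Sketch: archive one whole day at a time, per levtype, as recommended above.
daily_requests() {
  d=$1; end=$2                        # dates as YYYYMMDD
  while [ "$d" -le "$end" ]; do
    for levtype in sfc sol ml; do
      # in real use this would run: mars req.$levtype.$d
      echo "would archive day $d levtype $levtype"
    done
    d=$(date -d "$d + 1 day" +%Y%m%d)   # GNU date assumed
  done
}
```

Each request file would contain an archive request like the examples on this page, covering all parameters, times, steps and levels of that day.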

Tools

The following tools will be available to providers to allow smooth data processing and archiving in MARS:

Conversion script from GRIB1 to UERRA compliant GRIB2

For models which still produce their input data for MARS in GRIB1, conversion scripts to the required GRIB2 format will be provided. They are based on the GRIB-API grib_filter tool, which requires a GRIB-API version that fully supports all UERRA parameters.

Examples of grib_filter rules for GRIB1 HARMONIE parameters:
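No rules file is reproduced in this section, so the fragment below is only an illustrative sketch of the grib_filter rules syntax; the key values shown (GRIB1 indicatorOfParameter 11 at level 2 mapped to paramId 167, 2 metre temperature) are an assumption, not the agreed HARMONIE mapping:

```
# Illustrative sketch of grib_filter rules converting a GRIB1 field to GRIB2.
if ( editionNumber == 1 && indicatorOfParameter == 11 && level == 2 ) {
    set editionNumber = 2;
    set paramId = 167;   # assumed target: 2 metre temperature
}
write;
```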

Archiving request

An example of how to archive a full data sample produced by a given model will be provided to each partner. Basically only the date should change in the script; otherwise the content of the archive is expected to remain unchanged for the whole archiving period (i.e. homogeneous data without gaps or any other variation is expected).

Example of MARS archiving request for full surface, soil and model level data from HARMONIE deterministic reanalysis (origin=eswi, stream=oper, type=an):

MARS archiving request
# $date must be replaced by the appropriate date before running the MARS requests below

archive,class=ur,database=marsscratch,
stream=oper,type=an,levtype=sfc,expver=TEST,date=$date,origin=eswi,time=0/6/12/18,
param=lcc/lsm/msl/skt/sp/orog/mcc/rsn/sd/hcc/10wdir/2t/al/10si/sr/tcw/2r/tcc,
level=0/2/10,
step=0,
number=off,
expect=72,
source=an.$date.sfc.grib2

archive,class=ur,database=marsscratch,
stream=oper,type=an,levtype=sol,expver=TEST,date=$date,origin=eswi,time=0/6/12/18,
param=vsw/sot,
level=1/to/3,
step=0,
number=off,
expect=24,
source=an.$date.sol.grib2

archive,class=ur,database=marsscratch,
stream=oper,type=an,levtype=ml,expver=TEST,date=$date,origin=eswi,time=0/6/12/18,
param=u/t/v/q,
level=1/to/65,
step=0,
number=off,
expect=1040,
source=an.$date.ml.grib2

Checking tools

There are two types of checking tools which must be run before and after archiving to minimize possible errors in the archive:

  1. UERRA-GRIB2 checking tool (tigge_check)
    This tool should be run, before archiving, on all input files already in the UERRA compliant GRIB2 format. It checks all encoding details, so that only fully compliant UERRA files following exactly the required definitions will pass. With the option -v it can also check the allowed value range of each parameter.

    tigge_check can only verify that the encoded GRIB2 keys comply with the expected UERRA definitions, both those specified generally for all UERRA datasets and those for each particular parameter. Some types of fundamental errors nevertheless cannot be revealed by the tool in principle (e.g. incorrect units, which are never encoded in GRIB2 files, or a wrong (reversed) sign of fluxes whose value range spans both positive and negative values).

    The min/max value checking provided by tigge_check (-v) must be considered only a helping option which gives some chance of revealing real data issues. In some cases, e.g. for radiation fluxes, the allowed limits must be very flexible: for example, direct solar radiation in 1-hourly outputs ranges from 0 to 1e+9 depending on the forecast step.

    Tuning the allowed limits for numerous parameters from different models on varying domains, as in the UERRA case, is tricky and an ongoing process. On top of that, some models occasionally produce clearly wrong values in specific weather situations (e.g. grid-point storms with wind speeds exceeding 900 m/s). After agreement with the data provider, the tool can accept such unrealistic values as "normal" for the given model. It should be understood that such data will have an impact on users and might still be considered a sign of poor output data quality checking.

  2. MARS archive content checking script
    This kind of script must be run after each archiving run to check that only the expected fields were archived successfully (always the same parameters, without any change). The checking script below is based on the MARS list functionality.

    Example of a MARS list request checking the content of MARS for data from 1993-12-31 (origin=eswi):

    MARS checking scripts
    list,
          class      = ur,
          stream     = oper,
          type       = all,
          date       = 19931231,
          time       = all,
          levtype    = all,
          origin     = eswi,
          expver     = test,
          hide       = file/length/offset/id/missing/cost/branch/date/hdate/month/year,
          target     = tree.out,
          database   = marsscratch,
          output     = tree
    
    list,
          class      = ur,
          hide       = file/length/offset/id/missing/cost/branch/param/levtype/levelist/expver/type/class/stream/origin/date/time/step/number/hdate/month/year,
          target     = cost.out,
          output     = table

    The tree.out content should be the same for all archived dates, which can easily be checked, e.g. with the unix diff tool, against a reference MARS list output created from the very first properly archived day for the given model.

    tree.out content
    class=ur,expver=test,levtype=hl,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=10/130/157/3031/54,levelist=100/15/150/200/250/30/300/400/50/500/75
    class=ur,expver=test,levtype=ml,origin=eswi,stream=oper,type=an,param=130/131/132/133,time=00:00:00/06:00:00/12:00:00/18:00:00,levelist=1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/23/24/25/26/27/28/29/3/30/31/32/33/34/35/36/37/38/39/4/40/41/42/43/44/45/46/47/48/49/5/50/51/52/53/54/55/56/57/58/59/6/60/61/62/63/64/65/7/8/9
    class=ur,expver=test,levtype=pl,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=130/131/132/156/157,levelist=10/100/1000/150/20/200/250/30/300/400/50/500/600/70/700/750/800/825/850/875/900/925/950/975
    class=ur,expver=test,levtype=sfc,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=134/136/151/167/172/173/207/228002/228141/228164/235/260242/260260/260509/3073/3074/3075/33
    class=ur,expver=test,levtype=sol,origin=eswi,stream=oper,type=an,param=260199/260360,levelist=1/2/3,time=00:00:00/06:00:00/12:00:00/18:00:00
    class=ur,expver=test,levtype=hl,origin=eswi,stream=oper,type=fc,param=10/130/157/246/247/3031/54,levelist=100/15/150/200/250/30/300/400/50/500/75
     time=00:00:00/12:00:00,step=1/12/15/18/2/21/24/27/3/30/4/5/6/9
     time=06:00:00/18:00:00,step=1/2/3/4/5/6
    class=ur,expver=test,levtype=pl,origin=eswi,stream=oper,type=fc,param=130/131/132/156/157/246/247/260257,levelist=10/100/1000/150/20/200/250/30/300/400/50/500/600/70/700/750/800/825/850/875/900/925/950/975
     time=00:00:00/12:00:00,step=1/12/15/18/2/21/24/27/3/30/4/5/6/9
     time=06:00:00/18:00:00,step=1/2/3/4/5/6
    class=ur,expver=test,levtype=sfc,origin=eswi,stream=oper,type=fc
     time=00:00:00/06:00:00/12:00:00/18:00:00,step=1/2/3/4/5/6,param=134/136/146/147/151/167/169/173/174008/175/176/177/201/202/207/228141/228144/228164/228228/235/260242/260259/260260/260264/260430/260509/3073/3074/3075/33/49
     time=00:00:00/12:00:00,step=12/15/18/21/24/27/30/9,param=134/136/151/167/169/175/176/177/201/202/207/228144/228164/228228/235/260242/260259/260260/260264/3073/3074/3075/49
    class=ur,expver=test,levtype=sol,origin=eswi,stream=oper,type=fc,param=260199/260360,levelist=1/2/3,time=00:00:00/06:00:00/12:00:00/18:00:00,step=1/2/3/4/5/6
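    The per-date comparison can be reduced to a diff against a stored reference. A minimal sketch, in which the file names and the function name are illustrative:

```shell
# Sketch: compare a new MARS list output against the reference tree.out
# from the first correctly archived day; any difference is suspicious.
check_tree() {
  if diff -q "$1" "$2" >/dev/null; then
    echo "OK: archive content matches reference"
  else
    echo "MISMATCH: archive content differs from reference" >&2
    return 1
  fi
}

# usage: check_tree reference_tree.out tree.out
```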
    cost.out content
    Grand Total:
    ============
    
    Entries       : 13,852
    Total         : 8,931,971,037 (8.31855 Gbytes)
    
    
    The number of archived fields must always be the same. That number can easily be parsed from the above output, for example using unix grep:

    Number of fields archived
    > archived=$(cat cost.out | grep ^Entries | sed s/,//g | sed 's/.*: //')
    > echo $archived
    13852
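    A sketch of turning that into an automated check (the function name is illustrative; the expected count 13852 is specific to the sample above):

```shell
# Sketch: fail loudly when the archived field count is not the expected one.
check_count() {
  archived=$(grep '^Entries' "$1" | sed 's/,//g' | sed 's/.*: //')
  if [ "$archived" = "$2" ]; then
    echo "OK: $archived fields archived"
  else
    echo "ERROR: archived '$archived' fields, expected $2" >&2
    return 1
  fi
}

# usage: check_count cost.out 13852
```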