Comparing data content against reference

Introduction

The simple checks below can be implemented to ensure that the file content is as expected. It is assumed that all fields in all expected files remain the same unless there is a change or an issue in data production (dispatching etc.).

the number of all fields must be as expected
the actual full field list must be the same as expected
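
The difference between the two checks can be illustrated with a minimal Python sketch. The function names and the shortened field identifiers are hypothetical and are not taken from get_field_list.py. The sketch shows why the count check is only a quick first test: a field replaced by another one keeps the count unchanged, and only the full list comparison detects it.

```python
def same_count(actual, reference):
    """Quick check: both lists hold the same number of fields."""
    return len(actual) == len(reference)


def same_fields(actual, reference):
    """Full check: both lists hold exactly the same field identifiers."""
    # Sorting makes the comparison independent of the order in the files.
    return sorted(actual) == sorted(reference)


if __name__ == "__main__":
    # Hypothetical identifiers: "swh" was replaced by "pp1d" in the actual data.
    reference = ["step0_10u", "step0_10v", "step0_swh"]
    actual = ["step0_10u", "step0_10v", "step0_pp1d"]
    print("count check passed:", same_count(actual, reference))
    print("full check passed:", same_fields(actual, reference))
```

In this example the count check passes while the full check fails, so the count check alone cannot serve as the reference check.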

An example of how to create a reference field list from given files (GRIB files in this case) and compare it with an actual field list follows.

The same approach can be used for any type of file, provided that an appropriate tool for creating the field list exists or is written.

Workflow

create a reference field list
- get full sample data and check thoroughly that it contains all expected fields
  - if this is the case, the field list created as described below can be stored as the valid reference for future use
- in case of a change in the data (e.g. new or removed fields after a model upgrade), a new valid reference must be created
create an actual field list
- as a first quick check, e.g. after getting all data, one can verify that the number of all fields equals the number of all reference fields
- the subsequent full reference check compares the full field list with the reference one

Examples

get_field_list.py usage

An example of creating the reference or actual field list using the Python script get_field_list.py (the ecCodes Python API is a prerequisite).

this is a version of get_field_list.py modified for the needs of the LC-WFV data sets
- each data set requires its own set of GRIB keys, which together must unambiguously identify every expected field
- it is rather straightforward to modify the script for other data sets
- if the script is run without the -c option, the actual date of each field is parsed (not usable for the reference check, as the date is the only GRIB key expected to change)
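
How a field identifier could be built from GRIB keys is sketched below. This is not the actual get_field_list.py: the chosen keys (dataDate, dataTime, level, step, shortName), the identifier layout and the constant_date parameter (standing in for the -c option) are assumptions made for illustration only. ecCodes is imported lazily, so the formatting helper can be tried without it.

```python
# Hypothetical set of GRIB keys assumed to identify a field unambiguously.
KEY_NAMES = ("dataDate", "dataTime", "level", "step", "shortName")


def field_id(keys, constant_date=True):
    """Build one identifier string from a dict of GRIB key values.

    With constant_date=True the date is replaced by zeros, so the
    identifier stays the same from one production run to the next.
    """
    date = "00000000" if constant_date else f"{int(keys['dataDate']):08d}"
    time = f"{int(keys['dataTime']):04d}"
    level = f"level{int(keys['level']):04d}"
    return f"{date}{time}_{level}_step{keys['step']}_{keys['shortName']}"


def iter_field_ids(path, constant_date=True):
    """Yield one identifier per GRIB message in the file (needs ecCodes)."""
    import eccodes  # lazy import: only required when a GRIB file is read

    with open(path, "rb") as grib_file:
        while True:
            gid = eccodes.codes_grib_new_from_file(grib_file)
            if gid is None:  # end of file reached
                break
            try:
                keys = {name: eccodes.codes_get(gid, name) for name in KEY_NAMES}
            finally:
                eccodes.codes_release(gid)
            yield field_id(keys, constant_date)
```

With constant_date=False the identifier changes with every run date, which is why such a list cannot be compared against a stored reference.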

#!/bin/ksh
set -ex

# $reflist is a link to the reference field list
# $DTS_ALLOW_NEW_REFERENCE is "true" if a new reference is required/expected

# get actual field list for comparison to the reference
python $DTS_BIN/get_field_list.py -c lw.grib2 > list.tmp
awk '{print $1}' list.tmp | sort > list

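# check if anything changed
# (the doubled '%%' is presumably the SMS escape for a literal '%' in the diff formats)
# fields present in the actual list but missing from the reference
diff --changed-group-format='%%<' --unchanged-group-format='' list $reflist > diff.added.tmp || true
# fields present in the reference but missing from the actual list
diff --changed-group-format='%%>' --unchanged-group-format='' list $reflist > diff.removed.tmp || true
sort diff.added.tmp   > diff.added
sort diff.removed.tmp > diff.removed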

if [[ -s diff.added || -s diff.removed ]] ; then
  # some differences found..

  if [[ "${DTS_ALLOW_NEW_REFERENCE}" = "true" ]] ; then
    cp -f list $reflist
    echo "A new partial reference field list created! ($reflist)"
  else
    echo "Differences comparing to the actual reference field list found!"
    exit 1
  fi

else
  smslabel info "The actual reference is valid ($reflist)"
fi
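
The comparison step of the script above can also be sketched in Python, for cases where the check is to be done without diff. The function names are hypothetical; parse_field_list mirrors the awk '{print $1}' step and diff_field_lists mirrors the two diff calls.

```python
def parse_field_list(lines):
    """Return the first whitespace-separated column of each non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]


def diff_field_lists(actual, reference):
    """Return (added, removed) as sorted lists of field identifiers.

    added   = fields in the actual list that the reference does not have
    removed = fields in the reference that the actual list does not have
    """
    actual_set, reference_set = set(actual), set(reference)
    added = sorted(actual_set - reference_set)
    removed = sorted(reference_set - actual_set)
    return added, removed


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python compare_lists.py list.tmp reference_list
    with open(sys.argv[1]) as actual_file, open(sys.argv[2]) as reference_file:
        added, removed = diff_field_lists(
            parse_field_list(actual_file), parse_field_list(reference_file)
        )
    for name in added:
        print("added:  ", name)
    for name in removed:
        print("removed:", name)
    # A non-zero exit status signals that the reference check failed.
    sys.exit(1 if added or removed else 0)
```

Because sets are used, duplicated fields are not detected here; the count check described in the workflow covers that case.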

Reference field list example

lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_10u
lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_10v
lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_pp1d
lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_swh
lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_10u
lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_10v
lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_pp1d
lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_swh
lw_sabm_000000001800_xxxx_fc_sl_level0000_step11_10u
...
...
lw_sabm_000000001800_xxxx_fc_sl_level0000_step96_pp1d
lw_sabm_000000001800_xxxx_fc_sl_level0000_step96_swh
lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_10u
lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_10v
lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_pp1d
lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_swh
