Description

For the TIGGE, S2S and UERRA projects, data must be encoded in a precisely defined, WMO-compliant GRIB2 format to allow easy processing and intercomparison. To check that the encoding is as requested, use the Python tool grib-check, available at https://github.com/ecmwf/grib-check

grib-check can also perform basic quality control by checking each parameter's values against its allowed range (with the -L option), where such limits have been defined. A newer, more maintainable tool called grib_check.py performs a similar basic quality check. Both tools are described in Data quality checking tools (the Python source code is available there).


Please be aware that the older GRIB2 encoding checking tools tigge_check and tigge_check.py have been discontinued. The last ecCodes release that still includes the original C-language tigge_check is version 2.31.0, part of ecmwf-toolbox/2023.07.0.0.

Another similar tool, grib_enc_check.py, still exists and is used to check the data encoding of the more recent LC-WFV (Lead Centre for Wave Forecast Verification) project. It will be merged into grib-check in the future.

Examples of grib-check usage

grib-check options
$ grib-check --help
usage: grib-check [-h] [-L] -C {tigge,s2s,s2s_refcst,uerra,crra,lam,wpmip} [-l REPORT_DEPTH] [-d] [-p PARAMETERS] [-c] [-j NUM_JOBS] [-f] [-o {short,tree}] [-v] [-t] [--validity-check] path [path ...]

GribCheck is a tool that validates project-specific conventions of GRIB files. It performs a set of checks on GRIB messages to ensure they comply with the project's internal standards and expectations.

positional arguments:
  path                  path(s) to a GRIB file(s) or directory(s)

options:
  -h, --help            show this help message and exit
  -L, --check-limits    check value ranges (min/max limits)
  -C {tigge,s2s,s2s_refcst,uerra,crra,lam,wpmip}, --convention {tigge,s2s,s2s_refcst,uerra,crra,lam,wpmip}
                        data convention. The following conventions are experimental: wpmip.
  -l REPORT_DEPTH, --report-depth REPORT_DEPTH
                        report depth
  -d, --debug           debug mode
  -p PARAMETERS, --parameters PARAMETERS
                        path to parameters file
  -c, --color           use color in output
  -j NUM_JOBS, --num-jobs NUM_JOBS
                        number of jobs
  -f, --failed-only     show only failed checks
  -o {short,tree}, --output-type {short,tree}
                        output format
  -v, --version         show program's version number and exit
  -t, --show-type       show value type
  --validity-check      check validity of messages using the "isMessageValid" key provided by ecCodes. (experimental)
Checking TIGGE data
# show only failing checks
grib-check -f -C tigge <grib2_file>

# show only failing checks
# use one line output format
# use 10 threads to speed up checking
# check min/max ranges
grib-check -L -f -oshort -j10 -C tigge <grib2_file>

# show all checks, including passed ones; use colours
grib-check -c -C tigge <grib2_file>
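
When the check is driven from a script rather than typed by hand, the command line can be assembled programmatically. The following is a minimal sketch and not part of grib-check itself: the helper names build_grib_check_cmd and run_grib_check and their arguments are hypothetical, while the flags are those listed in the help output above. It assumes grib-check is available on the PATH.

```python
import subprocess


def build_grib_check_cmd(convention, paths, failed_only=False,
                         check_limits=False, num_jobs=None, output_type=None):
    """Assemble a grib-check command line (hypothetical helper)."""
    cmd = ["grib-check", "-C", convention]
    if check_limits:
        cmd.append("-L")              # check min/max value ranges
    if failed_only:
        cmd.append("-f")              # show only failed checks
    if output_type is not None:
        cmd += ["-o", output_type]    # "short" or "tree"
    if num_jobs is not None:
        cmd += ["-j", str(num_jobs)]  # number of jobs
    return cmd + list(paths)


def run_grib_check(cmd):
    """Run the command; return (exit code, combined stdout/stderr text)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout
```

For instance, build_grib_check_cmd("tigge", ["<grib2_file>"], failed_only=True, check_limits=True, output_type="short", num_jobs=10) yields a command equivalent to the second example above. How the exit code relates to failed checks is not documented here, so it is safer to inspect the captured output than to rely on the exit code alone.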

Examples of grib_enc_check.py usage
grib_enc_check.py options
# BIN=/home/ma/emos/def/lcwfv/bin
python $BIN/grib_enc_check.py
usage: grib_enc_check.py [-h] [-v VERBOSITY] [-d DEFS]
                         [inp_file [inp_file ...]]

positional arguments:
  inp_file              enter input file name(s)

optional arguments:
  -h, --help            show this help message and exit
  -v VERBOSITY, --verbosity VERBOSITY
                        increase output verbosity [0-2]
  -d DEFS, --defs DEFS  path to definition files
Checking LC-WFV data
$BIN/grib_enc_check.py lw.grib2
 field 223(Mean wave direction) key: dataRepresentationTemplateNumber expected: <0..2> encoded: 40
 field 224(10 metre U wind component) key: dataRepresentationTemplateNumber expected: <0..2> encoded: 40
Number of error(s) found: 2
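
For files with many errors, it can help to summarise the report by GRIB key. The sketch below parses report lines of the form shown in the sample output above. It assumes every error line follows exactly that layout, which is inferred from the two sample lines only. The names ERROR_LINE, parse_errors and count_by_key are hypothetical and not part of grib_enc_check.py.

```python
import re
from collections import Counter

# Layout inferred from the sample output:
#  field <n>(<name>) key: <key> expected: <range> encoded: <value>
ERROR_LINE = re.compile(
    r"^\s*field\s+(?P<field>\d+)\((?P<name>.*)\)\s+key:\s+(?P<key>\S+)"
    r"\s+expected:\s+(?P<expected>.*?)\s+encoded:\s+(?P<encoded>.*)$"
)


def parse_errors(report_text):
    """Return one dict per error line; other lines are ignored."""
    errors = []
    for line in report_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            record = match.groupdict()
            record["field"] = int(record["field"])  # field number in the file
            errors.append(record)
    return errors


def count_by_key(errors):
    """Count how many errors were reported for each GRIB key."""
    return Counter(error["key"] for error in errors)
```

Applied to the sample report above, parse_errors returns two records, both for the key dataRepresentationTemplateNumber.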

Performance tip to speed up checking big files 

There is a new tool (ecCodes v>=2.6.0) called codes_split_file, which is useful for parallelising decoding/checking tasks. The workflow below can further speed up processing, in addition to using the -j option to request more threads.

NAME    codes_split_file
DESCRIPTION
        Split an input file (GRIB, BUFR etc) into chunks of roughly the same size.
        The output files are named input_1, input_2 etc. This is much faster than grib_copy/bufr_copy.
USAGE
        codes_split_file [-v] nchunks input
OPTIONS
        -v  Print the count of messages and files created

With a very large input file containing thousands of messages, instead of running one process that checks each message sequentially, one can split the file into 8 chunks and run the check in parallel on the 8 output files.

set -e

# Assume you have 8 cores
codes_split_file 8 my_big.grib

# Now you will have my_big.grib_01, my_big.grib_02, ... my_big.grib_08
for f in my_big.grib_*; do
  # Run check in the background. Now multiple processes are running in parallel
  grib-check -C tigge "$f" &
done

# The 'wait' command pauses the script until all background jobs
# have finished, before continuing with the rest of the script
wait

# Now clean up the split files
rm -f my_big.grib_*
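
In the shell script above, a bare wait returns zero regardless of how the background jobs ended, and the output of the parallel jobs may interleave on the terminal. The following Python sketch is one possible alternative that keeps each chunk's exit code and output separate. The function names find_chunks, run_check and check_chunks are hypothetical, and it assumes the chunks have already been created with codes_split_file. Whether the exit status of grib-check reflects failed checks is an assumption to verify, which is why the captured output is returned as well.

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor


def find_chunks(input_path):
    """Return the sorted chunk files named <input_path>_<suffix>."""
    return sorted(glob.glob(glob.escape(input_path) + "_*"))


def run_check(base_cmd, chunk_path):
    """Run base_cmd on one chunk; return (path, exit code, output text)."""
    result = subprocess.run(base_cmd + [chunk_path], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return chunk_path, result.returncode, result.stdout


def check_chunks(base_cmd, chunk_paths, max_workers=8):
    """Check all chunks concurrently; results are returned in input order."""
    # Threads suffice here: each one only waits for its own subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda path: run_check(base_cmd, path),
                             chunk_paths))
```

A call such as check_chunks(["grib-check", "-C", "tigge"], find_chunks("my_big.grib")) would then mirror the loop above, returning one (path, exit code, output) tuple per chunk.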