Heikki Järvinen, Sami Saarinen, Per Undén

  • Please note: Observation Data Selection was previously known as "blacklisting". The terminology in this user guide was updated in line with policies to avoid unconscious bias.

This guide describes the observation data selection language and its usage in the IFS.

1 Introduction


In the operational suite on the Cray computer, the denylist was essentially a list of undesired stations to be excluded from the analysis in operations (and usually in research experiments, too), based on monthly monitoring by the Operations Department. The technique for observation data selection was streamlined as part of the migration of the operational codes from Cray to Fujitsu.

A new observation data selection format has been introduced that allows much more flexibility in deciding how observations are used. The observation data selection process now consists of two parts: a data selection part and a monthly monitoring part. The data selection part contains information about which variables will be used in the assimilation, and it should be amended only rarely, except in experimentation. The monthly monitoring part, on the other hand, will be updated fairly frequently as a result of data monitoring. The former automatic ship denylist is no longer supported.

This guide comprehensively describes the format of the observation data selection language developed at ECMWF during the migration project in 1995-96, based on an initial idea by Mats Hamrud.

2 The observation data selection language


The way the observation data selection now works in the IFS context is as follows. One edits a data selection file which is written in a specific format. That file is then converted into a subroutine (C language) using the observation data selection compiler. The subroutine is then compiled and linked into the executable. This external routine is called from the IFS with a list of arguments in the observation screening run. IFS then receives a few flags telling whether to reject or accept this station or variable for assimilation. The following example will clarify the concepts used in the denylist.


if (OBSTYP = synop) then
   if VARIAB in (u10m, v10m)
   and LSMASK = land
   and abs(LAT) < 25 then
      fail(constant);
   endif
endif;


There are several patterns in this single denylist rule and in the following they will be called:


  • variables, like OBSTYP, VARIAB, LSMASK, LAT (see 2.1)
  • keywords, like synop, u10m, v10m, land, constant (see 2.2)
  • statements, like if-then-elif-endif-block (see 2.3.1)
  • operators, like and, in, =, < (see 2.3.2)
  • built-in functions, like abs (see 2.4)
  • actions, like fail (see 2.5)


Variables get their values from IFS. These are compared against the keywords or values given in the denylist. If the condition of a denylist rule is true, the fail() function takes action: it activates the denial flags and returns control to the calling routine in IFS. Note that the observation data selection language is case insensitive and no column orientation is required.
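As a rough illustration of how such a rule is evaluated, the Python sketch below mirrors the example rule. It is not the code produced by the observation data selection compiler (which generates C); the function name evaluate_rule and its argument order are hypothetical, and the numeric keyword values are the ones shown in the constant declarations in 5.1.

```python
# Keyword values as declared in the converted denylist shown in 5.1.
SYNOP = 1            # OBSTYP keyword
U10M, V10M = 41, 42  # VARIAB keywords
LAND = 1             # LSMASK keyword
CONSTANT = 2         # failure code for a constant denial


def evaluate_rule(obstyp, variab, lsmask, lat):
    """Return the denial code for one observation (0 means not denied).

    Hypothetical stand-in for the example rule: 10 m wind components from
    SYNOP reports over land are denied when within 25 degrees of the equator.
    """
    if obstyp == SYNOP:
        if variab in (U10M, V10M) and lsmask == LAND and abs(lat) < 25:
            return CONSTANT  # corresponds to fail(constant)
    return 0
```

A tropical land SYNOP reporting u10m, for instance evaluate_rule(1, 41, 1, 10.0), is denied with the constant code, whereas the same report at latitude 30 is not.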


2.1 Variables

A list of variables that are currently defined in IFS is given below. For adding new variables, see 5.3.
 

2.1.1 Report characteristics

The up-to-date list of variables related to the observation header and model fields can be found on the HPCF in the external file of our denylist (for instance /home/rd/rdx/data/37r3/an/external_bl_mon_monit.b for CY37R3).

 Variable              Meaning                                          Possible values
 obstyp                observation type                                 Keyword (as listed below)
 statid                station id                                       Right justified 8 character string
 codtyp                code type                                        Integer value as defined in IFS
 instrm                instrument type                                  Integer value as defined in IFS
 date                  date                                             Packed integer YYMMDD
 time                  time                                             Packed integer HHMMSS
 lat                   latitude                                         Real value in degrees (-90<=LAT<=90)
 lon                   longitude                                        Real value in degrees (-180<LON<=180)
 stalt                 station altitude                                 Real value in metres
 line_sat              line position atovs                              Integer value
 retr_type             retrieval type                                   Integer value
 qi_fc                 EUMETSAT Quality Indicators: with forecast dependence
 rff                   CIMSS Quality Indicator: Recursive Filter Flag
 qi_nofc               EUMETSAT Quality Indicators: without forecast dependence
 sensor                satellite sensor indicator (for RTTOV)           Integer value
 fov                   field of view number                             Integer value
 satza                 satellite zenith angle                           Real value (in degrees)
 nandat                analysis date                                    Packed integer YYMMDD
 nantim                analysis time                                    Packed integer HHMMSS
 soe                   solar elevation                                  Real value
 qr                    quality of retrieval
 clc                   cloud cover
 cp                    cloud top pressure
 pt                    product type                                     Integer value
 sonde_type            sonde type                                       Integer value
 specific              amsua=clwp on sea
 gen_centre            Generating centre                                Integer value (WMO defined)
 gen_subcentre         Generating sub-centre                            Integer value (WMO defined)
 datastream            Data stream (see datastream in odb)              Integer value
 ifs_cycle             six digit IFS cycle, e.g. 331001 for CY33R1.001  6 digit integer value
 retrsource            retrieval source                                 Integer value
 surftype              surface type indicator
 sza                   solar zenith angle                               Real value
 reportype             MARS reportype                                   Integer value for MARS archiving
 solar_hour            solar hour                                       Real value
 satellite_identifier  satellite identifier                             Integer value
 station_identifier    station identifier (for some conventional only)  Integer value (similar to statid but for integer values only)
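
Several of the variables above are packed integers. The Python sketch below shows one way such values could be unpacked. The helper names are hypothetical, and the digit layout assumed for ifs_cycle (two digits for the cycle, one for the release, three for the patch level) is inferred only from the single example 331001 for CY33R1.001 given in the table.

```python
def unpack_time(packed):
    """Split a packed HHMMSS integer into (hour, minute, second)."""
    hour, rest = divmod(packed, 10000)
    minute, second = divmod(rest, 100)
    return hour, minute, second


def format_ifs_cycle(packed):
    """Format a six digit ifs_cycle value as a cycle name.

    Assumed layout (inferred from the example in the table): CCRPPP,
    i.e. cycle, release and three digit patch level.
    """
    cycle, rest = divmod(packed, 10000)
    release, patch = divmod(rest, 1000)
    return f"CY{cycle}R{release}.{patch:03d}"
```

With these helpers, unpack_time(110000) gives (11, 0, 0) and format_ifs_cycle(331001) gives "CY33R1.001".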


2.1.2 Model/first guess characteristics

 Variable  Meaning                         Possible values
 modps     model surface pressure          Real value
 modts     model surface temperature       Real value
 modt2m    model 2 metre temperature       Real value
 modtop    model top level pressure (hPa)  Real value
 sea_ice   model sea-ice fraction          Real value


2.1.3 Observation characteristics

External variables (SPECIAL, i.e. related to obs. body entry only)

 Variable      Meaning                                       Possible values
 variab        variable name (varno in ODB)                  Integer value
 vert_co       type of vert. coord.                          Integer value
 press         pressure (hPa)                                Real value
 press_rl      ref. level press. (hPa)                       Real value
 ppcode        synop press. code                             Integer value
 obs_value     observed value                                Real value
 fg_departure  first guess depart.                           Real value
 obs_error     observation error                             Real value
 fg_error      first guess error                             Real value
 winchan_dep   window chan dep                               Real value
 obs_t         Obs temperature at same level, for R/S only.  Real value
 elevation     Radar elevation                               Real value
 winchan_dep2  alternative window chan dep                   Real value
 tausfc        Surface transmittance for AIRS screening.     Real value
 csr_pclear    percentage of clear pixel (GEOS)              Real value


2.2 Keywords


Keywords are fixed values against which certain variables are compared. They should be consistent with the IFS definitions. The keywords that are currently defined in the denylist (in the external file of our denylist) are listed below. Adding new keywords is straightforward.

 Variable                    Keyword
 OBSTYP                      synop, airep, satob, dribu, temp, pilot, satem, paob, scatt, limb, gbrad (or integer values as defined in IFS)
 CODTYP                      rtovs, tovs, ssmi, meris, am_profiler, jp_profiler, eu_profiler, templand, tempship, dropsonde, reo3, metar, pgps, radar_rr, rad1c, satem500, satem250 (or integer values as defined in IFS)
 SENSOR                      hirs, msu, ssu, amsua, amsub, ssmi_sensor, vtpr1, vtpr2, tmi, ssmis, airs, mhs, iasi, amsre, meteosat, msg, geosimg, mtsatimg, windsat, mwts, iras, mwri, envisat
 INSTRM                      mipas, gome, gomos, sciamachy, seviry, gome2, omi, toms, sbuv, auramls, iasi_reo3, modis_sensor, mopitt
 VARIAB                      u, v, z, dz, rh, q, pwc, rh2m, t, td, t2m, td2m, ts, ptend, w, ww, vv, ch, cm, cl, nh, nn, hshs, c, ns, s, e, tgtg, spsp1, spsp2, rs, eses, is, trtr, rr, jj, vs, ds, hwhw, dwdw, gclg, rhlc, rhmc, n, snra, ps, dd, ff, rawbt, rawra, satcl, scatss, du, dv, u10m, v10m, rhlay, auxil, cllqw, ambigv, ambigu, apdss, ro_bangk, rrefl, o3, hlos, no2, so2, co, hcho, go3, co2, ch4, aod, rao, od, rfltnc, lnprc
 LSMASK                      sea, land
 RLMASK                      tovsland
 PPCODE                      psealev, pstalev, g850hpa, g700hpa, p500gpm, p1000gpm, p2000gpm, p3000gpm, p4000gpm, g900hpa, g500hpa
 VERT_CO                     pressure, height, tovs_cha, sca
 RETR_TYP for TOVS           cloudy, partly_cloudy, clear
 RETR_TYP for Satob          wvcl, ir, vis, wv, comb_spec_channels, wvmw, wvcl1, wvcl2, wvcl3, ir1, ir2, ir3, vis1, vis2, vis3, wvmw1, wvmw2, wvmw3
 SONDE_TYPE for radiosondes  st_avk_mrz, st_rs80_usa, st_rs80, st_rs90, st_viz
 DATASTREAM                  ears, pacrars, dbmodis
 ODB constants               rmdi, ndmi (real values as defined in ODB)


2.3 Statements and operators


2.3.1 IF-statement syntax


The IF-statement syntax is as follows (note the semicolon (;) after each statement):

 if (condition) then
    statement_1;
    statement_2;
    etc.

 elif (condition) then
    statement_1;
    statement_2;
    etc.

 else
    statement_1;
    statement_2;
    etc.

 endif

This is an IF-test with optional ELIF/ELSE-blocks.

Nested IF-tests are valid in every statement. Every IF-THEN or IF-THEN-ELSE must be matched by an ENDIF.

The condition can be any logical or arithmetic operation.

2.3.2 List of the simple operators


A list of operators that are currently defined in the observation data selection-language:

 



2.3.3 List of more complex operators

Somewhat more complex operators can also be used to simplify coding. For example the compound AND-operators below:



2.4 Built-in functions


The observation data selection-language also contains some built-in functions. They are listed below:



 



In addition, there is one special function to test whether a point is within a circular area on the Earth (e.g. to deny Meteosat SATOBs if they are too far away):

if (not (rad (0, 0, 45, LAT, LON))) then fail(monthly); endif;

The function is called rad() and requires five (5) arguments. It returns one (1) if the observation is within the circle, otherwise zero (0). The usage is

rad(reflat, reflon, refdeg, LAT, LON)

where refdeg is the radius of the circle on the Earth, expressed as an angle, and (reflat, reflon) is the center point of the circle. (LAT, LON) is the position of the observation to be checked, i.e. the LAT and LON of the report. All values are given in degrees. See also Figure 2.1.

 

  
Figure 2.1: Schematic view of the rad()-function parameters.


The following arithmetic is performed in the function rad():

  1. Convert all degrees to radians.
  2. Calculate the angular distance (in radians) relative to the center point:
    obsdeg = acos( cos(reflat) * cos(LAT) * cos(LON - reflon) +
    sin(reflat) * sin(LAT) )
  3. Return one from rad if obsdeg ≤ refdeg, otherwise zero.
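
The three steps above can be sketched in Python as follows. This is an illustration of the arithmetic only, not the library's actual implementation; clamping the cosine to [-1, 1] is an added safeguard against rounding errors and is not mentioned in the steps above.

```python
import math


def rad(reflat, reflon, refdeg, lat, lon):
    """Return 1 if (lat, lon) lies within refdeg degrees of (reflat, reflon), else 0."""
    # Step 1: convert all degrees to radians.
    reflat_r, reflon_r, refdeg_r, lat_r, lon_r = (
        math.radians(x) for x in (reflat, reflon, refdeg, lat, lon)
    )
    # Step 2: angular distance (in radians) relative to the center point.
    cos_angle = (
        math.cos(reflat_r) * math.cos(lat_r) * math.cos(lon_r - reflon_r)
        + math.sin(reflat_r) * math.sin(lat_r)
    )
    # Added safeguard (assumption): keep the acos argument inside its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    obsdeg = math.acos(cos_angle)
    # Step 3: one if inside the circle, otherwise zero.
    return 1 if obsdeg <= refdeg_r else 0
```

For the Meteosat example, rad(0, 0, 45, 10, 10) returns 1 (the observation is about 14 degrees from the sub-satellite point), while rad(0, 0, 45, 60, 0) returns 0.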

2.5 Actions


Finally, perhaps the most important function is fail(). It returns information back to the application.

The fail()-function takes a variable number of arguments. If no arguments are given, the first argument is assumed to be the keyword monthly, i.e. rejection occurs in the monthly monitoring part of the data selection file. If the second argument (the seriousness of the denial) is omitted, the seriousness is assumed to be equal to one.

Arguments in the fail(arg1, arg2)-function are:

 Argument#1 (arg1)   Meaning
 monthly             monthly monitoring (default)
 constant            constant denial
 experimental        experimental denial
 use_emiskf_only     emiskf denial

 Argument#2 (arg2)   Meaning
 level               Level of seriousness of denial. Range is [0..1]. Default = 1

When a call to the fail()-function occurs, the control is returned immediately to the calling application. Normally the application is the IFS, which will get the following (Fortran) variables updated:

 Variable   Type      Meaning
 NCMBLI     Integer   Denial indicator:
                        0 = not denied (default)
                        1 = monthly monitoring
                        2 = constant denial
                        3 = experimental
                        4 = use for emiskf only
 ZCMCCC     Real      Seriousness of the denial:
                        0 = default if not denied
                        1 = default if denied (i.e. NCMBLI > 0)
                        [0.01...0.99] for non-complete denial (optional)
 FEEDBACK   Integer   Feedback vector telling which variable(s) caused the denial to occur:
                        0 = denylist line number where the fail()-function took action
                        1-N = pointers to the variable indices to help locate the responsible variables

ZCMCCC can take a range of values: together with other information in the quality control, a value less than one may still lead to the use of this variable in the assimilation. The inclusion of this option of non-strict data denial increases flexibility in the use of observations.
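
The default handling of the two fail() arguments can be summarised with the Python sketch below. It is a simplified, hypothetical model: it only returns the pair (NCMBLI, ZCMCCC) and leaves out the FEEDBACK vector and the immediate return of control to the calling application.

```python
# Denial indicator (NCMBLI) for each keyword accepted as the first argument.
FAIL_CODES = {
    "monthly": 1,
    "constant": 2,
    "experimental": 3,
    "use_emiskf_only": 4,
}

# Values seen by the application when no rule fires.
NOT_DENIED = (0, 0.0)


def fail(kind="monthly", level=1.0):
    """Return (NCMBLI, ZCMCCC) for a denial; both arguments are optional."""
    if kind not in FAIL_CODES:
        raise ValueError(f"unknown failure keyword: {kind}")
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in the range [0..1]")
    return FAIL_CODES[kind], float(level)
```

In this model, fail() gives (1, 1.0), fail("constant") gives (2, 1.0), and fail("experimental", 0.5) gives (3, 0.5), a non-complete denial.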

2.6 Variable declaration

Variables must be declared if data is to be passed from an application (like IFS) into the denylist. This is normally done through an external-declaration (see 4.2 or 5.1). Selected variables can also be protected by defining them as constants.

Additional or local variables can be defined anywhere in the code, even within an IF-THEN-ELSE-ENDIF block (except in the IF-condition). However, any attempt to use undeclared or uninitialized variables will cause the data selection compilation to fail.

The simplest variable declaration is an assignment operation.



3 Operational and experimental use of the denylist

3.1 Location of data selection files

3.2 Some guidelines

Please do not place any station identifiers in the data selection part of the denylist. Instead, put them in the monthly monitoring part. This way, changes to the data selection part are kept to a minimum, which makes e.g. re-analysis much easier.

After any modifications to the denylist, please remember to recompile (preferably on a workstation) to check for syntax errors.

4 Creating a new data selection file

Data selection compilation is fully controlled by the script called blcomp. It has the following capabilities:

  • Optionally convert from the old ASCII denylist format to the new format
  • Check the syntax of a given denylist
  • Create a C-language file (C_code.c) catered for observation processing
  • C-compile the C-file to create a linkable object

4.1 Usage of the blcomp

The blcomp-script has the following usage:

blcomp [-aAcCdDefiILmMnoOpSx8] data_selection_file.b (or data_selection_file.B)

where the flags are as follows:

 



The new DENYLIST-file must have either the suffix ".b" or ".B". In the latter case the C-preprocessor /lib/cpp will be run ahead of the BL-compiler, mainly to resolve any #include-statements.

For pure syntax checking of the new DENYLIST-file, give:

blcomp data_selection_file.b
or
blcomp data_selection_file.B

Running blcomp without arguments prints the usage. If this does not work, check the setting of your PATH environment variable.

4.2 Conversion from old to new denylist


Conversion from old to new and syntax checking of the new DENYLIST-file can be accomplished in the following way:

blcomp -o old_text_data_selection_file newfile.b
or
blcomp -o old_text_data_selection_file newfile.B


Here, the input file is old_text_data_selection_file, and output file is newfile.b (or newfile.B) in the new denylist format.

When converting from the old to the new format, the suffix (.b or .B) of the new data selection file plays an important role. First of all, a suffix MUST always be present. When the suffix is .b, a single data selection file (here: newfile.b) is created with all externals (e.g. variable declarations) and monthly monitoring rules (the portion of data selection that normally does not change during a one-month period) inlined.

If the suffix .B was used, then the following three (3) files are generated:

  • master file (newfile.B)
  • include-file no. 1 for externals (external_newfile.b)
  • include-file no. 2 for monthly part (monthly_newfile.b)

The master file simply contains the following two lines:

#include "external_newfile.b"
#include "monthly_newfile.b"

One way to bring in your own modifications is to create a new master file, for example:

#include "external_newfile.b"
#include "my_own_file"
#include "monthly_newfile.b"


This is exactly how the data selection part is brought into the production run: there, the data selection part takes the place of my_own_file.

4.3 C-code generation

To enable fast denylist handling, the data selection file is always converted into an object file (.o) meant to be linked with the (Fortran) application (like IFS) together with the observation data selection object library (normally libbl95.a).

Once a data selection file (either with .b or .B suffix) is available, it can be converted to C-language file C_code.c and compiled to an object for maximum performance. This can be done as follows:

blcomp -c data_selection_file.b
or
blcomp -c data_selection_file.B


4.4 Linking with an application

A Fortran-application (IFS) interfaces the observation data selection via two subroutines:

  • BLACKBOX_INIT
  • BLACKBOX

The former is responsible for initialising the list of variables that are active in the application. The latter handles all the work of interfacing with the data selection file.

To link an application with the observation data selection software, one needs not only the C_code.o object file but also the data selection library libbl95.a. The linking command is normally:

linker application.o C_code.o /bl95path/libbl95.a other_libs


The exact location of the observation data selection library can be found via command:

blcomp -L

4.5 Combining conversion and object generation

If no data selection part is needed, one can combine conversion from old to new denylist and object code generation described above:

blcomp -c -o old_text_data_selection_file newfile.b  
or
blcomp -c -o old_text_data_selection_file newfile.B

4.6 User interface

It is always recommended to (cold-)compile a modified denylist on a workstation to check for syntax errors. If any errors are detected, the blcomp-command attempts to open an editor session and jump directly to the line where the (first) error occurred.

Sometimes this facility is not desirable; it can be disabled by using the -i flag in the blcomp-command.

5 Examples

The data selection file is normally about 1000 lines long. To avoid confusing readers, we use very short examples here to explain what can be done with the observation data selection language.

5.1 A simple example

A fragment of an old denylist (old) looks as follows:

     3ELC  1 3
ELBX3 1 333
N503US 2 00030
UAL... 2 00030
024 3 33000000 033333
0// 3 33000000 033333
46527 4 33300
ERES 5 000003
08221 6 0330
201 7 33300000 00333


When compiled with blcomp -o old new.b, we get a new file new.b. The local constant variable declaration section looks as follows:

!
! Written by an automatic conversion program, version 3
!
!
! File converted from the file "old"
!

! FAILCODE :
const monthly = 1;
const constant = 2;
const experimental = 3;
const allowlist = 4;

! OBSTYP :
const synop = 1;
const airep = 2;
const satob = 3;
const dribu = 4;
const temp = 5;
const pilot = 6;
const satem = 7;
const paob = 8;
const scatt = 9;

! CODTYP : none

! INSTRM : none

! VARIAB :
const u = 3;
const v = 4;
const z = 1;
const dz = 57;
const rh = 29;
const q = 7;
const pwc = 9;
const rh2m = 58;
const t = 2;
const td = 59;
const t2m = 39;
const td2m = 40;
const ts = 11;
const ptend = 30;
const w = 60;
const ww = 61;
const vv = 62;
const ch = 63;
const cm = 64;
const cl = 65;
const nh = 66;
const nn = 67;
const hshs = 68;
const c = 69;
const ns = 70;
const s = 71;
const e = 72;
const tgtg = 73;
const spsp1 = 74;
const spsp2 = 75;
const rs = 76;
const eses = 77;
const is = 78;
const trtr = 79;
const rr = 80;
const jj = 81;
const vs = 82;
const ds = 83;
const hwhw = 84;
const pwpw = 85;
const dwdw = 86;
const gclg = 87;
const rhlc = 88;
const rhmc = 89;
const rhhc = 90;
const n = 91;
const snra = 92;
const ps = 110;
const dd = 111;
const ff = 112;
const rawbt = 119;
const rawra = 120;
const satcl = 121;
const scatss = 122;
const du = 5;
const dv = 6;
const u10m = 41;
const v10m = 42;
const rhlay = 19;
const auxil = 200;
const cllqw = 123;
const scatdd = 124;
const scatff = 125;

! LSMASK :
const sea = 0;
const land = 1;

! PPCODE :
const psealev = 0;
const pstalev = 1;
const g850hpa = 2;
const g700hpa = 3;
const p500gpm = 4;
const p1000gpm = 5;
const p2000gpm = 6;
const p3000gpm = 7;
const p4000gpm = 8;
const g900hpa = 9;
const g1000hpa = 10;
const g500hpa = 11;

! VERT_CO:
const pressure = 1;
const height = 2;
const tovs_cha = 3;
const scat_cha = 4;


The external variable definition section looks as follows:

! External variables (non-special):
external obstyp;
external_CHAR statid;
external codtyp;
external instrm;
external date;
external time;
external lat;
external lon;
external stalt;
external modoro;
external lsmask;
external rad;

! External variables (SPECIAL):
external variab is SPECIAL;
external vert_co is SPECIAL;
external press is SPECIAL;
external press_rl is SPECIAL;
external ppcode is SPECIAL;
external obs_value is SPECIAL;
external obs_departure is SPECIAL;
external modps is SPECIAL;


And finally the actual monthly monitoring rules in the new denylist format:

if ( OBSTYP = synop ) then
if VARIAB in ( z, ps )
and STATID = " 3ELC"
then fail(); endif;

if VARIAB in ( z, ps, u10m, v10m )
and STATID = " ELBX3"
then fail(); endif;

return; endif;

if ( OBSTYP = airep ) then
if (VARIAB = t)
and STATID in ( " N503US", " UAL...")
then fail(); endif;

return; endif;

if ( OBSTYP = satob ) then
if STATID in ( " 0//", " 024")
then fail(); endif;

return; endif;

if ( OBSTYP = dribu ) then
if VARIAB in ( z, ps, u, v )
and STATID = " 46527"
then fail(); endif;

return; endif;

if ( OBSTYP = temp ) then
if (VARIAB = z)
and STATID = " ERES"
then fail(); endif;

return; endif;

if ( OBSTYP = pilot ) then
if VARIAB in ( u, v )
and STATID = " 08221"
then fail(); endif;

return; endif;

if ( OBSTYP = satem ) then
if STATID = " 201"
then fail(); endif;

return; endif;

5.2 A more complex example

The observation data selection compiler will generate quite compact and readable code from the following excerpt:


     ATQM  1 3
ATRK 1 3
ATSR 1 3
C6BB 1 3
C6QK 1 3
AN... 2 33333 50 10
NWA74 2 33333 -90 90 -40 -80
035 3 33000000 033333 -50 50 -50 50 1000 401
104 3 33000000 033333 -50 50 90 -170
20674 5 000003 100 10 11 13
40179 5 033000 05 07
40179 6 0330 05 07


The constant definitions do not differ from the previous example. The monthly monitoring rules in the new denylist format become:


if ( OBSTYP = synop ) then
if VARIAB in ( z, ps )
and STATID in ( " ATQM", " ATRK", " ATSR", " C6BB", " C6QK")
then fail(); endif;

return; endif;

if ( OBSTYP = airep ) then
if ( 50 >= PRESS >= 10 )
and STATID = " AN..."
then fail(); endif;

if ( ( LAT < -90 or LAT > 90 ) or ( -80 < LON < -40 ) )
and STATID = " NWA74"
then fail(); endif;

return; endif;

if ( OBSTYP = satob ) then
if ( ( LAT < -50 or LAT > 50 ) or ( -170 < LON < 90 ) )
and STATID = " 104"
then fail(); endif;

if ( ( LAT < -50 or LAT > 50 ) or ( LON < -50 or LON > 50 ) )
and ( 1000 >= PRESS >= 401 )
and STATID = " 035"
then fail(); endif;

return; endif;

if ( OBSTYP = temp ) then
if (VARIAB = z)
and ( 100 >= PRESS >= 10 )
and ( 110000 <= TIME <= 130000 )
and STATID = " 20674"
then fail(); endif;

if VARIAB in ( u, v )
and ( 50000 <= TIME <= 70000 )
and STATID = " 40179"
then fail(); endif;

return; endif;

if ( OBSTYP = pilot ) then
if VARIAB in ( u, v )
and ( 50000 <= TIME <= 70000 )
and STATID = " 40179"
then fail(); endif;

return; endif;
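
The compound comparisons used above, such as ( 100 >= PRESS >= 10 ), read like chained comparisons in Python. As a sketch under that assumption, the rule for TEMP station 20674 could be mirrored as follows. The function name is hypothetical, the keyword values are those from the constant declarations in 5.1, and the station identifier is compared after stripping its padding, which is a simplification of the right justified 8 character string used by IFS.

```python
TEMP = 5  # OBSTYP keyword (see 5.1)
Z = 1     # VARIAB keyword (see 5.1)


def denied_temp_20674(obstyp, variab, press, time, statid):
    """Return True if the monthly rule for TEMP station 20674 applies.

    Old-format entry: 20674 5 000003 100 10 11 13, i.e. geopotential
    between 100 and 10 hPa for reports between 11 and 13 UTC.
    """
    return (
        obstyp == TEMP
        and variab == Z
        and 100 >= press >= 10          # pressure window in hPa
        and 110000 <= time <= 130000    # packed HHMMSS time window
        and statid.strip() == "20674"
    )
```

A 12 UTC geopotential observation at 50 hPa from this station is denied, while the same observation at 14 UTC or at 500 hPa is not.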



5.3 Adding completely new variable to the system


The current definition of variables can be checked from IFS source code in obs_preproc/blinit.F90. Adding new variables requires:

  1. Never remove or redefine existing variables. Doing so would make re-running earlier cases virtually impossible.
  2. Add the new variable in the SQL requests black_rob*.sql. If the new variable is not in hdr or body but in some data-specific tables (e.g. sat, or conv), you need to modify *only* those requests that are relevant for those data and have access to these tables.
  3. Add a variable to the IFS source code in obs_preproc/blinit.F90.
  4. Increase the number of defined variables in obs_preproc/blinit.F90.
  5. Add the external declaration to the external-file.
  6. Before starting to use the new variable, initialize it properly in obs_preproc/black.F90. If the new variable is not in hdr or body but in some data-specific tables (e.g. sat, or conv):
    • make sure the variable is always initialized, and
    • put some logic in place (e.g. IF (IOBTYP == NSYNOP)...) in order to populate, only when appropriate, the variable with values from the sql.
  7. The new variable can now be used in the denylist. If keywords are associated with it, declare them in the external-file as well.