As providers start to add WIGOS ids to their data, the need arose to test how NWP models cope with WIGOS ids.

To this end, a Python 3 program has been created to add WIGOS ids to the SYNOP messages currently received at ECMWF.


The outline of this page is:

1) Description

2) Program description

3) Test data file and caveats


The predefined data set covers the dates 2019-10-15 to 2019-10-17.

1) Description


The WIGOS id contains four parts, written as 0-2XXXX-0-YYYYY:

wigosIdentifierSeries   Issuer of Identifier   Issue Number   Local Identifier
0                       2XXXX                  0              YYYYY
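As an illustration of this structure, the sketch below splits a WIGOS id string into its four parts. The helper name split_wigos_id is hypothetical, the key names are the ones used by the program on this page, and the id is just an example of the 0-2XXXX-0-YYYYY form.

```python
# Key names as used by the program on this page; the helper name is hypothetical.
WIGOS_KEYS = (
    "wigosIdentifierSeries",
    "wigosIssuerOfIdentifier",
    "wigosIssueNumber",
    "wigosLocalIdentifierCharacter",
)


def split_wigos_id(wigos_id):
    """Split a WIGOS id string into its four dash-separated parts."""
    parts = wigos_id.split("-")
    if len(parts) != 4:
        raise ValueError("expected 4 parts, got {0!r}".format(wigos_id))
    return dict(zip(WIGOS_KEYS, parts))


parts = split_wigos_id("0-20000-0-10393")
print(parts["wigosIssuerOfIdentifier"])        # 20000
print(parts["wigosLocalIdentifierCharacter"])  # 10393
```

All four parts are kept as strings here; conversion to integers is only needed when the values are written into a BUFR message.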


The OSCAR REST API ("https://oscar.wmo.int/surface/rest/api/search/station?") was used to obtain a list of all the WIGOS ids currently available.

From this information only the surface observations 0-20000-0-YYYYY were used.
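A minimal sketch of this selection step is given below. It assumes station records shaped like the ones the program reads (a "wigosStationIdentifiers" list whose entries carry "primary" and "wigosStationIdentifier"); the sample records are made up and the function name is hypothetical. The pattern is the one used in the program, which accepts any issuer of the form 20XXX.

```python
import re

# Pattern from the program on this page: series 0, issuer 20XXX,
# issue number 0, followed by a five-digit local identifier.
SURFACE_ID = re.compile(r"0-20\d{3}-0-\d{5}")


def primary_surface_ids(stations):
    """Return the primary WIGOS ids that match SURFACE_ID."""
    selected = []
    for station in stations:
        # Stations without the key contribute nothing.
        for entry in station.get("wigosStationIdentifiers", []):
            wigos_id = entry["wigosStationIdentifier"]
            if entry.get("primary") and SURFACE_ID.match(wigos_id):
                selected.append(wigos_id)
    return selected


# Made-up records that only mimic the keys read by the program.
sample = [
    {"name": "A", "wigosStationIdentifiers": [
        {"primary": True, "wigosStationIdentifier": "0-20000-0-10393"}]},
    {"name": "B", "wigosStationIdentifiers": [
        {"primary": True, "wigosStationIdentifier": "0-250-0-ABC123"}]},
    {"name": "C"},
]
print(primary_surface_ids(sample))  # ['0-20000-0-10393']
```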


The last part of the WIGOS id (the local identifier) matches the current BUFR message identifier (the concatenation of blockNumber and stationNumber) and is used to map old station identifiers to their WIGOS ids.
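The mapping can be sketched as follows. This is a simplified stand-in that uses a plain dictionary instead of the pandas DataFrame used by the program; the function names make_ident and build_lookup are hypothetical and the ids are examples only.

```python
def make_ident(block_number, station_number):
    """Concatenate blockNumber (2 digits) and stationNumber (3 digits)."""
    return "{0:02d}{1:03d}".format(block_number, station_number)


def build_lookup(wigos_ids):
    """Map the last five characters of the local identifier to the WIGOS id."""
    lookup = {}
    for wigos_id in wigos_ids:
        local_identifier = wigos_id.split("-")[3]
        lookup[local_identifier[-5:]] = wigos_id
    return lookup


lookup = build_lookup(["0-20000-0-10393", "0-20000-0-03772"])
ident = make_ident(3, 772)
print(ident, lookup.get(ident))         # 03772 0-20000-0-03772
print(lookup.get(make_ident(99, 999)))  # None
```

The zero padding matters: block 3 and station 772 must produce "03772", otherwise the lookup against the local identifier fails.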


2) Program description

Code Block
languagepy
'''
Created on 22 Oct 2019

# Copyright 2005-2018 ECMWF.
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction

This is a test program to encode Wigos Synop
requires

1) ecCodes version 2.14.1 or above (available at https://confluence.ecmwf.int/display/ECC/Releases)
2) python3.6.8-01

To run the program

   ./addWigosProg.py  -i synop_multi_subset.bufr -o out_synop_multisubset.bufr  -m web -l addWigos.log

Uses BUFR version 4 template  and adds the WIGOS Identifier 301150
REQUIRES TablesVersionNumber 28 or above

Author : Roberto Ribas Garcia ECMWF 28/10/2019

Modifications
    performance improvement ( uses skipExtraKeyAttributes)  and codes_clone   04/11/2019
    changes for SYNOP and TEMP messages                                       05/11/2019
    fixed codes_clone issue                                                   05/11/2019
'''
from eccodes import *
import argparse
import json
import re
import pandas as pd
import numpy as np
import logging
import requests
import os

def read_cmd_line():
    p=argparse.ArgumentParser()
    p.add_argument("-i","--input",help="input bufr file")
    p.add_argument("-o","--output",help="output bufr file with wigos")
    p.add_argument("-m","--mode",choices=["web","json"],help=" wigos source [ json file or web ]")
    p.add_argument("-l","--logfile",help="log file ")
    args=p.parse_args()
    return args

def read_oscar_json(jsonFile):
    with open(jsonFile,"r") as f:
        jtext=json.load(f)
    return jtext

def read_oscar_web(oscarURL="https://oscar.wmo.int/surface/rest/api/search/station?"):
    r=requests.get(oscarURL)
    jtext=json.loads(r.text)
    return jtext

def parse_json_into_dataframe(jtext):
    '''
    parses the JSON from the file wigosJsonFile
    filters the stations by wigosStationIdentifiers key in the dictionaries
    '''
    wigosStations=[]
    nowigosStations=[]
    for d in jtext:
        if "wigosStationIdentifiers" in d.keys():
            wigosStations.append(d)
        else:
            nowigosStations.append(d)

    # uses only the wigos 0-20XXX-0-YYYYY (surface)
    # raw string so that \d is not treated as a string escape
    p=re.compile(r"0-20\d{3}-0-\d{5}")

    fwigosStations=[]
    for d in wigosStations:
        wigosInfo=d["wigosStationIdentifiers"]
        for e in wigosInfo:
            if e["primary"]==True:
                wigosId=e["wigosStationIdentifier"]
                if p.match(wigosId):
                    wigosParts=wigosId.split("-")
                    d["wigosIdentifierSeries"]=wigosParts[0]
                    d["wigosIssuerOfIdentifier"]=wigosParts[1]
                    d["wigosIssueNumber"]=wigosParts[2]
                    d["wigosLocalIdentifierCharacter"]=wigosParts[3]
                    # oldID is the key used to match blockNumber+stationNumber
                    d["oldID"]=wigosParts[3][-5:]
                    fwigosStations.append(d)

    df=pd.DataFrame(fwigosStations)
    df=df[["longitude","latitude","name","wigosStationIdentifiers","wigosIdentifierSeries","wigosIssuerOfIdentifier","wigosIssueNumber",
           "wigosLocalIdentifierCharacter","oldID"]]
    return df

def get_ident(bid):
    '''
    gets the ident of the message by combining blockNumber and stationNumber keys from the input BUFR file
    the ident may be single valued or multivalued ( only single valued are considered further)
    '''
    ident=None
    if ( codes_is_defined(bid, "blockNumber") and codes_is_defined(bid,"stationNumber") ):
        blockNumber=codes_get_array(bid,"blockNumber")
        stationNumber=codes_get_array(bid,"stationNumber")
        if len(blockNumber)==1 and len(stationNumber)==1:
            ident="{0:02d}{1:03d}".format(int(blockNumber[0]),int(stationNumber[0]))
        elif len(blockNumber)==1 and len(stationNumber)!=1:
            blockNumber=np.repeat(blockNumber,len(stationNumber))
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber)
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG]
        elif len(blockNumber)!=1 and len(stationNumber)!=1:
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber)
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG]

        # here only the first element of the list is returned to the main program
        # this avoids lists being used in the dataframe query and breaking the logic
        # an empty list ( all values missing ) gives None so the message is rejected
        if isinstance(ident,list):
            ident=ident[0] if ident else None
    return ident

def add_wigos_info(ident,bid,odf,obid):
    '''
    add the wigos information to the message ident pointed by bid
    the odf contains the WIGOS information for ident
    obid is the output handle
    '''
    if codes_is_defined(bid, "shortDelayedDescriptorReplicationFactor"):
        shortDelayed=codes_get_array(bid,"shortDelayedDescriptorReplicationFactor")
    else:
        shortDelayed=None

    if codes_is_defined(bid, "delayedDescriptorReplicationFactor"):
        delayedDesc=codes_get_array(bid,"delayedDescriptorReplicationFactor")
    else:
        delayedDesc=None

    if codes_is_defined(bid, "extendedDelayedDescriptorReplicationFactor"):
        extDelayedDesc=codes_get_array(bid,"extendedDelayedDescriptorReplicationFactor")
    else:
        extDelayedDesc=None

    nsubsets=codes_get(bid,"numberOfSubsets")
    compressed=codes_get(bid,"compressedData")

    # the WIGOS sequence needs tables version 28 or above
    masterTablesVersionNumber=codes_get(bid,"masterTablesVersionNumber")
    if masterTablesVersionNumber<28:
        masterTablesVersionNumber=28

    # prepend the WIGOS identifier sequence 301150 to the original descriptors
    unexpandedDescriptors=codes_get_array(bid,"unexpandedDescriptors")
    outUD=list(unexpandedDescriptors)
    outUD.insert(0,301150)

    # only treat the uncompressed messages with 1 subset
    # for future add treatment of compressed messages with more than 1 subset
    if compressed==0 and nsubsets==1:
        # IMPORTANT, takes into account delayed replications ( all possible cases) to accommodate
        # SYNOP + TEMP  messages
        if shortDelayed is not None:
            codes_set_array(obid,"inputShortDelayedDescriptorReplicationFactor",shortDelayed)
        if delayedDesc is not None:
            codes_set_array(obid,"inputDelayedDescriptorReplicationFactor",delayedDesc)
        if extDelayedDesc is not None:
            codes_set_array(obid,"inputExtendedDelayedDescriptorReplicationFactor",extDelayedDesc)

        codes_set(obid,"masterTablesVersionNumber",masterTablesVersionNumber)
        codes_set(obid,"numberOfSubsets",nsubsets)
        codes_set_array(obid, "unexpandedDescriptors",outUD)

        # if several rows match the ident, the first one is used
        wis=odf["wigosIdentifierSeries"].values[0]
        codes_set(obid,"wigosIdentifierSeries",int(wis))

        wid=odf["wigosIssuerOfIdentifier"].values[0]
        codes_set(obid,"wigosIssuerOfIdentifier",int(wid))

        win=odf["wigosIssueNumber"].values[0]
        codes_set(obid,"wigosIssueNumber",int(win))

        # the local identifier is a character string, not a number
        wlid=odf["wigosLocalIdentifierCharacter"].values
        wlid="{0:5}".format(wlid[0])
        logging.info(" wlid here {0}".format(wlid))
        codes_set(obid,"wigosLocalIdentifierCharacter",str(wlid))

        codes_bufr_copy_data(bid,obid)
    else:
        logging.info(" skipping compressed  message id {0} with {1} subsets ".format(ident,nsubsets))
    return

def main():
    print("ecCodes version {0}".format(codes_get_api_version()))
    args=read_cmd_line()
    logfile=args.logfile
    logging.basicConfig(filename=logfile,level=logging.INFO,filemode="w")

    infile=args.input
    outfile=args.output
    mode=args.mode

    # the OSCAR information is cached in oscar.json in the current directory
    oscarFile=os.path.join(os.getcwd(),"oscar.json")
    if mode=="web":
        jtext=read_oscar_web()
        with open(oscarFile,"w") as f:
            json.dump(jtext,f)
    else:
        jtext=read_oscar_json(oscarFile)

    wigosDf=parse_json_into_dataframe(jtext)

    f=open(infile,"rb")
    nmsg=codes_count_in_file(f)
    fout=open(outfile,"wb")
    for i in range(0,nmsg):
        bid=codes_bufr_new_from_file(f)
        obid=codes_clone(bid)
        codes_set(bid, 'skipExtraKeyAttributes', 1)
        codes_set(bid,"unpack",1)
        ident=get_ident(bid)

        if ident:
            logging.info (" \t message {0} ident {1} ".format(i+1,ident))

            odf=wigosDf.query("oldID=='{0}'".format(ident))
            if not odf.empty:
                add_wigos_info(ident,bid, odf,obid)
                codes_write(obid,fout)
            else:
                logging.info(" no wigos information found for ident {0}".format(ident))
        else:
            logging.info ("message {0} rejected ".format(i+1))
        codes_release(obid)
        codes_release(bid)
    f.close()
    fout.close()

    print (" finished")


if __name__ == '__main__':
    main()

The program can be called with the following arguments:

-i  input BUFR file containing SYNOP messages without WIGOS ids

-o  output BUFR file that will contain the SYNOP messages with WIGOS ids

-m  mode: 'web' makes the program connect to the OSCAR server, 'json' makes it read a JSON file containing the same information as the OSCAR server. The 'json' mode was added to speed up development by avoiding reloading the OSCAR data from the web on every run.

-l  log file in which to write the progress of the conversion


The program flow is the following:

1) Read the command line arguments.

2) Read the OSCAR information from the web or from a JSON file. The two functions read_oscar_web and read_oscar_json return a JSON list of dictionaries, which is filtered to retain only the entries with issuer of identifier 20000 (surface observations). The result is stored in a pandas DataFrame that is then queried to do the mapping.

3) Open the input BUFR file and read each individual message.

4) For each message, create the message identifier (concatenation of blockNumber and stationNumber). If the message is uncompressed (compressedData=0), has a single subset (numberOfSubsets=1) and its ident matches one in wigosDf, add the WIGOS information to it.

5) If the get_ident function finds several idents in a message, it returns only the first one.
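The decision logic of steps 4 and 5 can be sketched without ecCodes as shown below. Messages are represented by plain dictionaries rather than BUFR handles, MISSING is a stand-in for the ecCodes missing-value constant, and the function names first_ident and classify are hypothetical.

```python
MISSING = 2147483647  # stand-in for the ecCodes CODES_MISSING_LONG constant


def first_ident(block_numbers, station_numbers):
    """Return the first valid ident of a message, or None if there is none."""
    if len(block_numbers) == 1:
        # A single block number applies to every station number.
        block_numbers = block_numbers * len(station_numbers)
    for block, station in zip(block_numbers, station_numbers):
        if block != MISSING and station != MISSING:
            return "{0:02d}{1:03d}".format(block, station)
    return None


def classify(message, known_idents):
    """Decide what happens to one message (a dict standing in for BUFR)."""
    ident = first_ident(message["blockNumber"], message["stationNumber"])
    if ident is None:
        return "rejected"
    if ident not in known_idents:
        return "no WIGOS id"
    if message["compressedData"] == 0 and message["numberOfSubsets"] == 1:
        return "WIGOS id added"
    return "skipped"


known = {"10393", "03772"}
messages = [
    {"blockNumber": [10], "stationNumber": [393],
     "compressedData": 0, "numberOfSubsets": 1},
    {"blockNumber": [3], "stationNumber": [MISSING, 772],
     "compressedData": 1, "numberOfSubsets": 2},
    {"blockNumber": [MISSING], "stationNumber": [MISSING],
     "compressedData": 0, "numberOfSubsets": 1},
]
for message in messages:
    print(classify(message, known))
# WIGOS id added
# skipped
# rejected
```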


During program execution a log  file is generated containing information about the processing.


Some caveats apply:

  • Only uncompressed messages (compressedData=0) with a single subset (numberOfSubsets=1) are considered.
  • The OSCAR information retrieved from the web server has to be cleaned before this program can use it. This is the goal of the function parse_json_into_dataframe, which uses a regular expression to select the relevant WIGOS ids.
  • When setting the WIGOS information it is important to preserve the data types; for example, "wigosLocalIdentifierCharacter" is a character string.
  • The masterTablesVersionNumber must be 28 or higher, otherwise no WIGOS ids can be added. The add_wigos_info function takes care of this by raising the table version number to 28 for each processed message that has a lower value.
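The last two caveats can be illustrated with a small sketch that does not depend on ecCodes. The function names prepare_header and typed_wigos_values are hypothetical, and the descriptor 307080 in the example is only a sample value for an existing descriptor list.

```python
WIGOS_SEQUENCE = 301150   # descriptor inserted by the program
MIN_TABLES_VERSION = 28   # lowest masterTablesVersionNumber the program writes


def prepare_header(version, descriptors):
    """Return the table version and descriptor list for the output message."""
    new_version = max(version, MIN_TABLES_VERSION)
    new_descriptors = [WIGOS_SEQUENCE] + list(descriptors)
    return new_version, new_descriptors


def typed_wigos_values(row):
    """Convert a row of strings to the types the program passes to ecCodes."""
    return {
        "wigosIdentifierSeries": int(row["wigosIdentifierSeries"]),
        "wigosIssuerOfIdentifier": int(row["wigosIssuerOfIdentifier"]),
        "wigosIssueNumber": int(row["wigosIssueNumber"]),
        # Kept as text, padded to at least five characters as in the program.
        "wigosLocalIdentifierCharacter":
            "{0:5}".format(row["wigosLocalIdentifierCharacter"]),
    }


print(prepare_header(26, [307080]))  # (28, [301150, 307080])
print(prepare_header(31, [307080]))  # (31, [301150, 307080])

row = {"wigosIdentifierSeries": "0", "wigosIssuerOfIdentifier": "20000",
       "wigosIssueNumber": "0", "wigosLocalIdentifierCharacter": "03772"}
print(typed_wigos_values(row)["wigosLocalIdentifierCharacter"])  # 03772
```

Converting the local identifier to an integer would drop the leading zero, which is why it stays a string.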


Results


The output file contains 19543 SYNOP messages obtained by running the program on an input BUFR file containing raw SYNOP data received through the GTS.




Attached file: out_synop_wigos.bufr

The following file contains 7 TEMP messages obtained by running the program on a BUFR file containing raw TEMP messages.

Attached file: out_temp_wigos.bufr