...

The outline of this page is:

1) Problem description

2) Program flow

3) Test data file and caveats


The predefined data set covers the dates 2019-10-15 to 2019-10-17.

1) Description


The WIGOS id consists of four parts, for example 0-2XXXX-0-YYYYY,

...

old stations and their WIGOS ids.
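As a minimal sketch of the identifier structure described above, the snippet below splits a WIGOS id into its four parts and derives the five-digit old station id from the last part. The key names mirror those used by the program in section 2. The helper name `split_wigos_id` and the sample id are illustrative assumptions, and `fullmatch` is used here, which is slightly stricter than the `match` call in the program.

```python
import re

# Surface-station pattern used by the program: 0-20XXX-0-YYYYY
WIGOS_SURFACE = re.compile(r"0-20\d{3}-0-\d{5}")


def split_wigos_id(wigos_id):
    """Return the four WIGOS parts plus the old station id, or None if no match.

    Hypothetical helper, for illustration only.
    """
    if not WIGOS_SURFACE.fullmatch(wigos_id):
        return None
    series, issuer, issue_number, local_id = wigos_id.split("-")
    return {
        "wigosIdentifierSeries": series,
        "wigosIssuerOfIdentifier": issuer,
        "wigosIssueNumber": issue_number,
        "wigosLocalIdentifierCharacter": local_id,
        # the old block/station number is the last five characters
        "oldID": local_id[-5:],
    }


# Made-up sample id, not taken from the OSCAR database
print(split_wigos_id("0-20000-0-03772"))
```

Ids that do not follow the surface pattern return None, which corresponds to the stations the program filters out.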


2) Program description

Code Block
languagepy
'''
Created on 22 Oct 2019


# Copyright 2005-2018 ECMWF.
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction
   
This is a test program to encode WIGOS SYNOP messages.
It requires
   
1) ecCodes version 2.14.1 or above (available at https://confluence.ecmwf.int/display/ECC/Releases)
2) python3.6.8-01
   
To run the program
   
./addWigosProg.py -i <input bufr> -m <mode [web|json]> -l <logFile> -o <output BUFR file>
   
      
Uses BUFR version 4 template and adds the WIGOS Identifier 301150
REQUIRES TablesVersionNumber 28 or above
   
Author : Roberto Ribas Garcia ECMWF 28/10/2019

Modifications
    performance improvement ( uses skipExtraKeyAttributes) and codes_clone   04/11/2019
    changes for SYNOP and TEMP messages                                       05/11/2019
    fixed codes_clone issue                                                   05/11/2019

'''
from eccodes import *
import argparse 
import json 
import re 
import pandas as pd 
import numpy as np 
import logging 
import requests 
import os 

def read_cmd_line():
    p=argparse.ArgumentParser()
    p.add_argument("-i","--input",help="input bufr file")
    p.add_argument("-o","--output",help="output bufr file with wigos")
    p.add_argument("-m","--mode",choices=["web","json"],help=" wigos source [ json file or web ]")
    p.add_argument("-l","--logfile",help="log file ")
    args=p.parse_args()
    return args 
    
def read_oscar_json(jsonFile):
    with open(jsonFile,"r") as f:
        jtext=json.load(f)
    return jtext 

def read_oscar_web(oscarURL="https://oscar.wmo.int/surface/rest/api/search/station?"):
    r=requests.get(oscarURL)
    jtext=json.loads(r.text)
    return jtext 

def parse_json_into_dataframe(jtext):
    '''
    parses the JSON from the file wigosJsonFile
    filters the stations by wigosStationIdentifiers key in the dictionaries
    '''
    
    wigosStations=[]
    nowigosStations=[]
    for d in jtext:
        if "wigosStationIdentifiers" in d.keys():
            wigosStations.append(d)
        else:
            nowigosStations.append(d)
    
    '''
    uses only the wigos 0-20XXX-0-YYYYY (surface)
    '''
    # raw string so that \d is not treated as an invalid string escape
    p=re.compile(r"0-20\d{3}-0-\d{5}")

    fwigosStations=[]
    for d in wigosStations:
        wigosInfo=d["wigosStationIdentifiers"]
        for e in wigosInfo:
            if e["primary"]==True:
                wigosId=e["wigosStationIdentifier"]
                if p.match(wigosId):
                    wigosParts=wigosId.split("-")
                    d["wigosIdentifierSeries"]=wigosParts[0]
                    d["wigosIssuerOfIdentifier"]=wigosParts[1]
                    d["wigosIssueNumber"]=wigosParts[2]
                    d["wigosLocalIdentifierCharacter"]=wigosParts[3]
                    d["oldID"]=wigosParts[3][-5:]
                    fwigosStations.append(d)
                    
    df=pd.DataFrame(fwigosStations)
    df=df[["longitude","latitude","name","wigosStationIdentifiers","wigosIdentifierSeries","wigosIssuerOfIdentifier","wigosIssueNumber",
           "wigosLocalIdentifierCharacter","oldID"]]  
    return df

def get_ident(bid):
    '''
    gets the ident of the message by combining blockNumber and stationNumber keys from the input BUFR file
    the ident may be single valued or multivalued ( only single valued are considered further)
    
    '''
    ident=None 
    if ( codes_is_defined(bid, "blockNumber") and codes_is_defined(bid,"stationNumber") ):
        blockNumber=codes_get_array(bid,"blockNumber")
        stationNumber=codes_get_array(bid,"stationNumber")
        if len(blockNumber)==1 and len(stationNumber)==1:
            ident="{0:02d}{1:03d}".format(int(blockNumber),int(stationNumber))
        elif len(blockNumber)==1 and len(stationNumber)!=1:
            blockNumber=np.repeat(blockNumber,len(stationNumber))
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber) 
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG] 
        elif len(blockNumber)!=1 and len(stationNumber)!=1:
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber) 
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG]
    # Only the first element of the list is returned to the main program.
    # This avoids lists being used in the dataframe query and breaking the logic.
    if isinstance(ident,list):
        # an empty list (all values missing) yields None so the message is rejected
        ident=ident[0] if ident else None
    return ident 


def add_wigos_info(ident,bid,odf,obid):
    '''
    add the wigos information to the message ident pointed by bid
    the odf contains the WIGOS information for ident 
    obid is the output handle
    '''
    
    if codes_is_defined(bid, "shortDelayedDescriptorReplicationFactor"):
        shortDelayed=codes_get_array(bid,"shortDelayedDescriptorReplicationFactor")
    else:
        shortDelayed=None 

    if codes_is_defined(bid, "delayedDescriptorReplicationFactor"):
        delayedDesc=codes_get_array(bid,"delayedDescriptorReplicationFactor")
    else:
        delayedDesc=None 
    
    if codes_is_defined(bid, "extendedDelayedDescriptorReplicationFactor"):
        extDelayedDesc=codes_get_array(bid,"extendedDelayedDescriptorReplicationFactor")
    else:
        extDelayedDesc=None 

    nsubsets=codes_get(bid,"numberOfSubsets")
    compressed=codes_get(bid,"compressedData")
    
    masterTablesVersionNumber=codes_get(bid,"masterTablesVersionNumber")
    if masterTablesVersionNumber<28:
        masterTablesVersionNumber=28
        
    unexpandedDescriptors=codes_get_array(bid,"unexpandedDescriptors")
    outUD=list(unexpandedDescriptors)
    outUD.insert(0,301150)
     
    # only treat the uncompressed messages with 1 subset 
    # for future add treatment of compressed messages with more than 1 subset
    if compressed==0 and nsubsets==1:
        # IMPORTANT, takes into account delayed replications ( all possible cases)
        # to accommodate SYNOP + TEMP messages 
        if shortDelayed is not None:
            codes_set_array(obid,"inputShortDelayedDescriptorReplicationFactor",shortDelayed)
        if delayedDesc is not None:
            codes_set_array(obid,"inputDelayedDescriptorReplicationFactor",delayedDesc)
        if extDelayedDesc is not None:
            codes_set_array(obid,"inputExtendedDelayedDescriptorReplicationFactor",extDelayedDesc)

        codes_set(obid,"masterTablesVersionNumber",masterTablesVersionNumber)
        codes_set(obid,"numberOfSubsets",nsubsets)
        
        codes_set_array(obid, "unexpandedDescriptors",outUD)
        wis=odf["wigosIdentifierSeries"].values 
        if len(wis)!=1:
            wis=wis[0]
        codes_set(obid,"wigosIdentifierSeries",int(wis))
        wid=odf["wigosIssuerOfIdentifier"].values 
        if len(wid)!=1:
            wid=wid[0]
        codes_set(obid,"wigosIssuerOfIdentifier",int(wid))
        win=odf["wigosIssueNumber"].values 
        if len(win)!=1:
            win=win[0]
        codes_set(obid,"wigosIssueNumber",int(win))
        wlid=odf["wigosLocalIdentifierCharacter"].values 
        wlid="{0:5}".format(wlid[0])
        logging.info(" wlid here {0}".format(wlid))
        codes_set(obid,"wigosLocalIdentifierCharacter",str(wlid))
        codes_bufr_copy_data(bid,obid)
    else:
        logging.info(" skipping compressed  message id {0} with {1} subsets ".format(ident,nsubsets))
    
    return 


def main():
    print("ecCodes version {0}".format(codes_get_api_version()))
    args=read_cmd_line()
    logfile=args.logfile 
    logging.basicConfig(filename=logfile,level=logging.INFO,filemode="w")
    
    infile=args.input 
    
    outfile=args.output 
   
    mode=args.mode 
    if mode=="web":
        # download the station list and keep a local copy as oscar.json
        jtext=read_oscar_web()
        cdirectory=os.getcwd()
        oscarFile=os.path.join(cdirectory,"oscar.json")
        with open(oscarFile,"w") as f:
            json.dump(jtext,f)
    else:
        # reuse the local oscar.json from a previous web run
        cdirectory=os.getcwd()
        oscarFile=os.path.join(cdirectory,"oscar.json")
        with open(oscarFile,"r") as f:
            jtext=json.load(f)
        
    wigosDf=parse_json_into_dataframe(jtext)
    
    f=open(infile,"rb")
    nmsg=codes_count_in_file(f)
    fout=open(outfile,"wb")
    for i in range(0,nmsg):
        bid=codes_bufr_new_from_file(f)
        obid=codes_clone(bid)
        codes_set(bid, 'skipExtraKeyAttributes', 1)
        codes_set(bid,"unpack",1)
        ident=get_ident(bid)
        if ident:
            logging.info (" \t message {0} ident {1} ".format(i+1,ident))
            odf=wigosDf.query("oldID=='{0}'".format(ident))
            if not odf.empty:
                add_wigos_info(ident,bid, odf,obid)
                codes_write(obid,fout)
            else:
                logging.info(" wigos {0} is empty for ident {1}".format(ident,odf["wigosLocalIdentifierCharacter"].values))
        else:
            logging.info ("message {0} rejected ".format(i+1))
        codes_release(obid)        
        codes_release(bid)
    f.close()    
    # close the output file as well so that all messages are flushed to disk
    fout.close()
   
    print (" finished")


if __name__ == '__main__':
    main()


The program can be called with the following arguments

...

that are uncompressed (compressed=0) and single-subset (numberOfSubsets=1), if their ident matches one of those in wigosDf.

5) If the get_ident function finds several idents in a message, it returns only the first one.
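The first-ident rule in point 5 can be illustrated without a BUFR file. The sketch below is a pure-Python approximation of the logic in get_ident, not the function itself: the helper name `first_ident` is hypothetical, `MISSING` is a stand-in for the ecCodes constant CODES_MISSING_LONG with an assumed value, and, unlike the program, missing values are also filtered in the single-valued case.

```python
MISSING = 2147483647  # stand-in for ecCodes' CODES_MISSING_LONG (assumed value)


def first_ident(block_numbers, station_numbers):
    """Combine block and station numbers into idents and return the first one.

    Hypothetical helper that mirrors the idea behind get_ident.
    """
    blocks = list(block_numbers)
    stations = list(station_numbers)
    if len(blocks) == 1 and len(stations) > 1:
        # a single block number is shared by all station numbers
        blocks = blocks * len(stations)
    idents = [
        "{0:02d}{1:03d}".format(b, s)
        for b, s in zip(blocks, stations)
        if b != MISSING and s != MISSING
    ]
    # only the first ident is kept, so the dataframe query receives a string
    return idents[0] if idents else None


print(first_ident([3], [772]))
print(first_ident([10], [MISSING, 384, 393]))
```

Returning a single string rather than a list keeps the later lookup by oldID in wigosDf simple, at the cost of ignoring the remaining idents of the message.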


During program execution a log file is generated containing information about the processing.

...

  • Only uncompressed messages (compressed=0) with a single subset (numberOfSubsets=1) are considered.
  • The OSCAR information retrieved from the web server has to be cleaned for this program to work. This is the purpose of the function parse_json_into_dataframe, which uses a regular expression to filter out the WIGOS data.
  • When setting the WIGOS information it is important to preserve the data types; for example, "wigosLocalIdentifierCharacter" is a character string.
  • The masterTablesVersionNumber must be at least 28, otherwise no WIGOS ids can be added. The add_wigos_info function takes care of this by raising the table version number key to 28 for each processed message that carries a lower value.
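The caveats above can be condensed into a small sketch that does not need ecCodes. It shows the message selection rule, the descriptor and table version adjustments, and the type conversions applied before the keys are set. The function names and the sample input are assumptions made for illustration; only the constants 301150 and 28 and the key names come from the program.

```python
WIGOS_SEQUENCE = 301150   # WIGOS identifier sequence inserted by the program
MIN_TABLES_VERSION = 28   # lowest master table version used for the output


def should_process(compressed, nsubsets):
    """Only uncompressed messages with exactly one subset are treated."""
    return compressed == 0 and nsubsets == 1


def plan_output_header(unexpanded_descriptors, tables_version):
    """Return the output descriptor list and the table version to set."""
    out_descriptors = [WIGOS_SEQUENCE] + list(unexpanded_descriptors)
    out_version = max(tables_version, MIN_TABLES_VERSION)
    return out_descriptors, out_version


def typed_wigos_values(series, issuer, issue_number, local_id):
    """Convert the string parts of a WIGOS id to the types the keys expect."""
    return {
        "wigosIdentifierSeries": int(series),
        "wigosIssuerOfIdentifier": int(issuer),
        "wigosIssueNumber": int(issue_number),
        # character key: kept as a string, padded to at least 5 characters
        "wigosLocalIdentifierCharacter": "{0:5}".format(local_id),
    }


# Sample input chosen for illustration only
print(should_process(compressed=0, nsubsets=1))
print(plan_output_header([307080], 14))
print(typed_wigos_values("0", "20000", "0", "03772"))
```

In the program itself these values are passed to codes_set and codes_set_array on the output handle before codes_bufr_copy_data is called.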


Results


The output file contains 19543 SYNOP messages obtained by running the program on an input BUFR file containing raw SYNOP data received through GTS.




Attached file: out_synop_wigos.bufr

This file contains 7 TEMP messages obtained by running the program on a BUFR file containing raw TEMP messages.

Attached file: out_temp_wigos.bufr