Contributors: Peter Thorne (NUIM), Simon Noone (NUIM), Corinne Voces (NUIM), Fabio Madonna (CNR), Leo Haimberger (University of Wien), Gerard van der Schrier (KNMI), Marlies van der Schee (KNMI), Paul Poli (ECMWF), Hans Hersbach (ECMWF), Dave Berry (WMO)

Issued by: NUIM / Peter Thorne

Official reference number service contract: ECMWF/COPERNICUS/2021/C3S2_311_Lot1_NUIM



History of modifications

| Version | Author | Date | Notes |
|---|---|---|---|
| 1.0 | Peter Thorne | 15/12/2023 | |
| 1.1 | Peter Thorne | 17/01/2025 | Correction of variable names for CDM-OBS consistency, clarification on how to handle lons and lats for mobile and profile observations |

Introduction

The CDM-OBS-Core is derived from the multi-table CDM-OBS data model developed under C3S-funded contracts, namely C3S 311a Lot 2, and maintained continuously since 2021 by C3S2 311 Lot 1 on behalf of all in-situ lots. The CDM-OBS data model was built upon the table-driven ODB-1 format employed at ECMWF and also drew heavily on the WIGOS metadata standards from WMO, then under development, to populate many tables.

While there remains considerable value in contracts retaining and using the full CDM-OBS in their internal data management (for sharing data between contracts, retaining all necessary metadata, and formalizing data structures), the full CDM-OBS has proven difficult to scale, for data service provision in particular, and contains many elements that will be at best of marginal value to the vast majority of C3S users.

The CDM-OBS-Core is a deliberate subset of the CDM-OBS intended to be stored and served as a single table data model to users via the new CADS platform. The underlying ethos is to minimize to the extent possible the number of fields to be stored and to ensure homogeneity of data provision while recognizing that some irreducible heterogeneity exists in the in-situ data being served that needs to be catered for appropriately. 

As such CDM-OBS-Core is part data model and part data format in that certain rules also pertain to which fields should be present in which order. This should facilitate the provision of modular reusable software and routines to support a seamless user experience across the in-situ services provided by C3S to end-users. For any given data collection (generally a catalogue entry) the exact CDM-OBS-Core profile employed must be explicitly documented and declared.

The in-situ C3S contracts will collectively maintain both CDM-OBS and CDM-OBS-Core moving forwards as a pair of linked data models. Specifically, the CDM-OBS-Core will always consist of a selected subset of the CDM-OBS elements and will exclusively use the tables in the CDM-OBS to define variable codes etc. Changes in CDM-OBS code tables will be cascaded to CDM-OBS-Core as appropriate.

The balance of this document is structured as follows:

  • Section 1 briefly recaps the CDM-OBS
  • Section 2 outlines the CDM-OBS-Core
    • Section 2.1 outlines mandatory elements that must be provided with every observed value in any collection profile
    • Section 2.2 outlines mandatory elements that can be included either via header elements or with every observed value in any collection profile
    • Section 2.3 provides optional elements which may be provided in any collection profile
  • Section 3 outlines how the profile for a given collection is to be declared.

1. CDM-OBS

The CDM-OBS was developed in the first round of C3S contracts and is maintained by C3S2 311 Lot 1 presently.

The main repository for the common data model can be found at: https://github.com/ecmwf-projects/cdm-obs.

Additional information, although pending an update, is provided in the form of (a) PDF documentation and (b) a model representation.

The starting point for the CDM-OBS data model was the Observation DataBase (ODB) version 1 (ODBv1) data model developed by ECMWF (Saarinen, 2004: ODB User Guide). However, this has been significantly modified and expanded to include comprehensive provenance, discovery and instrumental metadata. Within the data model, the reports are split across two different record types following the ODBv1 approach. Header records provide information common to all observations within a given report. Linked records contain the observations, with a single measurand or observed variable per record. Additional linked records contain the metadata. Where possible, we have tried to keep the variable names and attributes used in existing standards. For example, at the time of initial development the code tables included in this data model were directly linked to those from the WIGOS Metadata Standard (WMDS) and BUFR, but it should be noted that some of these may have since changed (particularly the WMDS). We have also tried to account for the need to extract both the observations and metadata into different standards; for example, it should be possible to map the data contained in the data model to the ISO19135 and ISO19139 standards.

The overall CDM-OBS architecture is summarized in Figure 1 below. The CDM-OBS is a living data model that is updated via a management group which, at the present time, has representation from the C3S2 311 contracts, ECMWF and the RODEO project. Minor updates can be agreed via written procedure whereas major changes require explicit agreement. The coordination group meets at least three times a year. It is proposed that the linked CDM-OBS-Core be managed by the same group in the same manner.



Figure 1: The CDM-OBS architecture. The figure shows the conceptual data model used by the CDM-OBS, largely building on the ODB v1 data model developed by ECMWF. Only the primary data tables are shown; additional data and code tables have been omitted for brevity. Within this data model each weather report (or record) is split across two main tables. The first, the header table, provides information common to all observations contained in the report. Examples include, inter alia, the location and time of the report, the identity of the station and events at the time of the report. The second table contains the observations, with one observation per row, including metadata related to how the observation was made and sensor height. These two tables are then linked to further configuration tables providing more in-depth information on the sensors, stations and sources. For those observations from profiles, e.g. radiosonde or ocean profile measurements, further information can be provided via the profile configuration table. Additional quality control, homogenization and uncertainty estimates can be provided through the respective tables linked to the observations table. Finally, optional information can be provided through the optional tables.


2. CDM-OBS-Core

The CDM-OBS-Core consists of a subset of the elements contained principally in the station configuration table, header table and observation table of the CDM-OBS. It consists of those elements deemed essential for the vast majority of users to understand and use the data appropriately.

Recognizing that different data being served by different providers have distinct characteristics, the CDM-OBS-Core consists of a core set of elements that must always be present and an optional set of elements which may additionally be provided as they are essential to some, but not all, collections. As such, all data served under CDM-OBS-Core must include all 14 core elements on a per observed value basis (Section 2.1), must include a further 6 elements either on a per observation basis or as header information (Section 2.2), and may include any selection of the 17 optional elements (Section 2.3; the three uncertainty elements may be repeated several times). The profile of CDM-OBS-Core used (core + optional elements) must be clearly documented (Section 3).

2.1. Compulsory elements that must be present on a per observation basis

Compulsory elements are essential to describe the data being presented and are expected to be provided on a per observation basis by all C3S in-situ data providers. They must be the first 14 elements in any CDM-OBS-Core compliant holdings and must always be presented in the order given in Table 1 below, which follows a logical sequencing. For descriptions of all elements please refer to the corresponding tables linked in the full CDM-OBS documentation available at https://glamod.github.io/cdm-obs-documentation/introduction.html [note this location will change].
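The ordering rule lends itself to an automated check. The following is a minimal sketch, assuming the element names as spelled in Table 1 and assuming that a case-insensitive comparison is acceptable (the text does not say whether capitalisation is significant); the function name is hypothetical.

```python
# Compulsory per-observation elements, in the order listed in Table 1.
COMPULSORY_ELEMENTS = [
    "Station_name",
    "Primary_id",
    "report_id",
    "observation_id",
    "Longitude",
    "Latitude",
    "Height_of_station_above_sea_level",
    "Report_timestamp",
    "Report_meaning_of_time_stamp",
    "Report_duration",
    "Observed_variable",
    "units",
    "Observation_value",
    "Quality_flag",
]


def check_compulsory_columns(columns):
    """Return a list of problems; an empty list means the first 14 columns comply."""
    # Assumption: names are compared case-insensitively.
    expected = [name.lower() for name in COMPULSORY_ELEMENTS]
    actual = [name.lower() for name in columns[: len(expected)]]
    problems = []
    for position, want in enumerate(expected, start=1):
        got = actual[position - 1] if position <= len(actual) else None
        if got != want:
            problems.append(f"column {position}: expected {want!r}, found {got!r}")
    return problems
```

Columns beyond the fourteenth are ignored by this check, since they are governed by the profile declaration rather than by the fixed ordering.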

Note that although some elements may be stored as integers, e.g. for units, these fields should always be converted to human readable values (e.g. Kelvin, mm) when presented to the user. There may therefore be a need to deploy a converter between the CDM-OBS-Core data profile and users for a given set of holdings.

Storing such fields as integers can significantly reduce data volumes, with substantial improvements in database serving performance and non-negligible implications for both the costs and the GHG emissions associated with data storage and serving.
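A converter of the kind described could be as simple as a lookup applied at serving time. The sketch below is illustrative only: the integer codes, the labels and the sample row are placeholders, not values taken from the CDM-OBS code tables, and the function name is hypothetical.

```python
# Hypothetical excerpt of a units code table: the integer codes and labels
# below are placeholders, not values taken from the CDM-OBS units table.
UNITS_LABELS = {1: "K", 2: "mm", 3: "hPa"}


def decode_units(row, labels=UNITS_LABELS):
    """Return a copy of the row with the integer units code replaced by a label."""
    decoded = dict(row)
    code = row["units"]
    # Keep unknown codes visible rather than silently dropping them.
    decoded["units"] = labels.get(code, f"unknown units code {code}")
    return decoded


# Illustrative stored row; the variable code is a placeholder as well.
stored = {"Observed_variable": 85, "units": 1, "Observation_value": 285.4}
print(decode_units(stored))
```

The stored row is left untouched, so the compact integer form can remain the storage format while users only ever see the decoded copy.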

Table 1: Compulsory elements of the CDM-OBS-Core.

| Element grouping | Element | Type | CDM-OBS primary table | Description |
|---|---|---|---|---|
| Identifier information | Station_name | varchar | Station configuration | The station name (where station can mean a physical station, a ship, a buoy or any other observing platform) |
| | Primary_id | varchar | Station configuration | The primary station identifier for the station / platform from which the observation arises. Where the station has one or more WIGOS Station Identifiers (WSIs) this should be the primary WSI associated |
| | report_id | varchar | Header | Report identifier unique per report (collection of observations) |
| | observation_id | varchar | Observations | Unique observation identifier per observation |
| Location information | Longitude | numeric | Header | Location of instrument at time of observation. This should be identical to entry in station configuration table for fixed assets. For mobile assets such as ships or balloon ascents it should be instantaneous. |
| | Latitude | numeric | Header | |
| | Height_of_station_above_sea_level | numeric | Header | |
| Temporal information | Report_timestamp | Timestamp with timezone | Header | Date timestamp including timezone. The default for presentation of data via C3S should be that all data have been converted to UTC. It should be the time at which the associated observation was taken. |
| | Report_meaning_of_time_stamp | int | Header | Whether the timestamp refers to beginning, middle or end of reporting period |
| | Report_duration | int | Header | The duration of the report |
| Observation value information | Observed_variable | int | Observations | The variable being observed defined by a numeric identifier |
| | units | int | Observations | The units associated with the observed variable |
| | Observation_value | numeric | Observations | The observed value |
| Quality information | Quality_flag | int | Observations | The quality flag for the observation |

2.1.1. Location information

For any observation to be usable, its geolocation must, at a minimum, be known. Three elements in CDM-OBS provide the necessary minimum common information which all observations must contain.

Longitude and latitude are required to be given in the WGS84 coordinate system in CDM-OBS-Core, to avoid carrying coordinate system information that would add complexity to the data model. If necessary the original coordinates should be converted to this system. The original coordinate system and the conversion can be fully documented in the corresponding CDM-OBS profile.

The height of station above sea level should be given in metres, with negative values for stations below sea level. Where the height of the station above sea level is unknown a missing identifier can be used.

The table in CDM-OBS for all three location elements is the header table, although for fixed assets these should be identical to the values also held in the station configuration table. Where a fixed asset station has a WSI and a corresponding entry in OSCAR Surface the metadata should match those held in OSCAR Surface. Any mismatches should be investigated and rectified.
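A basic plausibility check on the three location elements might look as follows. This is a sketch under stated assumptions: the text does not specify a longitude convention or the form of the missing identifier, so the code assumes degrees east within [-180, 180] and represents an unknown height as None. The function name is hypothetical.

```python
def check_location(longitude, latitude, height=None):
    """Return a list of problems with a WGS84 position; empty means plausible.

    Assumptions (not stated in the text): longitude is expressed in degrees
    east within [-180, 180], and an unknown station height is passed as None.
    """
    problems = []
    if not -180.0 <= longitude <= 180.0:
        problems.append(f"longitude {longitude} outside [-180, 180]")
    if not -90.0 <= latitude <= 90.0:
        problems.append(f"latitude {latitude} outside [-90, 90]")
    # Negative heights are legitimate: they denote stations below sea level.
    if height is not None and not isinstance(height, (int, float)):
        problems.append(f"height {height!r} is not numeric")
    return problems
```

A check of this kind only catches gross errors; agreement with the station configuration table or OSCAR Surface would need a separate comparison against those records.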

2.1.2. Temporal information

It is also necessary to know when an observation was made for it to be usable. Three elements present in the header table in CDM-OBS can provide the necessary minimum level of information required. 

Report_timestamp provides the time of the observation and timezone information.

Report_meaning_of_timestamp, via lookup to the meaning of timestamp code table, denotes whether the time pertains to the beginning, middle or end of the observation period.

Report_duration, via lookup to the duration code table, defines the duration to which the observed value pertains, with values from instantaneous up to monthly, including a mixed frequency option.
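Taken together, the three temporal elements define the period that an observation covers. The sketch below shows one way this could be derived; the integer codes are hypothetical stand-ins for the meaning of timestamp and duration code tables, and only fixed-length durations are handled (a monthly or mixed frequency duration would need calendar-aware logic).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical code values: the real ones are defined in the CDM-OBS
# meaning of timestamp and duration code tables.
MEANING = {1: "beginning", 2: "end", 3: "middle"}
DURATION = {0: timedelta(0), 1: timedelta(hours=1), 2: timedelta(days=1)}


def observation_window(report_timestamp, meaning_code, duration_code):
    """Return (start, end) of the period an observation covers, in UTC."""
    if report_timestamp.tzinfo is None:
        raise ValueError("report_timestamp must carry timezone information")
    stamp = report_timestamp.astimezone(timezone.utc)
    length = DURATION[duration_code]
    meaning = MEANING[meaning_code]
    if meaning == "beginning":
        start = stamp
    elif meaning == "end":
        start = stamp - length
    else:  # middle of the period
        start = stamp - length / 2
    return start, start + length
```

For example, an hourly value stamped 13:00 at UTC+1 whose timestamp marks the end of the period covers 11:00 to 12:00 UTC.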

2.1.3. Observation value information

Three elements in the observations table in CDM-OBS provide the minimum information necessary to describe the observation. 

Firstly, the observed_variable element identifies what variable is being observed, via lookup to the observed variable code table.

Next, the units element provides the geophysical units, via lookup to the units code table.

Finally, the observation_value is provided as a numeric value. It is the observed value, if necessary converted from the original units to the stated units. All conversions and original units can be retained in the full CDM-OBS profile.

2.1.4. Quality information

The quality_flag element (present in the observations table) contains the bare minimum quality information that should be available to users. Via the quality flag code table simple information on quality checking is provided (passed, failed, not checked etc.).

2.2. Compulsory elements that must be present either as header elements or on a per observation basis

In addition to those elements that must always be present on a per observation basis there are a number of elements which must either be present on a per observation basis or must be associated with the data served via appropriate header information and / or an associated file containing necessary metadata. Which approach is appropriate depends upon the nature of the data collection as follows:

  • Wherever the value for the element varies across the set of observations then at least the source_id element must be declared and served as an additional data column.
  • Wherever the value for the element is invariant across the collection and / or invariant per unique source_id value it may be served either via a header element / metadata file or on a per observation basis. For efficiency and environmental reasons concerning storage and data serving, however, it is strongly recommended that it be served via a header / metadata file in such situations.
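The two cases above can be sketched as a small planning routine. This is an illustration rather than a prescribed procedure: it assumes lower-case element names, rows held as dictionaries, and a hypothetical return layout with "columns", "header" and "metadata_file" keys.

```python
# The six compulsory source elements, written in lower case for this sketch.
SOURCE_ELEMENTS = [
    "source_id",
    "product_name",
    "product_citation",
    "product_references",
    "data_policy_licence",
    "contact",
]


def plan_source_serving(rows):
    """Decide how the source elements of a collection could be served."""
    by_source = {}
    for row in rows:
        metadata = {name: row[name] for name in SOURCE_ELEMENTS}
        previous = by_source.setdefault(row["source_id"], metadata)
        if previous != metadata:
            # Values must be invariant per source_id to be served by lookup.
            raise ValueError(f"elements vary within source_id {row['source_id']!r}")
    if len(by_source) == 1:
        # Everything is invariant: a header (or metadata file) is sufficient.
        return {"columns": [], "header": next(iter(by_source.values()))}
    # Mixed sources: source_id becomes a data column, the rest a lookup.
    return {"columns": ["source_id"], "metadata_file": by_source}
```

With a single source the plan contains no extra data columns; with several sources only source_id is added as a column and the remaining elements are served once per source.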

Table 2 below outlines those elements which are to be served in this manner. Presently these elements pertain exclusively to information about the source of the observations which is essential metadata that must always be available to the user.

Table 2: Summary of the compulsory profile elements in the CDM-OBS-Core that may be present either on a per observation basis or in header information.

| Element grouping | Element | Type | CDM-OBS primary table | Description |
|---|---|---|---|---|
| Source information | Source_id | varchar (pk) | Source configuration | Data source identifier, for provenance. If mixed source collection must be a data column. |
| | product_name | varchar | Source configuration | Name of source, e.g. International Comprehensive Ocean Atmosphere Data Set, RS92 GRUAN Data Product. |
| | Product_citation | varchar[] | Source configuration | Citation information for holdings. |
| | product_references | varchar[] | Source configuration | References describing the dataset |
| | Data_policy_licence | int | Source configuration | Data policy per observation. |
| | contact | varchar[] | Source configuration | Contact for the data source |

2.2.1.  Source information

For some in-situ lots the offerings constitute a harmonized collection of several underlying sources, whereas for others they arise from a single collection. Regardless, it is necessary to document all of the following, which arise from the source configuration table in the CDM-OBS: 

The source_id provides a traceable identifier to be able to identify the underlying source contributing the data. Where the data holdings being served arise from multiple sources it is mandatory that this element be served as a data column.

The product_name provides the name of the source.

The product_citation provides the citation information for the holdings as it pertains to the observation.

The product_references provides references for the user to understand the source.

The data_policy_licence identifies the type of data policy, via lookup to the data policy licence code table.

The contact provides contact details for the source.

2.3. Optional elements that can be present on a per observation basis

Optional elements are those elements which, while being considered essential to the provision of one or more in-situ services, are not universally available and / or required in all data collections being served via CADS. These elements therefore may be present in any CDM-OBS-Core profile but are not necessary for the collection to be deemed compliant with the CDM-OBS-Core. Table 3 below provides a summary of these optional elements.

Table 3: Summary of the optional profile elements in the CDM-OBS-Core. Data collections may specify one or more of these elements as present in their CDM-OBS-Core profile. Note that the uncertainty collection of elements is able to be repeated as discussed in the main text.

| Element grouping | Element | Type | CDM-OBS primary table | Description |
|---|---|---|---|---|
| Homogenisation | Homogenization_adjustment | numeric | homogenisation | The adjustment value |
| | Homogenization_method | int | homogenisation | The method for those holdings using mixed methods |
| Uncertainty | Uncertainty_type | int | uncertainty | For collections with multiple uncertainty elements this triplet can be replicated a number of times, once per uncertainty term |
| | Uncertainty_value | numeric | uncertainty | |
| | Uncertainty_units | int | uncertainty | |
| Type of observation | Platform_type | int | Station configuration | Type of observing platform, report or instrument for applications with mixed holdings where user subsetting in CADS may be advantageous |
| | Report_type | int | header | |
| | Instrument_type | int | sensor configuration | |
| | Station_automation | int | Station configuration | Whether a station is manual, automated or mixed |
| | Value_significance | int | Value_significance | An indicator of what the value signifies (mean, median, max, min etc.) |
| Vertical profiles | Profile_id | varchar | header | The identifier for the full profile |
| | Z_coordinate | numeric | observations | Height of observation above sea level (m) or pressure |
| | Z_coordinate_type | int | observations | Height or pressure |
| Representativeness | Spatial_representativeness | int | observations | The spatial representativeness of the measurement for holdings where this differs |
| | Exposure_of_sensor | int | observations | Exposure in some holdings known |
| Reanalysis feedback | Fg_depar@body | numeric | Era5fb_table | First guess departure |
| | An_depar@body | numeric | Era5fb_table | Analysis departure |

2.3.1. Homogenisation information

For some collections which serve homogenized data the information on the homogenization needs to be provided. These can be served via elements from the homogenization table in the CDM-OBS: 

The homogenization_adjustment provides the numerical adjustment value.

The homogenization_method provides information on the method deployed, via the homogenization_method code table. This should only be selected if the collection consists of observations adjusted by two or more methods where differentiation is required on an observation-by-observation basis. Otherwise this information should be given at the catalogue level.

2.3.2. Uncertainty information

For some collections uncertainty information is available on a per observation basis. Note in particular that complex uncertainty information is available for the reference and baseline data provided via C3S2 311 Lot 2. The uncertainty elements can therefore be repeated up to 5 times in a CDM-OBS-Core compliant profile, as necessary to properly represent all uncertainty elements. The 5 possible terms are the random, systematic, quasi-systematic, structured random and total uncertainty terms as defined in the uncertainty method code table. All terms are principally defined in the uncertainty table in the full CDM-OBS.

The uncertainty_type defines the type of uncertainty being quantified, via the uncertainty_type code table.

The uncertainty_value provides the value of the uncertainty associated with the uncertainty term defined by the uncertainty type.

The uncertainty_units provides the units for the uncertainty value, via the units code table. This parameter should be included if the uncertainty_value has different units to the observed value in the core elements. For example, the uncertainty may be given as a percentage and the observed value in a geophysical unit. The default would be to exclude this field to minimize data volumes, i.e. it is assumed that uncertainty is given in the observed units unless otherwise explicitly documented in the CDM-OBS-Core profile.
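The default rule for uncertainty units can be expressed compactly when reading a row. The sketch below assumes that repeated triplets are distinguished by a numeric suffix (uncertainty_type_1, uncertainty_value_1 and so on); this naming is hypothetical, as the text does not prescribe how repeated elements are named.

```python
def uncertainty_terms(row, max_terms=5):
    """Return the uncertainty terms of one observation as a list of dicts.

    Assumes repeated triplets are distinguished by a numeric suffix
    (uncertainty_type_1, uncertainty_value_1, ...); this naming is
    hypothetical, the text does not prescribe one.
    """
    terms = []
    for n in range(1, max_terms + 1):
        value = row.get(f"uncertainty_value_{n}")
        if value is None:
            continue
        terms.append({
            "type": row.get(f"uncertainty_type_{n}"),
            "value": value,
            # Default: same units as the observed value unless stated otherwise.
            "units": row.get(f"uncertainty_units_{n}", row["units"]),
        })
    return terms
```

A profile that omits the uncertainty_units columns altogether therefore still yields a units entry for every term, taken from the core units element.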

2.3.3. Type of observation information

Some collections consist of heterogeneous observation types from, for example, different observing platforms or instrument types. For these collections it may be necessary to document this information for users. 

The platform_type is available from the station configuration table in CDM-OBS via the platform_type code table. For those holdings consisting of mixed platform types e.g. ships and buoys this is useful to users. This element should only be selected for collections consisting of heterogeneous observing platforms.

The report_type is available from the header table in CDM-OBS, via the report_type code table. There are cases where report types may be heterogeneous. For example, surface station observations may be a mix of synoptic and METAR messages. Given that what is reported, and how, may differ by message type, this is an option in the CDM-OBS-Core. It should be used only if there is a heterogeneity in report types where the differing reporting modalities have a potential impact on users.

The instrument_type is drawn from the sensor_configuration table in the CDM-OBS, via the instrument_type code table, and should be used where the instrument type is known and known to potentially affect the observed value.

Station_automation is given in the station configuration table, via the automation_status code table. For some collections the data consist of a mix of manual and automated measurements and it is useful to differentiate these.

The value_significance is drawn from the observation_value_significance look-up table. It is used to signify what the observation represents such as a mean, median, maximum or minimum value.

2.3.4. Vertical profiles information

A subset of the collections served from the in-situ lots contains vertical profile information. For these collections additional information is required. The following are permitted optional elements in a CDM-OBS-Core compliant profile to cater for such collections:

The profile_id drawn from the header table provides for a unique identifier for the profile and in the full CDM-OBS is linked to further information via the profile configuration table.

The z_coordinate drawn from the observations table in the full CDM-OBS provides either the height of the observation in metres above sea level or the pressure at which the observation was made.

The z_coordinate_type drawn from the observations table in the full CDM-OBS, via the z_coordinate_type code table, denotes whether the z_coordinate is a height or a pressure.
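As an illustration of how these three elements work together, the sketch below reassembles profiles from per-observation rows. The integer codes for z_coordinate_type are hypothetical placeholders for the code table values, and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical code values; the real ones are defined in the
# z_coordinate_type code table of the full CDM-OBS.
Z_HEIGHT, Z_PRESSURE = 0, 1


def group_profiles(rows):
    """Group rows by (profile_id, z_coordinate_type), ordered from the surface up."""
    profiles = defaultdict(list)
    for row in rows:
        # Heights and pressures are kept apart: they are not comparable.
        profiles[(row["profile_id"], row["z_coordinate_type"])].append(row)
    for (_, z_type), levels in profiles.items():
        # Height increases upwards, pressure decreases upwards.
        levels.sort(key=lambda r: r["z_coordinate"], reverse=(z_type == Z_PRESSURE))
    return dict(profiles)
```

Sorting pressure levels in descending order and height levels in ascending order gives both kinds of profile the same surface-to-top ordering.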

2.3.5. Representativeness information

For some collections there may exist information on the representativeness of the observations which would be useful to end-users.

spatial_representativeness taken from the observations table uses the spatial representativeness code table to identify the representativeness of the observation across a broad range of possible scales.

Exposure_of_sensor taken from the observations table uses lookup to the instrument_exposure_quality code table to define the siting quality relative to WMO siting class criteria.

2.3.6. Reanalysis feedback information

Finally, for some collections there may be valuable information pertaining to departures of the observation from the reanalysis background forecast and / or from the analysis field. These data can be found in the ERA5fb table in the full CDM-OBS. It is assumed that these values are in the same units as the observed variable.

Fg_depar@body provides the departure statistic between the observation and the background forecast.

An_depar@body provides the departure statistic between the observation and the analysis.

3. Declaring the unique CDM-OBS-core configuration for a set of holdings

For any given collection (CADS catalogue entry) a single CDM-OBS-core compliant format must be used consistently throughout the collection. Where distinct formats are absolutely required this arguably points to the need for separate catalogue entries owing to data distinctions.

Any CDM-OBS-core compliant set of holdings shall consist of the core elements in the order given in Table 1, followed by a declared selection of optional elements from Table 3 presented in the order declared. This can include up to 5 entries for each uncertainty element as deemed required (see discussion in Section 2.3.2).

It is necessary to prepare a format declaration where the first 14 entries are the element names given in Table 1 followed by how entries from Table 2 are to be handled and, finally, as many entries from Table 3 as are required. All entries must use underscores to join words in element entries. This can then be converted into a machine readable format that can then be used to create tools to read in and analyse the data holdings.

An initial example is given in Table 4 below for the profile proposed, at the time of drafting, for the C3S2 311 Lot 1 land holdings. As with all holdings, the first 14 elements and their ordering are stipulated (Section 2.1). For the core variables which can be handled via column entries or a header or metadata file, the holdings are proposed to include the source_id and data_policy_licence as elements 15 and 16 respectively, and these will be duplicated via the provision of a metadata file containing the remaining elements. These remaining elements are often lengthy, including substantial entries, and when numerous sources are returned it is unwieldy to include them via the header. A further two optional variables are envisaged.
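One possible machine readable form of such a declaration is sketched below for the Lot 1 land example. The layout is an assumption: the key names ("core", "source", "optional", "column", "metadata_file") and the helper function are illustrative, not a prescribed schema. Element names follow Table 1, Table 2 and Table 3.

```python
import json

# Compulsory elements in Table 1 order (Section 2.1).
CORE_ELEMENTS = [
    "Station_name", "Primary_id", "report_id", "observation_id",
    "Longitude", "Latitude", "Height_of_station_above_sea_level",
    "Report_timestamp", "Report_meaning_of_time_stamp", "Report_duration",
    "Observed_variable", "units", "Observation_value", "Quality_flag",
]

# Hypothetical layout: the key names below are illustrative only.
declaration = {
    "collection": "C3S2 311 Lot1 land surface meteorological holdings",
    "core": CORE_ELEMENTS,
    "source": {
        "Source_id": {"column": 15, "metadata_file": True},
        "product_name": {"metadata_file": True},
        "Product_citation": {"metadata_file": True},
        "product_references": {"metadata_file": True},
        "Data_policy_licence": {"column": 16, "metadata_file": True},
        "contact": {"metadata_file": True},
    },
    "optional": ["Platform_type", "Value_significance"],
}


def column_order(decl):
    """Derive the ordered data-file columns implied by a declaration."""
    columns = list(decl["core"])
    # Source elements served as columns, ordered by their declared position.
    served = {spec["column"]: name
              for name, spec in decl["source"].items() if "column" in spec}
    columns += [served[number] for number in sorted(served)]
    columns += decl["optional"]
    return columns


# The declaration survives a round trip through JSON, so it can be shipped
# alongside the data and read back by generic tooling.
restored = json.loads(json.dumps(declaration))
for number, name in enumerate(column_order(restored), start=1):
    print(number, name)
```

Keeping the declaration as plain data, rather than code, means the same file can drive readers written in any language.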

Table 4: Proposed method to declare a CDM-OBS-core profile

CDM-OBS-core declaration for C3S2 311 Lot1 land surface meteorological holdings

| Core elements: element number in data file (column) | Core elements: element | Core header / per element / via metadata file: element number in data file (column) AND / OR header AND / OR metadata file | Core header / per element / via metadata file: element | Optional elements: element number in data file (column) | Optional elements: element |
|---|---|---|---|---|---|
| 1 | Station_name | 15 AND metadata file | Source_id | 17 | Platform_type |
| 2 | Primary_id | metadata file | product_name | 18 | Value_significance |
| 3 | report_id | metadata file | Product_citation | | |
| 4 | observation_id | metadata file | product_references | | |
| 5 | Longitude | 16 AND metadata file | Data_policy_licence | | |
| 6 | Latitude | metadata file | contact | | |
| 7 | Height_of_station_above_sea_level | | | | |
| 8 | Report_timestamp | | | | |
| 9 | Report_meaning_of_time_stamp | | | | |
| 10 | Report_duration | | | | |
| 11 | Observed_variable | | | | |
| 12 | units | | | | |
| 13 | Observation_value | | | | |
| 14 | Quality_flag | | | | |

This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.
