Introduction

The new version of the C3S netCDF encoding standards (C3S-0.3) is an evolution of the existing encoding standards (C3S-0.2). It aims to make them more generic and permissive, in the sense that items can be characterized as mandatory or optional depending on operational need. In this way, the encoding standards can be easily applied to projects that are not yet operational.

An item is characterized as mandatory if it is used during the processing of the data for storage in MARS. All other items are characterized as optional. Data also adhere to the DRS (Data Reference Syntax) and Controlled Vocabulary standard for naming files and structured paths.

It is important to mention that the updated version of the encoding standards, like the previous ones, is constrained by the CF convention and by standards coming from SPECS and CMIP5/6. This means that some metadata which is not strictly needed by the hosting environment shall nevertheless be mandatory to make the output compliant with those standards. As a result, the output achieves maximum interoperability, satisfying the users' expectation of being able to extract data both efficiently and in a uniform way across all models.

An example of the relaxation of the existing standards is that the dissemination of data interpolated onto a common grid may, under some circumstances, not be mandatory, as in many cases this is project-dependent.

Additionally, ACDD has also been taken into account when defining the data discovery-related metadata.

Hence, the following links are valuable sources of information that have informed the definition of this proposal:

CF convention

CF convention standard names tables

SPECS file content and format, data structure and metadata

CMIP5 list of variables

CMIP6 Data Request: MIP variables search

ACDD convention

Change List



Date | Change | Version

 

Initial version | C3S-0.1

 

Update the CF convention to CF-1.11 | C3S-0.2


Correct atmosphere model to atmos in the source global attribute


Update the controlled vocabulary of the global attribute "institute_id" with the following:

"kwbc" for NCEP

"rjtd" for JMA

"cwao" for ECCC

"ammc" for BoM



Update the controlled vocabulary of the global attribute "institution" with the following:

"NCEP National Centres for Environmental Prediction"

"JMA Japan Meteorological Agency"

"ECCC, Environment and Climate Change Canada, Montreal, QC, Canada"

"BOM, Australian Bureau of Meteorology, Melbourne, Australia"



Update the controlled vocabulary of the global attribute "level_type" with the following:

"ocean2d"



Spatial coordinates have been updated in order to include additional coordinates used by the ocean products. 

The additional coordinates are: 

"sigma_theta"

"depth" and

"temperature" 



The variables have been updated to include the ocean field. 


  

Global attributes:

  • The attributes "Conventions", "source", "institute_id", "project", "creation_date", "forecast_type", "modeling_realm", "frequency", "level_type", and "forecast_reference_time" become mandatory attributes. All the rest become optional. Whether a global attribute is mandatory or not is given in the 'Required' column.
  • The value of the global attribute "title" was updated to describe project data. 
  • The value of the global attribute "source" was updated to describe project data. 
  • The values of the global attribute "institute_id" and "institution" were updated to include additional institutions.
  • The value of the global attribute "project" was updated to describe additional projects.
  • The value of the global attribute "forecast_type" was updated for the needs of analysis data.
  • The value of the global attribute "frequency" was updated to include 3-hourly data.
  • The value of the global attribute "level_type" was updated to describe ocean2d data. 
C3S-0.3


Spatial Coordinates: 

  • Mandatory values, where applicable, are described in the notes.
  • The spatial coordinate "depth" was updated for the needs of ocean variables. 
  • Spatial coordinates "sigma_theta" and "temperature" were added to describe ocean variables. 


Discrete Axes:

  • The coordinate variable "realization" was moved to Discrete Axes.
  • The coordinate variable "vegetation_type" was added. 


Time Coordinates:

  • The coordinate variable "time" was updated for the needs of analysis data. 


Cell boundaries:

  • Mandatory values, where applicable, are described in the notes.


The encoding of data variables was moved to Appendix I. 



The candidate attributes table has been merged with the common attributes table, and the attribute name is now optional.





Encoding Guide for netCDF files 

File Formatting

The output products shall be in netCDF format and conform to the CF metadata standards, following the requirements below:

  • The output files shall be written through the NetCDF API
  • The NETCDF4_CLASSIC data model shall be adopted
  • The recommended compression level is deflate=6
  • Shuffling shall be enabled (shuffle=True)
  • Enabling the Fletcher32 checksum (fletcher32=True) is strongly recommended
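
As an illustration only, the requirements above could be mapped onto the options of the third-party netCDF4-python package. This is a minimal sketch under that assumption: the standard does not prescribe a particular library, and the names DATASET_FORMAT, VARIABLE_OPTIONS and write_minimal_file are hypothetical helpers introduced here.

```python
# Sketch only: maps the File Formatting requirements onto netCDF4-python options.
# The constant and function names are illustrative and not defined by C3S.

DATASET_FORMAT = "NETCDF4_CLASSIC"  # data model required by the guide

VARIABLE_OPTIONS = {
    "zlib": True,        # enable deflate compression
    "complevel": 6,      # recommended compression level (deflate=6)
    "shuffle": True,     # shuffle filter enabled
    "fletcher32": True,  # Fletcher32 checksum (strongly recommended)
}


def write_minimal_file(path):
    """Write a tiny file using the settings above (needs the netCDF4 package)."""
    from netCDF4 import Dataset  # imported lazily: third-party dependency

    with Dataset(path, "w", format=DATASET_FORMAT) as ds:
        ds.createDimension("lat", 180)
        var = ds.createVariable("lat", "f8", ("lat",), **VARIABLE_OPTIONS)
        var.units = "degrees_north"
```

Keeping the options in one dictionary means every variable in a file can be created with the same compression and checksum settings.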

File Structure

The file structure shall be:

  • Each netCDF4 file shall contain a single output variable (along with coordinate/grid variables, attributes and other metadata) from a single model and a single simulation (i.e., from a single ensemble member and a single start date).
  • The recommended maximum file size is 4 GB (to avoid I/O performance issues).
  • Each file shall be accompanied by a file containing a hash created with sha256sum. 

Note: how to create hash files

sha256sum filename.nc > filename.sha25
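
Where the sha256sum command is not available, an equivalent hash file could be produced with the Python standard library. The following is a sketch, not a prescribed tool: the function names are hypothetical, the hash file path is passed in explicitly, and the output line imitates the "digest, two spaces, filename" layout that sha256sum writes.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_file(data_path, hash_path):
    """Write '<digest>  <data_path>' to hash_path and return the line written."""
    line = f"{sha256_of_file(data_path)}  {data_path}\n"
    with open(hash_path, "w", encoding="utf-8") as handle:
        handle.write(line)
    return line
```

Reading in chunks keeps memory use flat even for files close to the recommended 4 GB limit.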

File Naming Conventions

The filenames of the products in the C3S seasonal forecast are constructed following the CMIP5/6 and SPECS DRS elements, as described below.

It's important to highlight that each file must contain only a single output field from a single simulation (i.e., a single run). 

In addition, the output filename shall be constructed using a subset of metadata.

C3S Output Filename Conventions

Output files generated within C3S shall follow the filename convention below. All elements are separated by underscores ("_") and must appear in the following order:

The Convention

<institute_id>_<model_id_tag>_<forecast_type>_<start_date_identifier>_<modeling_realm>_<frequency>_<level_type>_<variable_name>_<ensemble_member>.nc

Details:

<institute_id> is the institute id as defined in the controlled vocabulary of the global attributes;

<model_id_tag> is the model identifier as defined in the description of the "source" global attribute. For project-specific files, it should start with the name of the project from the global attribute "project" (e.g. CERISE-SystemName-v20240101);

<forecast_type> is as defined in the controlled vocabulary of the global attributes;

<start_date_identifier> is a string "SYYYYMMDDHH" that defines the start date of the forecast;

<modeling_realm> is as defined in the controlled vocabulary of the global attributes;

<frequency> is as defined in the controlled vocabulary of the global attributes;

<level_type> is as defined in the controlled vocabulary of the global attributes;

<variable_name> is the short name of the variable inside the netCDF file;

<ensemble_member> is the 'realization' coordinate value inside the netCDF file;

.nc is the standard netCDF file extension.

As stated above, it should be possible to rebuild the file name from the contents of the metadata. As a result, all the above attributes should be mandatory global attributes of the netCDF file (see below).

Examples of the above convention are:

lfpw_System8-v20210101_forecast_S2023030100_atmos_12hr_pressure_ta_r25i00p00.nc    (contribution to the C3S operational service)

lfpw_CERISE-SystemName-v20210101_hindcast_S2010110100_land_day_soil_mrlsl_r01i00p00.nc    (contribution to the CERISE project)
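
The convention above can be sketched as a pair of helper functions that build a filename from its elements and split it back. This is an illustrative sketch rather than an official tool: the function names are hypothetical, and it assumes that no individual element contains an underscore, which holds for the two examples above but is not stated explicitly by the convention.

```python
import re

# Element order taken from the filename convention above.
ELEMENTS = (
    "institute_id", "model_id_tag", "forecast_type", "start_date_identifier",
    "modeling_realm", "frequency", "level_type", "variable_name",
    "ensemble_member",
)


def build_filename(metadata):
    """Join the nine elements with underscores and append '.nc'."""
    parts = [str(metadata[name]) for name in ELEMENTS]
    for name, part in zip(ELEMENTS, parts):
        if "_" in part:  # assumption: elements never contain the separator
            raise ValueError(f"{name} must not contain '_': {part!r}")
    if not re.fullmatch(r"S\d{10}", str(metadata["start_date_identifier"])):
        raise ValueError("start_date_identifier must look like SYYYYMMDDHH")
    return "_".join(parts) + ".nc"


def parse_filename(filename):
    """Split a filename back into its elements (inverse of build_filename)."""
    if not filename.endswith(".nc"):
        raise ValueError("expected a .nc file")
    parts = filename[:-3].split("_")
    if len(parts) != len(ELEMENTS):
        raise ValueError(f"expected {len(ELEMENTS)} elements, got {len(parts)}")
    return dict(zip(ELEMENTS, parts))
```

For instance, parsing the first example above yields "lfpw" for institute_id and "r25i00p00" for ensemble_member, and passing the result to build_filename reproduces the original name.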

Global attributes

The following properties are intended to provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers and data discovery mechanisms. The global attribute values are all character strings. When an attribute appears both globally and as a variable attribute, it is the variable’s version which has precedence.

From version 0.3 of this encoding, the attributes have been categorised as mandatory or optional. 

In addition, an attribute can be project-dependent. This means that an attribute defined in general as optional can become mandatory in the scope of a project through a controlled vocabulary. In other words, when a value is defined in a controlled vocabulary, that attribute is by definition mandatory in the scope of the project. The advantage of this approach is the flexibility it gives the encoding standards when they are used by a non-operational project.

The table below describes the minimum set of global attributes. Providers may define any additional attributes which add relevant information associated with the provider or the project and are thought to be useful. These additional attributes are allowed by the standards but are not controlled by them.

Attribute Name

Value

Required 

Examples

Comment

Conventions | CF_convention_string  C3S-0.1 [Other convention] :... | Mandatory | "CF-1.11 C3S-0.3"

Multiple conventions may be included (separated by blank spaces)

title

Controlled vocabulary

"<short institution name> seasonal forecast model output prepared for C3S"

For project use:

"<short institution name> seasonal forecast model output prepared for CERISE project"

CF: Free text

ACDD (highly recommended)


Optional

"ECMWF seasonal forecast model output prepared for C3S"

"DWD seasonal forecast model output prepared for CERISE project"

A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore should be human readable and reasonable to display in a list of such names
<short institution name> is the first element of the comma-separated list of values of the corresponding "institution" attribute
references

Controlled vocabulary: 

URIs (such as a URL or DOI) for papers or other references. A valid DOI is recommended.

CF: Free text

Optional | "doi:10.5194/gmd-8-1509-2015"
Published or web-based references that describe the data or methods used to produce it.
For a research project which is still under development, the attribute is optional. 
source

String containing the version of the model

<model_id>


Additional information for an advanced description of the model is highly recommended.

The following template should be followed in constructing the advanced string:

"<model_id> :  atmos: <model_name> (<technical_name>, <resolution_and_levels>); ocean: <model_name> (<technical_name>, <resolution_and_levels>); sea ice: <model_name> (<technical_name>); land: <model_name> (<technical_name>); coupler <model_name> (<technical_name>)"

Additional explanatory information may follow the required information.

NOTE that for some models, it may not make much sense to include all these components.

The first portion of the string, “model_id”, should be built using the following template:

"project-model_name-vYYYYMMDD" where YYYYMMDD is the release date of that version of the model (the date when it was first used)

The "project" prefix is used only for project-specific data. For the C3S operational service, it is left empty.

Mandatory


"System8-v20210101:atmos ARPEGEv6.4.2(cy37t1,Tl359L137); ocean NEMOv3.6 (ORCA025 L75); sea-ice GELATOv6; land surface SURFEXv8.0; coupler OASIS MCT v3.0; river routing CTRIP"


"cerise-SystemName-v20240101:atmos ARPEGEv6.4.2(cy37t1,Tl359L137); ocean NEMOv3.6 (ORCA025 L75); sea-ice GELATOv6; land surface SURFEXv8.0; coupler OASIS MCT v3.0; river routing CTRIP"

The method of production of the original data. If it was model-generated, source should name the model and its version, as specifically as is useful.

It is a character string fully identifying the model and version used to generate the output. It should include information concerning the component models.

Note that information about changes in the individual components with respect to the "official" releases should be included (e.g. a different bathymetry)

The "source" attribute should include as much information as possible to not just identify the model but to brief the user about it.

For project-specific files the model_id should provide information about the project.

institute_id

Controlled Vocabulary:

"ecmf" for ECMWF
"egrr" for Met Office
"lfpw" for Météo-France
"edzw" for DWD
"cmcc" for CMCC
"kwbc" for NCEP
"rjtd" for JMA
"cwao" for ECCC
"ammc" for BoM

Mandatory

"edzw" | Standardized 4-character identifier of the institution that produced the data;
NOTE all the values come from abbreviations of WMO/GRIB "originating centre" table, except CMCC (not available there)
institution

Controlled Vocabulary:

"ECMWF, European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom"

"Met Office, Exeter, United Kingdom"

"Météo-France, Toulouse, France"

"DWD, Deutscher Wetterdienst, Offenbach, Germany"

"CMCC, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Bologna, Italy"

"NCEP National Centres for Environmental Prediction"

"JMA Japan Meteorological Agency"

"ECCC, Environment and Climate Change Canada, Montreal, QC, Canada"

"BOM, Australian Bureau of Meteorology, Melbourne, Australia"

CF: Free text


Optional (highly recommended)

"Météo-France, Toulouse, France"

Specifies where the original data was produced. The name of the institution principally responsible for originating this data.

NOTE: The first element of the comma-separated list of values will be used as a shortened version of this attribute in some of the other global attributes ('summary', 'title')

contact

Controlled Vocabulary:

Copernicus User Support URI should be used
http://copernicus-support.ecmwf.int

CF: Free text

Optional

"http://copernicus-support.ecmwf.int"


optional for projects: "https://www.cerise-project.eu/"


project

Controlled Vocabulary:

"C3S Seasonal Forecast" or "<project>" should be used

CF: Free text


Mandatory


"C3S Seasonal Forecast"

"CERISE"

The attribute "project" is always mandatory; however, its value depends on the operational service or the project.
creation_date

SPECS: YYYY-MM-DDThh:mm:ss<zone>

ISO 8601:2004 extended format



Mandatory

"2011-06-24T02:53:46Z"

The date on which this version of the data was created. Modification of values implies a new version, hence this would be assigned the date of the most recent modification of values. Metadata changes are not considered when assigning the creation_date.

NOTE: ACDD 1.3 names this attribute "date_created". The name "creation_date" has been used following the SPECS convention.

comment | Free text | Optional
  • "Produced by University of Hamburg for DWD at ECMWF HPC facilities"
  • "Run by CMCC at CINECA"
Miscellaneous information about the data, not captured elsewhere.
forecast_type

Controlled Vocabulary

"forecast" or "hindcast" or "analysis"

Mandatory 

"forecast"

To identify the type of data



modeling_realm

Controlled Vocabulary

"atmos", "ocean", "land", "landIce", "seaIce", "aerosol", "atmosChem", "ocnBgchem"

Mandatory


"seaIce"

A string that indicates the high-level modelling component that is particularly relevant to the variable encoded
Controlled vocabulary taken from SPECS


Value depends on the variable (see "global attributes" column in variables tables)

frequency

Controlled Vocabulary

"mon", "day", "12hr", "6hr", "3hr", "fix"

Mandatory

"day"

A string indicating the interval between individual time-samples.
Controlled vocabulary extended from SPECS.

Value depends on the variable (see "global attributes" column in variables tables)

level_type

Controlled Vocabulary

"surface", "pressure", "soil", "ocean2d"

Mandatory

"pressure"

A string indicating the type of the level where the variable comes from

Value depends on the variable (see "global attributes" column in variables tables)

history

Controlled Vocabulary

Empty string

Optional | ""

To avoid this attribute being polluted by the usual netCDF tools, it must be set to an empty string.


commit

timestamp + URL of a commit in a version control repository

Optional

"2017-04-01T13:48:25Z https://git.ecmwf.int/projects/C3SS/repos/ecmf/System4_v20111101"

This attribute is intended to keep track of the tools/scripts used to post-process the data output from the model.

Ideally it should contain the link to a repository containing the specific set of tools and scripts needed to reproduce the same data from the model output. It is highly desirable to have that traceability information.

As a surrogate, when the above is not feasible, it should include the timestamp followed by a URL pointing to the C3S documentation repository of the corresponding model version (properly labelled with the <model_id> introduced in the "source" attribute).

summary

Controlled Vocabulary:
"Seasonal Forecast data produced by <short institution name> as its contribution to the seasonal forecast activity of the Copernicus Climate Change Service (C3S). The data has global coverage with a 1-degree horizontal resolution and spans for around 6 months since the start date"

ACDD (highly recommended)

Optional 

"Seasonal Forecast data produced by DWD as its contribution to the seasonal forecast activity of the Copernicus Climate Change Service (C3S). The data has global coverage with a 1-degree horizontal resolution and spans for around 6 months since the start date"

Optional for projects: 

"Seasonal Forecast data produced by CMCC as its contribution to the CERISE project. The data has global coverage with a 1-degree horizontal resolution and spans for around 4 months since the start date"

A short paragraph describing the dataset


<short institution name> is the first element of the comma-separated list of values of the corresponding "institution" attribute

keywords

Fixed string

"Seasonal Forecasts, C3S, ECMWF, Copernicus, Climate Change, Climate Services, Earth Science Services, Environmental Advisories, Climate Advisories"

ACDD (highly recommended)

Optional


A comma-separated list of keywords and phrases.

NOTE: This attribute is likely to be modified in the future, once the contents of the Thesaurus for CDS faceting have been defined

forecast_reference_time

SPECS: YYYY-MM-DDThh:mm:ssZ

NOTE: This is ISO 8601:2004 extended format, but time zone is required to be UTC

Mandatory


"2011-06-01T00:00:00Z"

Time of the analysis from which the forecast was made.


Introduced as a global attribute to keep compatibility with SPECS
(note that this works for the SPECS data structure, i.e. one variable per start time per file).


When "forecast_type" is "analysis", this global attribute must be omitted.
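
The mandatory attributes and controlled vocabularies in the table above lend themselves to an automated check. The sketch below is one possible approach, not an official validator: the function name check_global_attributes is hypothetical, the attributes are assumed to be available as a plain dictionary of strings, and only the rules stated in the table are covered.

```python
import re

# Mandatory global attributes, as listed in the 'Required' column.
MANDATORY = (
    "Conventions", "source", "institute_id", "project", "creation_date",
    "forecast_type", "modeling_realm", "frequency", "level_type",
    "forecast_reference_time",
)

# Controlled vocabularies copied from the table.
CONTROLLED = {
    "institute_id": {"ecmf", "egrr", "lfpw", "edzw", "cmcc",
                     "kwbc", "rjtd", "cwao", "ammc"},
    "forecast_type": {"forecast", "hindcast", "analysis"},
    "modeling_realm": {"atmos", "ocean", "land", "landIce", "seaIce",
                       "aerosol", "atmosChem", "ocnBgchem"},
    "frequency": {"mon", "day", "12hr", "6hr", "3hr", "fix"},
    "level_type": {"surface", "pressure", "soil", "ocean2d"},
}

UTC_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def check_global_attributes(attrs):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    is_analysis = attrs.get("forecast_type") == "analysis"
    for name in MANDATORY:
        if name == "forecast_reference_time" and is_analysis:
            continue  # not used for analysis data
        if name not in attrs:
            problems.append(f"missing mandatory attribute: {name}")
    if is_analysis and "forecast_reference_time" in attrs:
        problems.append("forecast_reference_time must be absent for analysis data")
    for name, allowed in CONTROLLED.items():
        if name in attrs and attrs[name] not in allowed:
            problems.append(f"{name}={attrs[name]!r} is not in the controlled vocabulary")
    value = attrs.get("forecast_reference_time")
    if value is not None and not UTC_TIMESTAMP.fullmatch(value):
        problems.append("forecast_reference_time must be YYYY-MM-DDThh:mm:ssZ")
    if "history" in attrs and attrs["history"] != "":
        problems.append("history must be an empty string")
    return problems
```

Returning a list of problems rather than raising on the first one lets a provider see every issue in a file in a single pass.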

Spatial Coordinates

The table below describes all the requirements for the spatial coordinates.

The usage of a spatial coordinate depends on the data variable and is described in the variables section. This section describes how each spatial coordinate should be encoded.

Type (CMIP5) | Coordinate Name (CMIP5) | Dimension Names (CMIP5) | Axis | standard_name | long_name (CMIP5) | units (CF canonical units) | positive | valid_min (CMIP5) | valid_max (CMIP5) | bounds | Note
double | lat | lat | Y | latitude | latitude | degrees_north | N/A | -90. | 90. | lat_bnds
  • For C3S, values (1x1deg grid) prescribed:
    center of 1-degree cells
    dimension lat=180

      [-89.5, -88.5, ..., -0.5, 0.5, ..., 89.5]


double | lon | lon | X | longitude | longitude | degrees_east | N/A | 0. | 360. | lon_bnds
  • For C3S, values (1x1deg grid) prescribed:
    center of 1-degree cells
    dimension lon=360

      [0.5, 1.5, ..., 358.5, 359.5]

double | plev | plev | Z | air_pressure | pressure | Pa | down | N/A | N/A | N/A
  • For C3S, values prescribed:
    dimension plev=12

      [1000., 925., 850., 700., 500., 400., 300., 200., 100., 50., 30., 10.]

  • The values above are written in hPa; in the files they shall be encoded in Pa.
  • This is also referred to as isobaric level by some tools
double | depth | depth (for soil levels); None (scalar auxiliary coordinate for ocean variables) | Z | depth | depth | m | down | N/A | N/A | depth_bnds
  • For C3S, when used for ocean temp/salinity in the upper 300m, values prescribed:

      depth=300; depth_bnds=[0,300]

  • Only used for soil model levels and ocean variables. 
  • Number and depth of levels are not prescribed by C3S
double | height | (scalar auxiliary coordinate) | Z | height | height | m | up | CMIP5: 2mtemp: 1.; 10mu/v: 1. | CMIP5: 2mtemp: 10.; 10mu/v: 30. | N/A
  • Used for single level fields (height, soil,SST)

       e.g. ~2 m standard surface air temperature and surface humidity height or ~10 m standard wind speed height

double | sigma_theta | None (scalar auxiliary coordinate) | N/A | sea_water_sigma_theta | Sigma-theta of Sea Water | kg m-3 | N/A | N/A | N/A | N/A
  • Used for mixed layer depth
double | temperature | None (scalar auxiliary coordinate) | N/A | sea_water_potential_temperature | Isotherm Temperature | degC (CF: canonical units are K) | N/A | N/A | N/A | N/A
  • Used for isotherm depths

Note about the horizontal coordinates: The regridding procedure to provide the data in the 1-degree grid must take into account that the full definition of the grid cells is given by the cell boundaries (lat_bnds, lon_bnds).
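
The prescribed C3S coordinate values in the table above can be generated rather than typed by hand. The following is a small sketch in plain Python; the variable names simply mirror the coordinate names, and PLEV_HPA is a hypothetical name for the list of levels as printed in the table.

```python
# 1-degree grid: cell centres as prescribed for C3S.
lat = [-89.5 + i for i in range(180)]  # -89.5, -88.5, ..., 89.5
lon = [0.5 + i for i in range(360)]    # 0.5, 1.5, ..., 359.5

# Pressure levels are listed in hPa in the table but shall be encoded in Pa.
PLEV_HPA = [1000., 925., 850., 700., 500., 400., 300., 200., 100., 50., 30., 10.]
plev = [level * 100.0 for level in PLEV_HPA]  # 100000., 92500., ..., 1000.
```

Generating the values this way makes it easy to confirm the prescribed dimension sizes (lat=180, lon=360, plev=12) before writing the file.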

Discrete Axes

The table below describes all the requirements for the discrete axes. 

Type | Coordinate Name | Dimension Names | Axis | standard_name | long_name | units | bounds | Controlled vocabulary | Note
char | vegetation_type | vtype | N/A | area_type | N/A | N/A | N/A


The labelled axis is used to identify the vegetation type. The names should be chosen from the list of CF area types

C3S: string | realization | str31=31 | E | realization | realization | 1 | N/A


Members are not a physical quantity. Realization is a discrete coordinate and the members are its categorical values (ordered or non-ordered).

SPECS approach:

rXXiYYpZZ

In the current version, the realization coordinate variable doesn't comply with the CF conventions. In future revisions the realization variable will become a discrete axis like the vegetation type.
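
The SPECS-style rXXiYYpZZ label can be built and decoded with a few lines of Python. This is a hedged sketch: the function names are hypothetical, and it assumes that each of the three indices is written with exactly two digits, as the "XX", "YY" and "ZZ" placeholders and the filename examples (e.g. r25i00p00) suggest.

```python
import re

# Assumption: each index is written with exactly two digits.
REALIZATION = re.compile(r"r(\d{2})i(\d{2})p(\d{2})")


def format_realization(r, i=0, p=0):
    """Build an rXXiYYpZZ label from three non-negative integer indices."""
    label = f"r{r:02d}i{i:02d}p{p:02d}"
    if not REALIZATION.fullmatch(label):
        raise ValueError(f"indices do not fit two digits each: {label}")
    return label


def parse_realization(label):
    """Return the (r, i, p) integers encoded in an rXXiYYpZZ label."""
    match = REALIZATION.fullmatch(label)
    if match is None:
        raise ValueError(f"not an rXXiYYpZZ label: {label!r}")
    return tuple(int(group) for group in match.groups())
```

The same label is used as the <ensemble_member> element of the filename, so both can be produced from one helper.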

Time Coordinates

The table below describes the requirements for the Time Coordinates.

The use of all three time coordinates described below is mandatory under the encoding standards; however, the units depend on the project and on the physical variable itself, and are tied to the temporal resolution (frequency) of each variable. A controlled vocabulary has been introduced to reflect the dependency of the time-coordinate encoding on the project.

Type

Coordinate Name

Dimension Names

Axis

standard_name

long_name

calendar

units

bounds

Notes

double | reftime | N/A | N/A | forecast_reference_time | "Start date of the forecast" | gregorian

UDUNITS time units
e.g.
"hours since YYYY-MM-DD hh:mm:ss TZhh:TZmm"

N/A

In SPECS it is only given as a "global_attribute"
It has been additionally introduced here as a coordinate variable to ease future netCDF management (e.g. file merging)


Only for forecast/hindcast

double | leadtime | leadtime | N/A | forecast_period | "Time elapsed since the start of the forecast" | N/A

SPECS: days

C3S: requested units can be relaxed to equivalent time units

leadtime_bnds

The interval of time between the forecast reference time and the valid time

Boundaries not needed when this time coordinate is used for instantaneous values (note that "time:point" is used as cell_method in those cases)

When boundaries are required, the value of the coordinate must be in the centre of the corresponding time cell boundaries


Only for forecast/hindcast

double | time

leadtime (for forecast/hindcast)


time (for analysis)

N/A

time

"Verification time of the forecast" 

or 

"Valid time" for analysis data

gregorian

SPECS: "days since 1850-01-01"

 C3S: requested units can be relaxed to equivalent time units

time_bnds

Time for which the forecast/analysis is valid

Boundaries not needed when this time coordinate is used for instantaneous values (note that "time:point" is used as cell_method in those cases)


Note: Definitions for "leadtime" and "time" have been taken from SPECS. The introduction of "reftime" as a variable has been adapted from SPECS global attribute description for the forecast reference time.

Note: Even though there are different requested time steps among the variables (6h, 12h, 24h), just one set of time axes has been defined, as that would be enough when applying the requirement of "one variable per file"

warning

In forecast and hindcast data, "leadtime" has been selected as the dimension (instead of "time") for both "time" and "leadtime". That means "leadtime" is the coordinate and "time" is an auxiliary coordinate. The main difference between them is that "time" is a time stamp representing the valid time of the forecast, while "leadtime" is the interval of time between the forecast reference time and the valid time.

  • This diverges from SPECS (where "time" was the name of the dimension and the coordinate, and "leadtime" was an auxiliary coordinate).
  • This choice was made because:
    1. both reftime and leadtime are the relevant (let's say "orthogonal") coordinates coming from the relationship time = reftime + leadtime;
    2. it has some advantages when merging netCDF files ("leadtime" can be easily shared by different variables in a merged file, while "time" cannot).
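
The relationship time = reftime + leadtime can be illustrated with the Python standard library. This is a sketch with made-up values: the start date and lead times below are hypothetical, and days_since_epoch is an illustrative helper for the SPECS-style units "days since 1850-01-01". Python's datetime uses the proleptic Gregorian calendar, which coincides with the "gregorian" calendar for dates of this era.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical start date and lead times (in hours) of a 12-hourly variable.
reftime = datetime(2023, 3, 1, 0, tzinfo=timezone.utc)
leadtime_hours = [12, 24, 36]

# "time" is an auxiliary coordinate derived from reftime and leadtime.
valid_time = [reftime + timedelta(hours=h) for h in leadtime_hours]

EPOCH = datetime(1850, 1, 1, tzinfo=timezone.utc)


def days_since_epoch(moment):
    """Encode a datetime as fractional 'days since 1850-01-01'."""
    return (moment - EPOCH) / timedelta(days=1)


encoded_time = [days_since_epoch(t) for t in valid_time]
```

Because "leadtime" does not depend on the start date, the same list of lead times can be shared by every file of a given frequency, while the valid times differ per start date.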

Cell boundaries

The table below describes the requirements for the Cell Boundaries in accordance with section 7.1 Cell Boundaries of the CF convention.

Following the same approach as for Spatial and Time Coordinates, a controlled vocabulary has been introduced to provide encoding standards which are relevant only to a specific project.

Info

To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable." A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable’s metadata, it is not necessary to provide it with attributes such as long_name and units


Bounds Name | Dimensions | Note
time_bnds | leadtime, bnds
leadtime_bnds
lat_bnds | lat, bnds

For C3S:

Values (1x1deg grid) prescribed:
[-90., -89.], [-89., -88.], ... [89., 90.]

lon_bnds | lon, bnds

For C3S:

Values (1x1deg grid) prescribed:

[0., 1.], [1., 2.], ... [359., 360.]

depth_bnds | depth, bnds | (for soil layers)
Should define the full vertical extent of the soil model layers.
depth_bnds | bnds

(scalar auxiliary coordinate for ocean variables)
For C3S:

Values prescribed (depth=300)

[0,300]
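
The prescribed C3S boundary values above can be generated programmatically. The sketch below uses plain Python lists; cell_bounds is a hypothetical helper name, and the arrays follow the (coordinate, bnds) layout given in the table, with the trailing bnds dimension of size 2.

```python
def cell_bounds(first_edge, n_cells, width=1.0):
    """Return [[lower, upper], ...] for n_cells contiguous cells of equal width."""
    return [[first_edge + i * width, first_edge + (i + 1) * width]
            for i in range(n_cells)]


lat_bnds = cell_bounds(-90.0, 180)  # [-90., -89.], ..., [89., 90.]
lon_bnds = cell_bounds(0.0, 360)    # [0., 1.], ..., [359., 360.]
depth_bnds = [0.0, 300.0]           # scalar ocean depth coordinate (depth=300)

# For the horizontal grid, the cell centres are the midpoints of the bounds.
lat = [(lower + upper) / 2 for lower, upper in lat_bnds]
lon = [(lower + upper) / 2 for lower, upper in lon_bnds]
```

Deriving the centres from the bounds guarantees that the coordinate values and the boundary variables stay consistent with each other.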

Grid mapping


As described in section 5.6, Grid Mappings and Projections, of the CF convention (see quote below):

When the coordinate variables for a horizontal grid are longitude and latitude, a grid mapping variable with "grid_mapping_name" of "latitude_longitude" may be used to specify the ellipsoid and prime meridian.


Accordingly, this encoding guide includes the following variable as mandatory:

char hcrs ;
    hcrs:grid_mapping_name = "latitude_longitude" ;

Appendices

Appendix I. Data Variables

Appendix II. Extension of the C3S encoding standards for analysis data