Standards and Conventions

File formatting

File Structure

File Naming

 

Metadata

 

 

Summary

  • Each netCDF4 file contains a single output variable (along with coordinate/grid variables, attributes and other metadata) from a single model and a single simulation (i.e., from a single ensemble member).
  • There is flexibility in specifying how many time slices (samples) are stored in a single file. A single file can contain all the time-samples for a given variable and climate experiment, or the samples can be distributed in a sequence of files.
  • Much of the metadata written to the output files is defined in MIP-specific tables of information stored as ASCII files, referred to in this document simply as "MIP tables".
  • The metadata is constrained by the NetCDF Climate and Forecast (CF) Metadata Convention and by the CMIP5 tables.
  • This convention is also based on information from the SPECS_standard_output.pdf document.
  • The attributes are currently a significantly reduced subset of the SPECS/CMIP5 requirements.
  • The output files are written through the NetCDF API following the NETCDF4_CLASSIC model and without compression of any kind.

NetCDF Dataset Design Overview

Component  Example  Notes
File Structure      
  Single output variable per file  egrr_enfh_atmos_day_plev_201601A_19950417-19950418_ta_r3.nc
  • single output variable per file
  • single stream
  • Single simulation
  • Number of timesteps included is up to the data provider
  • file size must be less than 2GB
  • date range indicated in filename
  • filename needs to include:
    • production date?
    • model version?
    • date range start?
    • date range end?

  
Directory Structure      

 


 

 

ec:/copernicus/c3s/<activity>/<institute>/<stream>/<modeling realm>/<frequency>/<production date and start date identifier>/<data year>/<data month>/<data day>/<variable MARS name>/<ensemble member>/

 

ec:/copernicus/c3s/seasonal/egrr/enfh/atmos/month/201601A/1995/04/17/ta/r1/

 

  • Directory structure on ECFS
  • need to identify production date and start date of forecast/reforecast
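
The directory template above can be assembled mechanically from its components. The following is a minimal sketch, not an agreed implementation: the function name `ecfs_directory` and its argument names are hypothetical, and it assumes the year, month and day are zero-padded and the ensemble member is written as `r<N>`, as in the example path.

```python
from datetime import date

# Root of the ECFS tree, taken from the template above.
ECFS_ROOT = "ec:/copernicus/c3s"


def ecfs_directory(activity, institute, stream, realm, frequency,
                   production_id, data_date, variable, member):
    """Build the ECFS directory path for one variable and ensemble member.

    Hypothetical helper: the argument names are illustrative only.
    """
    parts = [
        ECFS_ROOT, activity, institute, stream, realm, frequency,
        production_id,
        f"{data_date.year:04d}",   # <data year>
        f"{data_date.month:02d}",  # <data month>
        f"{data_date.day:02d}",    # <data day>
        variable,                  # <variable MARS name>
        f"r{int(member)}",         # <ensemble member>, no leading zeros
    ]
    return "/".join(parts) + "/"


example = ecfs_directory("seasonal", "egrr", "enfh", "atmos", "month",
                         "201601A", date(1995, 4, 17), "ta", 1)
print(example)
```

With these inputs the sketch reproduces the example directory shown above.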
  
File Format      
  

netCDF 4

 
  • Classic form
  • no compression
  
File Names      
<institute>_<stream>_<modeling realm>_<frequency>_<level>_<production date and start date identifier>_<data year><data month><data day>[-<data year><data month><data day>]_<variable MARS name>_<ensemble member>.nc

egrr_enfh_atmos_day_sfc_201601A_19950417-19950418_ta_r3.nc

egrr_enfh_atmos_month_plev_201601A_199504-199505_ta_r3.nc

 

"201601A" is a placeholder until a form for representing the model version, production year and start date is determined:

egrr_enfh_atmos_month_plev_P2016_M1A_S19950401_199504-199505_ta_r3.nc

P=production year

M=model version

S=startdate

e.g. could the filename (alternatively) be something like:


egrr_enfh_atmos_month_plev_S19950401_199504-199505_ta_r3p20160101m411.nc

 

  • date range varies according to output frequency
  • monthly data only needs YYYYMM
  • daily data needs YYYYMMDD
  • hourly data needs YYYYMMDDHH?
  • ensemble member has no leading zeros
  • filename needs to include:
    • production date?
    • model version?
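
The naming rules above (date precision tied to output frequency, optional end date, ensemble member without leading zeros) can be sketched as a small filename builder. This is an illustration under assumptions rather than a settled convention: the helper `c3s_filename` and the `DATE_FORMATS` table are hypothetical, the `"hour"` frequency label and its YYYYMMDDHH format are still open questions in the notes, and the "201601A" identifier is the placeholder discussed above.

```python
from datetime import datetime

# Date precision per output frequency. The "hour" entry is an assumption;
# the notes above still mark hourly data with a question mark.
DATE_FORMATS = {
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
}


def c3s_filename(institute, stream, realm, frequency, level,
                 production_id, start, end, variable, member):
    """Assemble a file name from its components (hypothetical helper)."""
    fmt = DATE_FORMATS[frequency]
    date_range = start.strftime(fmt)
    if end is not None:
        # The end date is optional in the template: [-<end date>]
        date_range += "-" + end.strftime(fmt)
    fields = [institute, stream, realm, frequency, level,
              production_id, date_range, variable,
              f"r{int(member)}"]  # int() drops any leading zeros
    return "_".join(fields) + ".nc"


daily_name = c3s_filename("egrr", "enfh", "atmos", "day", "sfc", "201601A",
                          datetime(1995, 4, 17), datetime(1995, 4, 18),
                          "ta", 3)
monthly_name = c3s_filename("egrr", "enfh", "atmos", "month", "plev",
                            "201601A", datetime(1995, 4, 1),
                            datetime(1995, 5, 1), "ta", "03")
print(daily_name)
print(monthly_name)
```

Both calls reproduce the daily and monthly example names listed above.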
  

Additional Questions to be addressed

Question  Discussion  Decision
File format to be used?
Francisco Doblas-Reyes NetCDF4? With or without compression?
Kevin Marsh netCDF4 classic model (with deflate =6 suggested by Pierre-Antoine)
 
File naming?

Kevin Marsh Pierre-Antoine Bretonniere proposed following the SPECS convention

 
forecast/hindcast matching and labelling  
File size recommendation (maximum size)?

Kevin Marsh Pierre-Antoine Bretonniere suggested 4GB recommended maximum size

Kevin Marsh recommend 4GB Max Size for data files

Versioning of data files?  
DOI

Kevin Marsh DOI likely to be assigned at dataset level

Kevin Marsh DOI likely to be assigned at dataset level

Variable short names to be specified?

Kevin Marsh  Antonio S. Cofino Gonzalez suggested follow cmip5 short names

Kevin Marsh follow cmip5 short names

Coordinate short names to be specified?

Kevin Marsh Antonio S. Cofino Gonzalez suggested  follow cmip5 coordinate short names

Kevin Marsh follow cmip5 coordinate short names

Extension to include ocean data for C3S?

Kevin Marsh yes, but not in the initial convention release

Kevin Marsh Not considered in initial release

Grids, resolution etc to be specified?

Kevin Marsh Antonio S. Cofino Gonzalez agreed 1 degree grid specified with valid max/min, but actual grid points not specified

Kevin Marsh 1 degree grid specified with valid max/min, but actual grid points not specified

MARS attributes to be specified?

Kevin Marsh These will be added by C3S, rather than data provider

Kevin Marsh These will be added by C3S

Standard name request/assignment process?

Kevin Marsh requested via standard name mailing list. Note that this process can take some considerable time.

Kevin Marsh requested via standard name mailing list

 

Discussion about time coordinates

NOTE: The SPECS approach (2 1D time coordinates) has been chosen for the "providers" convention

 

The encoding of multiple time coordinates requires particular consideration. An explicit example of the structure is given below.

Example of encoding data with multiple time axes

  
double forecast_reference_time(forecast_reference_time) ;
       forecast_reference_time:bounds = "forecast_reference_time_bnds" ;
       forecast_reference_time:units = "hours since 1970-01-01 00:00:00" ;
       forecast_reference_time:standard_name = "forecast_reference_time" ;
       forecast_reference_time:calendar = "gregorian" ;
double leadtime(leadtime) ;
       leadtime:bounds = "leadtime_bnds" ;
       leadtime:units = "hours" ;
       leadtime:standard_name = "forecast_period" ;
       leadtime:calendar = "gregorian" ;
double time(forecast_reference_time,leadtime) ;
       time:axis = "T" ;
       time:bounds = "time_bnds" ;
       time:units = "hours since 1970-01-01 00:00:00" ;
       time:standard_name = "time" ;
float temp(forecast_reference_time,leadtime,pressure,latitude,longitude) ;
       temp:units = "K" ;
       temp:standard_name = "air_temperature" ;
       temp:coordinates = "time" ;

Francisco Doblas-Reyes I interpret this as the time coordinates being a hypercube, where there could be missing data; this won't be consistent with the CMIP files; I still find this confusing unless a discussion about what to do with the missing data is undertaken.

Eduardo Penabad: Wouldn't that be solved by clarifying that different variables within the same file could potentially have different time coordinates/dimensions?

Francisco Doblas-Reyes Not sure. To simplify, assume a single variable for which one file holds data for two start dates, one with three forecast time steps and the other with only two. The time dimensions will then be forecast_reference_time=2, leadtime=3, but one of the values of temp() will be missing, unless I have misunderstood the model.
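
The case described in this comment can be made concrete with a small sketch. It is an illustration only: the start dates, lead times and temperature values are invented, NaN stands in for whatever fill value a real file would declare, and `pad_to_hypercube` is a hypothetical helper. It pads the shorter forecast to the full leadtime length and derives the two-dimensional time variable as reference time plus lead time.

```python
import math
from datetime import datetime, timedelta

# Invented example values: two start dates, three lead times.
reference_times = [datetime(1995, 4, 1), datetime(1995, 5, 1)]
leadtimes_hours = [24, 48, 72]

# The second start date only has two forecast steps.
ragged_temp = [
    [271.0, 272.5, 273.1],
    [270.2, 271.8],
]

FILL = float("nan")  # assumption: NaN stands in for a real _FillValue


def pad_to_hypercube(rows, width, fill):
    """Pad each row to `width` so the data form a full 2-D array."""
    return [row + [fill] * (width - len(row)) for row in rows]


temp = pad_to_hypercube(ragged_temp, len(leadtimes_hours), FILL)

# time(forecast_reference_time, leadtime) = reference time + lead time
valid_time = [
    [ref + timedelta(hours=h) for h in leadtimes_hours]
    for ref in reference_times
]

# Index pairs (reference index, leadtime index) where data are missing.
missing = [
    (i, j)
    for i, row in enumerate(temp)
    for j, value in enumerate(row)
    if math.isnan(value)
]
print(missing)
```

The array has shape 2 x 3, and the single padded cell is the missing value the comment refers to. How such cells should be flagged in the files is the open question in this discussion.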

Antonio S. Cofino Gonzalez: discussion on multi-time dimension data

 

 

 

 
