Introduction
The new CDS and ADS systems (currently in beta) have a new GRIB-to-netCDF converter, based on xarray with the cfgrib engine. As each dataset has its own particular features, the configuration differs from dataset to dataset to ensure that the returned results are correct and appropriately structured.
The general workflow of the new conversion is as follows (for a more detailed description, see the Jupyter Notebook below):
- Open the GRIB file as an xarray.Dataset using the cfgrib engine
- Where necessary, this is a list of xarray.Dataset objects, so that the data is organised into complete and compatible hypercubes
- As each dataset has its own specifics, the options used when opening the GRIB files differ from dataset to dataset. These options can be made available upon request, but users who wish to make use of them are expected to understand the Jupyter Notebook below.
- Rename dimension/coordinate variables to match those described in the New Coordinate variable names section
- Additionally, dimensions are expanded to ensure that certain variables are also dimensions of the dataset, e.g. the valid_time dimension when there is a single time step.
- Store as netCDF4 file[s] (with internal compression) and return to the user
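The workflow above can be sketched in Python as follows. This is a minimal, hypothetical illustration assuming `cfgrib`, `xarray` and the `netCDF4` library are installed; it applies only the renames that Table 1 lists for cfgrib's default names and leaves out the dataset-specific opening options used by the CDS/ADS, so it is not the production converter. The helper names and the compression level are assumptions.

```python
"""Minimal sketch of a GRIB -> netCDF4 conversion in the style described above.

Assumptions (not the CDS/ADS production configuration): cfgrib's default
coordinate names are renamed following Table 1, and every data variable is
written with zlib compression.
"""

# cfgrib default name -> name used by the new converter (from Table 1)
CFGRIB_TO_NEW_NAMES = {
    "time": "forecast_reference_time",
    "step": "forecast_period",
    "isobaricInhPa": "pressure_level",
    "hybrid": "model_level",
}


def rename_map_for(names):
    """Return the subset of renames that applies to the given variable names."""
    return {old: new for old, new in CFGRIB_TO_NEW_NAMES.items() if old in names}


def convert_grib_to_netcdf(grib_path, nc_prefix, complevel=1):
    """Convert one GRIB file into one netCDF4 file per compatible hypercube."""
    import cfgrib  # imported here so the helpers above work without cfgrib

    out_paths = []
    # open_datasets returns a list of xarray.Dataset objects, one per hypercube
    for i, ds in enumerate(cfgrib.open_datasets(grib_path)):
        ds = ds.rename(rename_map_for(set(ds.variables)))
        # Promote a scalar valid_time coordinate to a length-1 dimension
        if "valid_time" in ds.coords and ds["valid_time"].ndim == 0:
            ds = ds.expand_dims("valid_time")
        encoding = {
            name: {"zlib": True, "complevel": complevel} for name in ds.data_vars
        }
        path = f"{nc_prefix}_{i}.nc"
        ds.to_netcdf(path, format="NETCDF4", encoding=encoding)
        out_paths.append(path)
    return out_paths
```

For example, `convert_grib_to_netcdf("era5.grib", "era5")` would write `era5_0.nc`, `era5_1.nc` and so on (the file names are illustrative). Writing one file per hypercube mirrors the "Store as netCDF4 file[s]" step.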
As GRIB is the native format of most of the datasets produced by ECMWF, the format of the netCDF files served by the CDS/ADS should not be considered a long-term stable format. We therefore strongly discourage the use of these netCDF files in operational/downstream services, as the format is subject to change.
There are other openly available tools and software for working with GRIB files, including conversion to netCDF. Please see the following resources for more guidance on using the GRIB files produced by ECMWF directly:
- earthkit is ECMWF-supported open-source Python software that simplifies handling of GRIB files: https://earthkit.readthedocs.io/en/latest/
- ecCodes is a package developed by ECMWF that includes libraries and binaries for reading GRIB files. It also includes the "grib_to_netcdf" executable that was used by the legacy ADS/CDS, but please note that this executable is approaching the end of its support period:
- brew install for Mac: https://formulae.brew.sh/formula/eccodes
- conda-forge install for conda environments: https://anaconda.org/conda-forge/eccodes
- manual install instructions: ecCodes installation
- cfgrib is an engine for opening GRIB files in xarray objects: https://pypi.org/project/cfgrib/0.8.4.5/
Why we are changing
There are a number of important reasons for the change in the netCDF produced by the CDS-Beta.
- NetCDF4 is a more modern, more capable, and more future-proof version of netCDF.
- The legacy CDS and ADS used a mixture of 'grib_to_netcdf' (netCDF3) and cfgrib (netCDF4) to convert GRIB files to netCDF, rather than a single common form of netCDF.
- Additionally, if files were post-processed in the CDS Toolbox (e.g. daily statistics), they were also converted to the common data model (CDS-CDM), which introduced additional differences in the netCDF structure and content.
- As such, the legacy CDS and ADS systems supported three different netCDF conversions and standards, each with its own issues.
The new systems will use cfgrib for the conversion, with some modifications as documented below. The same converter is used for direct downloads and post-processed data (e.g. the daily statistics from ERA5 and ERA5-Land), hence users will have consistency in the various netCDF files received from the modernised (currently beta) portals.
It is important to understand that, given the intrinsic differences between GRIB and netCDF, there is no one-size-fits-all approach to converting GRIB to netCDF. We aim to provide users with sensible default options, but because the code used to perform the conversion is open source and documented, users can fine-tune the conversion to their requirements if needed.
Please also note that these changes may mean that some software packages (e.g. GrADS, CDO) cannot open these new netCDF files without modification (e.g. changes to dimension order, dimension names, etc.).
Jupyter Notebook demonstration
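If a tool expects the legacy names, one option is to rename the dimensions back before handing the file over. The sketch below is hypothetical: it assumes ERA5-style data and uses the legacy global names from Table 1. Because several new names collapse onto the legacy `time` and `level`, it refuses to rename when two dimensions would clash. Whether a given tool then opens the file also depends on dimension order and other metadata.

```python
# New name -> legacy name for the ERA5 family, seasonal and CAMS (Table 1)
NEW_TO_LEGACY_NAMES = {
    "valid_time": "time",
    "forecast_reference_time": "time",
    "forecast_period": "time",
    "pressure_level": "level",
    "model_level": "level",
}


def legacy_rename_map(dims):
    """Map the given dimension names to their legacy names, refusing clashes."""
    mapping = {d: NEW_TO_LEGACY_NAMES[d] for d in dims if d in NEW_TO_LEGACY_NAMES}
    targets = list(mapping.values())
    clashes = {t for t in targets if targets.count(t) > 1}
    if clashes:
        raise ValueError(f"several dimensions would be renamed to {sorted(clashes)}")
    return mapping


def to_legacy_names(nc_in, nc_out):
    """Write a copy of nc_in with its dimensions renamed to the legacy names."""
    import xarray as xr  # imported here so legacy_rename_map works without xarray

    with xr.open_dataset(nc_in) as ds:
        ds.rename(legacy_rename_map(list(ds.sizes))).to_netcdf(nc_out)
```

Raising an error rather than guessing is deliberate: a forecast file with both `forecast_reference_time` and `forecast_period` dimensions has no unambiguous legacy equivalent.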
Summary of changes
This section summarises the changes being made for the various dataset families that are affected.
ERA5
Legacy converter
Main differences
- NetCDF3 → NetCDF4 (including compression options)
- Changes to metadata attributes in files
- Ordering of dimensions
- Changes to time dimension names
- Splitting of files when incompatibilities are detected
- Format and metadata will be consistent with the post-processed data (e.g. daily statistics)
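To check whether a downloaded file is netCDF3 (legacy output) or netCDF4 (new output), you can inspect its first bytes: classic netCDF files start with `CDF` followed by a version byte, while netCDF4 files are HDF5 files and start with the 8-byte HDF5 signature. This sketch assumes the HDF5 signature sits at offset 0, which is the usual case for files written by the netCDF library but is not guaranteed for every HDF5 file; the function names are hypothetical.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Version byte that follows b"CDF" in classic-format files
CLASSIC_VERSIONS = {1: "netCDF3 classic", 2: "netCDF3 64-bit offset", 5: "CDF-5"}


def netcdf_flavour(header):
    """Classify a file from its leading bytes (at least 8 bytes for HDF5)."""
    if header.startswith(HDF5_SIGNATURE):
        return "netCDF4 (HDF5)"
    if header[:3] == b"CDF" and len(header) >= 4:
        # Indexing bytes yields an int, matching the dictionary keys
        return CLASSIC_VERSIONS.get(header[3], "unknown classic netCDF")
    return "not netCDF"


def netcdf_flavour_of_file(path):
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as f:
        return netcdf_flavour(f.read(8))
```

For instance, `netcdf_flavour_of_file("download.nc")` should report `netCDF4 (HDF5)` for the new converter's output.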
New converter
ERA5-Land
Legacy converter
Main differences
- NetCDF3 → NetCDF4 (including compression options)
- Changes to metadata attributes in files
- Ordering of dimensions
- Splitting of files when incompatibilities are detected
- Format and metadata will be consistent with the post-processed data (e.g. daily statistics)
New converter
Seasonal Forecasts
Legacy converter
Main differences
- NetCDF3 → NetCDF4 (including compression options)
- Changes to metadata attributes in files
- Ordering of dimensions
- Changes to time dimension names
- It is now possible to request data for multiple forecast_reference_times in a single request
- It is also possible to request multiple models in a single request
- The time coordinate metadata is no longer over-simplified into a single, one-size-fits-all "time" dimension
New converter
CARRA/CERRA/UERRA
Legacy converter
Main differences
- Compression options
- Changes to coordinate variable names
New converter
CAMS global atmospheric composition forecasts
Legacy converter
Main differences
- NetCDF3 → NetCDF4 (including compression options)
- Changes to metadata attributes in files
- Ordering of dimensions
- Changes to time dimension names
- Data from multiple forecast_reference_times is stored in a single netCDF file instead of being split
- The time coordinate metadata is no longer over-simplified into a single, one-size-fits-all "time" dimension
- pv attribute added to relevant variables
New converter
CAMS global reanalysis (EAC4)
Legacy converter
Main differences
- NetCDF3 → NetCDF4 (including compression options)
- Changes to metadata attributes in files
- Ordering of dimensions
- Changes to time dimension names
- pv attribute added to relevant variables
New converter
New Coordinate variable names
Some of the coordinate variables have been renamed to address issues in the legacy netCDF conversion regarding non-CF-compliant variables and unclear, overlapping definitions of the time dimensions.
Table 1: New Coordinate variable names
Coordinate variable name in new netCDF conversion | Legacy name: regional reanalysis (CARRA/CERRA/UERRA) | Legacy name: global reanalysis (ERA5 family), seasonal forecasts and CAMS datasets | GRIB/MARS key(s) | Notes |
---|---|---|---|---|
latitude | latitude | latitude | latitude | |
longitude | longitude | longitude | longitude | |
valid_time | valid_time | time | validityDate + validityTime | |
forecast_reference_time | time | time | time | A time dimension for "forecast" data. |
forecast_period | step | time | step | A time dimension for "forecast" data. |
indexing_time | indexing_time | time | indexingDate + indexingTime | Only used in the 'seasonal-monthly-*-levels' and 'seasonal-postprocessed-*' datasets. It accounts for data where the forecast_reference_time differs for each ensemble member, which happens when the ensemble members are initialised following a lagged-start approach. In those situations, a "nominal start date" is used instead of forecast_reference_time and is encoded as "indexing_time". |
forecastMonth | forecastMonth | time | forecastMonth | Only used in the 'seasonal-monthly-*-levels' and 'seasonal-postprocessed-*' datasets. This is the step number with respect to the forecast_reference_time or indexing_time. By convention, "1" is the first complete month after the nominal start date (forecast_reference_time/indexing_time); for instance, if the nominal start date is 1st November, forecastMonth=1 means November. |
valid_month (this may be renamed valid_time in future editions) | valid_month | time | monthlyVerificationDate + validityTime | Required by some monthly datasets where the "valid_time" differs between the start and the end of the month, depending on the variable. valid_month is not used for CAMS datasets. |
pressure_level | isobaricInhPa | level | levelist + levtype or level + typeOfLevel | |
model_level | hybrid | level | levelist + levtype or level + typeOfLevel | |
number | number | number | number | |
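The relationships between the time coordinates in Table 1 can be illustrated with plain Python. This sketch assumes the convention stated in the table (forecastMonth=1 is the month of a nominal start date on the 1st of that month) and the usual forecast relation valid_time = forecast_reference_time + forecast_period; the function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def valid_time(forecast_reference_time, forecast_period):
    """valid_time = forecast_reference_time + forecast_period."""
    return forecast_reference_time + forecast_period


def forecast_month_to_calendar_month(nominal_start, forecast_month):
    """Return the (year, month) that a forecastMonth value refers to.

    Assumes the nominal start date is on the 1st of a month, so that
    forecastMonth=1 is the start month itself (e.g. 1 November -> November).
    """
    if nominal_start.day != 1:
        raise ValueError("this sketch only handles nominal start dates on the 1st")
    if forecast_month < 1:
        raise ValueError("forecastMonth starts at 1")
    # Zero-based month count since January of the start year
    months = nominal_start.month - 1 + (forecast_month - 1)
    return nominal_start.year + months // 12, months % 12 + 1
```

For a nominal start date of 1 November 2024, forecastMonth values 1, 2 and 3 map to November 2024, December 2024 and January 2025.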
Legacy converter
To assist in the transition, the legacy converter will be made available programmatically via the cdsapi; however, this is considered a deprecated format and is no longer supported. To use the legacy converter, update your cdsapi request with:
"data_format": "netcdf_legacy"
Please be aware that the resulting files are netCDF3.
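A cdsapi request for the legacy format could look like the sketch below. The dataset name and all request fields other than `data_format` are illustrative assumptions; take the exact fields for your dataset from its download form. Running the download requires the `cdsapi` package and a configured API key (`~/.cdsapirc`).

```python
def legacy_netcdf_request(request):
    """Return a copy of a CDS/ADS request that asks for the legacy netCDF3 output."""
    return {**request, "data_format": "netcdf_legacy"}


def download_legacy(dataset, request, target):
    """Submit the request with the legacy converter selected."""
    import cdsapi  # requires a configured ~/.cdsapirc

    cdsapi.Client().retrieve(dataset, legacy_netcdf_request(request), target)


# Illustrative ERA5 request; the field values are assumptions, not a template
example_request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature"],
    "year": ["2024"],
    "month": ["01"],
    "day": ["01"],
    "time": ["12:00"],
}
```

Usage: `download_legacy("reanalysis-era5-single-levels", example_request, "era5_legacy.nc")`. Copying the request rather than mutating it keeps the original available for a netCDF4 download.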
Related articles
- Please read: CDS and ADS migrating to new infrastructure: Common Data Store (CDS) Engine (Copernicus Knowledge Base)
- What are GRIB files and how can I read them (Copernicus Knowledge Base)
- What are NetCDF files and how can I read them (Copernicus Knowledge Base)