Regional climate projections in the CDS

The regional climate projections in the Climate Data Store (CDS) are a quality-controlled subset of the wider CORDEX data over Europe and represent only a small part of the CORDEX archive. A set of 30 core variables, the most widely used of the CORDEX data, was identified for the CDS. These variables are provided from 5 CORDEX experiment types (evaluation, historical and 3 RCP scenarios) that are derived (downscaled) from the CMIP5 experiments.

The CDS subset of CORDEX data has been through a metadata quality-control procedure that ensures a high standard of reliability. Similar data may be found in the main CORDEX archive, but these come with no quality assurance and may have metadata errors or omissions. The quality-control process means that the CDS subset is further reduced to exclude data with metadata errors or inconsistencies. Passing the quality control should not be confused with validity: for example, a file can have fully compliant metadata but still contain gross errors in the data that have not been noted. In other words, the quality control is purely technical and does not include any scientific evaluation (for instance, a consistency check).

In addition, CORDEX data in the CDS include Persistent IDentifiers (PIDs) in their metadata, which allow CDS users to report any error found during scientific analysis. Such errors will at least be documented on the ESGF Errata Service (http://errata.es-doc.org), and there are plans to document them in the CDS as well. In this case users will still have the option to reproduce their results using the old, outdated datasets.

Data Format

The CDS subset of CORDEX data are provided as NetCDF files. NetCDF (Network Common Data Form) is a file format that is freely available and commonly used in the climate modeling community.

NetCDF files are accessible by many programming languages such as Python, R, IDL, C, C++ and Fortran.

A NetCDF file contains: 

  • global metadata: these fields can describe many different aspects of the file, such as
    • when the file was created,
    • the name of the institution and model used to generate the file,
    • links to peer-reviewed papers and technical documentation describing the climate model,
    • the persistent identifier used to track the file annotations,
    • links to supporting documentation on the climate model used to generate the file,
    • software used in post-processing.
  • variable dimensions: such as time, latitude, longitude and height
  • variable data: the gridded data
  • variable metadata: e.g. the variable units, averaging period (if relevant) and additional descriptive data

The metadata provided in NetCDF files adhere to the Climate and Forecast (CF) conventions. The rules within the CF-conventions ensure consistency across data files, for example ensuring that the naming of variables is consistent and that the use of variable units is consistent.
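As a rough illustration of the file layout described above, the following Python sketch collects the global metadata, dimensions and variable metadata of a NetCDF file and prints them as text. It assumes the third-party netCDF4 package is installed; the function names and the file name in the closing comment are hypothetical and chosen only for this example.

```python
def describe_netcdf(path):
    """Collect global attributes, dimension sizes and variable attributes.

    Assumption: the third-party netCDF4 package is installed. It is imported
    inside the function so that format_summary() can be used without it.
    """
    from netCDF4 import Dataset

    with Dataset(path) as ds:
        return {
            "global": {name: ds.getncattr(name) for name in ds.ncattrs()},
            "dimensions": {name: len(dim) for name, dim in ds.dimensions.items()},
            "variables": {
                name: {attr: var.getncattr(attr) for attr in var.ncattrs()}
                for name, var in ds.variables.items()
            },
        }


def format_summary(info):
    """Format the dictionary returned by describe_netcdf() as readable text."""
    lines = ["Global metadata:"]
    lines += [f"  {key}: {value}" for key, value in info["global"].items()]
    lines.append("Dimensions:")
    lines += [f"  {name}: {size}" for name, size in info["dimensions"].items()]
    lines.append("Variables:")
    for name, attrs in info["variables"].items():
        described = ", ".join(f"{key}={value}" for key, value in attrs.items())
        lines.append(f"  {name}: {described}")
    return "\n".join(lines)


# Hypothetical usage, with a file downloaded from the CDS:
# print(format_summary(describe_netcdf("downloaded_file.nc")))
```

Attributes such as units and standard_name, which the CF conventions standardise, would appear in the per-variable lines of the output.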

Quality control of the CDS-CORDEX subset

The CDS subset of the CORDEX data has been through a set of quality control checks before being made available through the CDS. The objective of the quality control process is to ensure that all files in the CDS meet a minimum standard. Data files were required to pass all stages of the quality control process before being made available. Data files that fail the quality control process are excluded from the CDS-CORDEX subset or, if possible, the error is corrected and a note is made in the history attribute of the file. The quality control checks for metadata errors or inconsistencies against the Climate and Forecast (CF) Conventions and a set of CORDEX-specific file naming and global metadata conventions.

Various software tools have been used to check the metadata of the CDS CORDEX data:

  • The Quality Assurance compliance checking tool from DKRZ is used to check that: 
    • the file name adheres to the CORDEX file naming convention, 
    • the global attributes of the NetCDF file are consistent with filename,
    • there are no omissions of required CORDEX metadata.
  • The CF-Checker, a Climate and Forecast (CF) conventions checker included in the QA-DKRZ, ensures that any metadata provided are consistent with the CF conventions.
  • When possible (i.e. optionally), the Time Axis checker developed by the IPSL is used to check the temporal dimension of the data:
    • for individual files the time dimension of the data is checked to ensure it is valid and is consistent with the temporal information in the filename, 
    • where more than one file is required to generate a time-series of data, the files have been checked to ensure there are no temporal gaps or overlaps between the files.
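The gap and overlap check on a multi-file time series can be sketched in a few lines of Python. This is not the IPSL Time Axis checker, only a minimal illustration of the idea: it assumes daily files whose temporal ranges are given as YYYYMMDD strings and a standard Gregorian calendar. Many climate models use other calendars (for example 360-day or no-leap), which this sketch does not handle.

```python
from datetime import date, timedelta


def parse_day(stamp):
    """Parse a YYYYMMDD string into a date (Gregorian calendar assumed)."""
    return date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))


def find_gaps_and_overlaps(ranges):
    """Compare consecutive (start, end) YYYYMMDD pairs of daily files.

    Returns a list of (kind, previous_end, next_start) tuples, where kind is
    "gap" or "overlap". An empty list means the series is contiguous.
    """
    periods = sorted((parse_day(start), parse_day(end)) for start, end in ranges)
    problems = []
    for (_, prev_end), (next_start, _) in zip(periods, periods[1:]):
        # For daily data the next file should start the day after the previous one ends.
        expected = prev_end + timedelta(days=1)
        if next_start > expected:
            problems.append(("gap", prev_end, next_start))
        elif next_start < expected:
            problems.append(("overlap", prev_end, next_start))
    return problems


# Two files that follow each other without a gap or an overlap:
print(find_gaps_and_overlaps([("19800101", "19801231"), ("19810101", "19811231")]))
```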

The data within the files were not individually checked. 

It is important to note that passing these quality control tests should not be confused with validity: for example, a file can be fully CF compliant and have fully compliant CORDEX metadata but still contain gross errors in the data that have not been noted.

Domains

The CDS-CORDEX subset at the moment consists of the European (and, later, Mediterranean) CORDEX domains, also known as EURO-CORDEX and Med-CORDEX.

Further details on each domain (boundaries, projections, etc.) are available on the related links above.

Experiments

The CDS-CORDEX subset consists of the following CORDEX experiments partly derived from the CMIP5 ones:

  • evaluation: model simulations for the past with imposed "perfect" lateral boundary conditions following ERA-Interim reanalyses (1979-2015).
  • historical: model simulations for the past using lateral boundary conditions from Global Climate Models (GCMs). These experiments cover a period for which modern climate observations exist. These experiments show how the RCMs perform for the past climate when forced by GCMs and can be used as a reference period for comparison with scenario runs for the future.
  • scenario experiments RCP2.6, RCP4.5, RCP8.5: ensemble of CORDEX climate projection experiments driven by boundary conditions from GCMs using RCP (Representative Concentration Pathways) forcing scenarios. The scenarios used here are RCP 2.6, 4.5 and 8.5; they provide different pathways of the future climate forcing.

Some further details can be found on the Earth System Documentation site. 

GCM vs. RCM matrix

Regional Climate Model (RCM) simulations need lateral boundary conditions from Global Climate Models (GCMs). At the moment the boundary conditions for the CDS-CORDEX subset are extracted from CMIP5 global projections. The CORDEX framework requires each RCM to downscale a minimum of 3 GCMs for 2 scenarios (at least RCP8.5 and RCP2.6). The C3S-CORDEX subset aims to fill the gaps in this matrix between GCMs (also known as "driving models"), RCMs and RCPs.

Regional Models 

The models included in the CDS-CORDEX subset are detailed in the table below; these include 12 of the models from the main CORDEX archive. A small number of models were not included because their data have a research-only restriction on use, whereas all data in the CDS are released without restriction.

Driving Global Models

The driving GCMs used to produce the CDS-CORDEX subset are detailed in the table below; these include 8 of the models from the main CMIP5 archive. A small number of models were not included because their data have a research-only restriction on use, whereas all data in the CDS are released without restriction.




The table gives the number of simulations for each combination of regional climate model (rows), driving global coupled model (columns) and scenario (RCP2.6, RCP4.5, RCP8.5). The models it covers are listed below.

Driving Global Coupled Models

  • HadGEM2-ES
  • EC-EARTH
  • CNRM-CM5
  • NorESM1-M
  • MPI-ESM-LR
  • IPSL-CM5A-MR
  • CanESM2
  • MIROC5

Regional Climate Models

  • RCA4 (SMHI)
  • CCLM-8-17 (ETH)
  • CCLM-GPU (ETH)
  • REMO 09&15 (GERICS)
  • RACMO22E (KNMI)
  • HIRHAM5 (DMI)
  • WRF361H
  • WRF381P
  • ALADIN53 (CNRM)
  • ALADIN63 (CNRM)
  • RegCM4.6.1 (ICTP)
  • HadGEM3-RA (MOHC)

Ensembles

The boundary conditions used to run an RCM are also identified by the ensemble member of the CMIP5 simulation used. Each modeling centre will typically run the same experiment using the same GCM several times to confirm the robustness of results and inform sensitivity studies through the generation of statistical information. A model and its collection of runs is referred to as an ensemble. Within these ensembles, three different categories of sensitivity studies are done, and the resulting individual model runs are labelled by three integers indexing the experiments in each category.

  • The first category, labelled “realization”, performs experiments which differ only in random perturbations of the initial conditions of the experiment. Comparing different realizations allows estimation of the internal variability of the model climate.
  • The second category refers to variation in initialization parameters. Comparing differently initialized output provides an estimate of how sensitive the model is to initial conditions. 
  • The third category, labelled “physics”, refers to variations in the way in which sub-grid scale processes are represented. Comparing different simulations in this category provides an estimate of the structural uncertainty associated with choices in the model design. 

Each member of an ensemble is identified by a triad of integers associated with the letters r, i and p which index the “realization”, “initialization” and “physics” variations respectively. For instance, the member "r1i1p1" and the member "r1i1p2" for the same model and experiment indicate that the corresponding simulations differ since the physical parameters of the model for the second member were changed relative to the first member. 
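The r/i/p labelling can be made concrete with a short Python sketch that splits an ensemble member identifier into its three indices and reports which of them differ between two members. The class and function names are hypothetical and serve only as an illustration.

```python
import re
from typing import NamedTuple


class EnsembleMember(NamedTuple):
    realization: int
    initialization: int
    physics: int


# Pattern for identifiers such as "r1i1p2".
_MEMBER_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)")


def parse_member(label):
    """Split an identifier of the form r<X>i<Y>p<Z> into its three integers."""
    match = _MEMBER_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not an ensemble member identifier: {label!r}")
    return EnsembleMember(*(int(group) for group in match.groups()))


def differing_indices(first_label, second_label):
    """Name the categories in which two ensemble members differ."""
    first, second = parse_member(first_label), parse_member(second_label)
    return [
        name
        for name in EnsembleMember._fields
        if getattr(first, name) != getattr(second, name)
    ]


print(parse_member("r1i1p2"))
print(differing_indices("r1i1p1", "r1i1p2"))
```

For the two members in the example above, only the physics index differs.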

It is very important to distinguish between variations in experiment specifications, which are globally coordinated across all the models contributing to CMIP5, and the variations which are adopted by each modeling team to assess the robustness of their own results. The “p” index refers to the latter, with the result that values have different meanings for different models, but in all cases these variations must be within the constraints imposed by the specifications of the experiment. 

For the scenario experiments, the ensemble member identifier is preserved from the historical experiment providing the initial conditions, so RCP 4.5 ensemble member “r1i1p2” is a continuation of historical ensemble member “r1i1p2”.

For CORDEX data, the ensemble member is equivalent to the ensemble member of the CMIP5 simulation used to extract boundary conditions.

File naming conventions

When you download a CORDEX file from the CDS it will have a naming convention that is as follows:

<variable>_<domain>_<driving-model>_<experiment>_<ensemble-member>_<rcm-model>_<rcm-run>_<time-frequency>_<temporal-range>.nc

Where

  • <variable> is a short variable name, e.g. “tas” for near-surface air temperature
  • <domain> is the name of the CORDEX domain, e.g. “EUR-11”
  • <driving-model> is the name of the model that produced the boundary conditions
  • <experiment> is the name of the experiment used to extract the boundary conditions
  • <ensemble-member> is the ensemble identifier in the form “r<X>i<Y>p<Z>”, where X, Y and Z are integers
  • <rcm-model> is the name of the model that produced the data
  • <rcm-run> is the version of the model run in the form "vX", where X is an integer
  • <time-frequency> is the time series frequency (e.g., monthly, daily, seasonal)
  • <temporal-range> is in the form YYYYMM[DDHH]-YYYYMM[DDHH], where Y is year, M is month, D is day and H is hour. Note that day and hour are optional (indicated by the square brackets) and are only used if needed by the frequency of the data. For example, daily data from the 1st of January 1980 to the 31st of December 2010 would be written 19800101-20101231.
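The naming convention can be unpacked programmatically. The Python sketch below splits a file name into the nine components listed above. It assumes that no component contains an underscore; the function name and the example file name are hypothetical and were made up for illustration, not taken from the CDS.

```python
# Component names in the order given by the naming convention.
FIELDS = (
    "variable",
    "domain",
    "driving_model",
    "experiment",
    "ensemble_member",
    "rcm_model",
    "rcm_run",
    "time_frequency",
    "temporal_range",
)


def parse_cordex_filename(filename):
    """Split a CORDEX file name into a dictionary of its components.

    Assumption: components are separated by "_" and contain no underscore
    themselves (hyphens inside a component are fine).
    """
    if not filename.endswith(".nc"):
        raise ValueError(f"expected a .nc file name: {filename!r}")
    parts = filename[: -len(".nc")].split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} components, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Split the temporal range into its start and end stamps.
    start, _, end = record["temporal_range"].partition("-")
    record["start"], record["end"] = start, end
    return record


# Hypothetical file name, for illustration only.
EXAMPLE = "tas_EUR-11_MPI-M-MPI-ESM-LR_rcp45_r1i1p1_SMHI-RCA4_v1_day_19800101-20101231.nc"

for key, value in parse_cordex_filename(EXAMPLE).items():
    print(f"{key}: {value}")
```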