Contributors: Simon Noone (National University of Ireland Maynooth, NUIM), Peter Thorne (NUIM), Corinne Voces (NUIM), Anthony Kettle (NUIM), Kevin Healion (NUIM), Robert Dunn (Met Office UK), Kate Willett (Met Office UK), Elizabeth Kent (National Oceanography Centre, NOC), Dave Berry (NOC), Matt Menne (National Oceanic and Atmospheric Administration's National Centers for Environmental Information, NOAA NCEI), Shelley McNeill (NOAA NCEI), Nancy Casey (NOAA NCEI)
Issued by: NUIM / Peter Thorne
Date: 27/11/2024
Ref: C3S2_D311_Lot1.2.2.2_2024_202411_ 10TH version Land_Data_User_guide
Official reference number service contract: 2021/C3S2_311_Lot1_NUIM
Executive Summary
The C3S2 311 Lot1 Collection and Processing of In Situ Observations service is concerned with the provision of globally available land and marine surface meteorological records. The service includes inventorying of, and brokering access to, data sources, their harmonization (via conversion to a Common Data Model, merging, and quality assurance) and their provision via the Copernicus Climate Change Service Data Store (CDS).
This land user guide describes all relevant aspects of the land data service necessary for a user to access and work with the land data appropriately and with confidence. This document does not constitute a technical service document which instead is available separately. There is also an accompanying marine user guide that describes the marine data and its processing.
This is a living document which shall be subject to regular revision to reflect the status of the service at any given point in time. Releases of this document will always accompany new data releases. On an exceptional basis, as warranted, additional releases may occur to clarify issues raised by users or document important changes between releases such as any modification to the modalities of data access. Feedback on the adequacy and completeness of the document is welcomed at any time. Feedback should be provided via ECMWF Support. All such feedback shall be passed on in full to the service team for due consideration.
Former versions are archived and available upon request (see section 10). The version history is given below:
Version | Release Date | Release notes |
0.0 | 31/08/2017 | Initial version consisting of section outlines and description of initial archiving material |
1.0 | 14/12/2017 | Updates to reflect the status at time of the test data service release |
2.0 | 21/12/2018 | Updates to reflect the status at time of the beta release |
3.0 | 10/01/2020 | Updates to reflect the initial data release |
4.0 | 16/07/2020 | Updates to reflect second data release |
5.0 | 16/03/2021 | Updates to reflect third data release |
6.0 | 17/08/2021 | Updates to reflect fourth data release |
7.0 | 22/09/2022 | Updates to reflect fifth data release |
8.0 | 31/08/2023 | Updates to reflect sixth data release |
9.0 | 09/07/2024 | Updates to reflect release 6.1 |
10.0 | 28/11/2024 | Updates to reflect seventh data release |
1. Introduction
The Copernicus Climate Change Service Collection and Processing of In Situ Observations service provides brokered access to global historical holdings of surface meteorological observations. It builds upon existing national, regional, and global efforts to create an augmented set of quality assured holdings which can be subsequently used to create a multitude of datasets, products, and services. This document contains all relevant information for a user of the land data holdings to be able to discover, understand, and use the data appropriately. The document is structured as follows:
- Section 2 provides a quick access user guide summary of the land database; how to access the data, concise details of the data columns and column contents with an example of the data output provided. The summary of this section provides details of the current version and planned updates.
- Section 3 outlines at a high level, with reference to further materials, the principal historical and present methods of land meteorological observation.
- Section 4 outlines all archived data sources held, the directory structure of the land-based data inventory and provides links to detailed inventory information.
- Section 5 presents ECVs selected for inclusion and justification.
- Section 6 outlines the Common Data Model (CDM) employed by the service and the method by which this is realised for the land holdings.
- Section 7 provides an explanation of the approach undertaken to merging sources.
- Section 8 describes the Quality Assurance and Quality Control procedures applied.
- Section 9 presents details of the harmonized holdings including aspects such as global and regional completeness by variable, timestep and over time.
- Section 10 provides full details on the current data release.
- Section 11 describes how users can provide feedback on the services provided and any data issues they find.
The user will find more detailed information in the following documents:
- Station data inventories: available from Table A in the Appendix.
- Citations to be used, by source: available from Table B in the Appendix.
- The Common Data Model for in-situ observation data holdings – the latest version of which is available via the documentation tab on the CDS catalogue entry.
Additional information can be provided upon request (see section 10).
2. Quick access user guide summary of the land database
2.1. Accessing the data
The data can be accessed and downloaded via the Copernicus Climate Change Service (C3S) Climate Data Store (CDS). Users will need to have registered/logged-in, and accepted the terms of use in order to be able to submit the final data download request form.
The page has 3 tabs as illustrated in Figure 1.
Figure 1: Screenshot of the dataset entry tabs.
The Overview tab presents an overview of the land data and provides details on the data description, main and related variables available, contact email and links to license / data policy statements.
The Documentation tab provides links to the present Land User Guide, the Common Data Model, and a link to the Data Deposit Server webpage.
The Download tab provides several boxes where specific selections need to be ticked, as follows:
- The first box has options for the Version. It is mandatory to choose one of the versions proposed. If several versions are proposed, it is recommended to choose the latest version. This choice typically changes the list of dates available for download, as newer versions generally tend to extend the temporal extent of the dataset.
- The second box has options for the Time aggregation (monthly, daily, sub-daily). Only one selection is possible, and this choice typically controls the list of variables available for download.
- The third, fourth, and fifth boxes have options for the year, month, and day (respectively). Note it is possible to select multiple choices for the month and day.
- The sixth box has options for geographical area selection. By default, no geographical restriction is applied: data for the entire globe are returned unless otherwise specified.
- The seventh box has options for the format of the data retrieved: NetCDF is the default, but Comma-Separated-Value (CSV) can be chosen instead.
Figure 2 shows an example of selection for the top part of the download form. After selecting 'all variables' that can be downloaded for the monthly aggregation, the selection of a particular year (1782) causes two of the variables (fresh snow and wind speed) to be greyed out. This indicates to the user that such data are not available for that year. Air temperature and accumulated precipitation, however, are available.
Figure 2: Screenshot of the upper part of the download tab.
Once a selection has been made, and provided the user is logged in, the query is ready to submit, subject to the user reviewing the terms of use. If these are accepted, then it is possible to select 'Submit form'. An example is shown in Figure 3.
Figure 3: Lower part of the download tab, once (1) a complete selection has been made, (2) the user has logged in, and (3) the terms of use have been accepted.
Users who wish to use the CDS Application Programming Interface (API) can view and copy the current API request by clicking on "Show API request code" at the bottom of the page (Figure 4). Please refer to the documentation page for information on how to use the CDS API.
Figure 4: As above, after clicking on "Show API request code". In this example, monthly data for precipitation and air temperature are requested for the whole year 1782, for version 2.0.0 of the dataset, in NetCDF format.
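The selection described for Figure 4 can also be assembled programmatically. The following is a minimal sketch only: the dataset identifier, the key names and the value spellings are assumptions made for illustration, and the authoritative request is always the one shown by "Show API request code". The helper names build_request and download are hypothetical.

```python
from typing import Any

# Assumed dataset identifier and key names, for illustration only: copy the
# authoritative request from "Show API request code" on the Download tab.
DATASET = "insitu-observations-surface-land"
TIME_AGGREGATIONS = ("monthly", "daily", "sub_daily")
DATA_FORMATS = ("netcdf", "csv")


def build_request(version: str, time_aggregation: str, variables: list[str],
                  year: int, months: list[int],
                  data_format: str = "netcdf") -> dict[str, Any]:
    """Assemble a request dictionary mirroring the Download form selections."""
    if time_aggregation not in TIME_AGGREGATIONS:
        raise ValueError(f"unknown time aggregation: {time_aggregation}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unknown data format: {data_format}")
    return {
        "version": version,
        "time_aggregation": time_aggregation,  # only one choice is possible
        "variable": list(variables),
        "year": str(year),
        "month": [f"{m:02d}" for m in months],  # several months may be selected
        "data_format": data_format,
    }


def download(request: dict[str, Any], target: str) -> None:
    """Submit the request; needs the cdsapi package and a configured CDS account."""
    import cdsapi  # imported lazily so the request can be built without it

    cdsapi.Client().retrieve(DATASET, request, target)


# Selection described for Figure 4: monthly precipitation and air temperature
# for the whole of 1782, version 2.0.0, NetCDF (value spellings are assumed).
request = build_request(
    version="2_0_0",
    time_aggregation="monthly",
    variables=["accumulated_precipitation", "air_temperature"],
    year=1782,
    months=list(range(1, 13)),
)
print(request)
```

Calling download(request, "land_monthly_1782.nc") would then submit the query, provided the terms of use have been accepted on the CDS website beforehand.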
Once the data are returned, they can be inspected with spreadsheet software (if in CSV), or using other processing solutions to load the data into more advanced data structures.
One example is shown in Figure 5, loading the data into a pandas data frame in Python, and then inspecting the first 4 entries.
Figure 5: Example of instructions in Python to read the data. This example shows, for the month of January 1782, from left to right, reports for maximum, minimum, and mean temperature for Milan, and precipitation for Hohenpeissenberg (respectively). Please refer to the Overview tab for a succinct description of each element, and to section 2.2 for more details.
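For readers without access to the figure, the same kind of inspection can be sketched with the Python standard library alone (pandas.read_csv would serve equally well for a data frame). The rows below are synthetic placeholders rather than real observations, only a subset of the columns is shown, and the exact layout of a downloaded CSV file is an assumption.

```python
import csv
import io
from itertools import islice

# Synthetic stand-in for a downloaded CSV file: station names, identifiers,
# codes and values are placeholders, not real observations.
SAMPLE_CSV = """\
station_name,primary_id,report_timestamp,observed_variable,units,observed_value,quality_flag,source_id,data_policy_licence
STATION_A,ID-A,1782-01-01T00:00:00+00:00,1,1,275.3,0,SRC-1,0
STATION_A,ID-A,1782-01-01T00:00:00+00:00,1,1,270.1,0,SRC-1,0
STATION_A,ID-A,1782-01-01T00:00:00+00:00,1,1,272.7,0,SRC-1,0
STATION_B,ID-B,1782-01-01T00:00:00+00:00,2,2,41.0,0,SRC-2,1
STATION_B,ID-B,1782-02-01T00:00:00+00:00,2,2,18.5,0,SRC-2,1
"""


def read_head(handle, n: int = 4) -> list[dict[str, str]]:
    """Return the first n data rows as dictionaries keyed by column name."""
    return list(islice(csv.DictReader(handle), n))


head = read_head(io.StringIO(SAMPLE_CSV))
for row in head:
    print(row["station_name"], row["observed_variable"], row["observed_value"])
```

For a real download, the io.StringIO object would be replaced by a file opened with open(path, newline="").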
Section 2.2 provides details of each element field in the downloaded data. Users are instructed to give particular attention to the following two elements, contained in the data returned. First, the data policy license informs about conditions of use (see this table for the correspondence). Second, the source identifier informs about the data product and/or provider (sometimes including one or several citations) that need to be acknowledged when using the data (see Table B in the Appendix). It is important that users follow these requirements to ensure usage is commensurate with licence conditions and that proper acknowledgement is given to the ultimate data rights holders.
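One way to act on this guidance is to list, for a given download, which source identifiers occur under which data policy licence, so that each can be looked up in Table B. The sketch below assumes the rows have already been read into dictionaries keyed by the column names of section 2.2. The identifiers and licence codes shown are placeholders and the function name is hypothetical.

```python
from collections import defaultdict


def acknowledgement_summary(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each data_policy_licence code to the sorted source_ids seen under it."""
    summary: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        summary[row["data_policy_licence"]].add(row["source_id"])
    return {licence: sorted(sources) for licence, sources in summary.items()}


# Placeholder rows: real source identifiers and licence codes come from the data.
rows = [
    {"source_id": "SRC-1", "data_policy_licence": "0"},
    {"source_id": "SRC-2", "data_policy_licence": "1"},
    {"source_id": "SRC-3", "data_policy_licence": "0"},
    {"source_id": "SRC-1", "data_policy_licence": "0"},
]

summary = acknowledgement_summary(rows)
for licence, sources in sorted(summary.items()):
    print(f"licence {licence}: sources {', '.join(sources)}")
```

Each source identifier reported this way can then be matched against Table B in the Appendix to find the acknowledgement or citation required.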
2.2. Concise explanation of the data columns and brief description of data column contents
The harmonised data holdings need to be presented in a usable and consistent manner. It is the desire of C3S that this delivery be consistent across the suite of in-situ data activities to the extent possible and practicable. To that end this service contract and previous contracts have led the development of a common data model (CDM) (distinct from a data format) that allows a consistent representation of the data and associated metadata (See Section 6 for full details of the CDM).
The data for the current release, available via the CDS, are served in a condensed version of the CDM tables. The CDM-OBS-CORE is a deliberate subset of the CDM-OBS intended to be stored and served as a single table data model to users via the new CDS platform. The underlying ethos is to minimize to the extent possible the number of fields to be stored and to ensure homogeneity of data provision while recognizing that some irreducible heterogeneity exists in the in-situ data being served that needs to be catered for appropriately. Table 1 presents details of the CDM-OBS-CORE format for land based in situ observations.
Table 1: Field name, data type and description of the different elements and original table where information was extracted for the CDM-OBS-CORE
Element grouping | CDM-OBS-CORE Element | Type | CDM-OBS primary Table | Description |
Identifier information | station_name | varchar | Station configuration | The station name (where station can mean a physical station, a ship, a buoy or any other observing platform) |
Identifier information | primary_id | varchar | Station configuration | The primary station identifier for the station / platform from which the observation arises. Where the station has one or more WIGOS Station Identifiers (WSIs) this should be the primary WSI associated |
Identifier information | report_id | varchar | Header | Report identifier unique per report (collection of observations) |
Identifier information | observation_id | varchar | Observations | Unique observation identifier per observation |
Location information | longitude | numeric | Header | Location of instrument at time of observation (identical to entry in station configuration table for fixed assets). |
Location information | latitude | numeric | Header | Location of instrument at time of observation (identical to entry in station configuration table for fixed assets). |
Location information | height_of_station_above_sea_level | numeric | Header | Location of instrument at time of observation (identical to entry in station configuration table for fixed assets). |
Temporal information | report_timestamp | timestamp with timezone | Header | Date timestamp including timezone. The default for presentation of data via C3S should be that all data have been converted to UTC. |
Temporal information | report_meaning_of_time_stamp | int | Header | Whether the timestamp refers to beginning, middle or end of reporting period |
Temporal information | report_duration | int | Header | The duration of the report |
Observation value information | observed_variable | int | Observations | The variable being observed defined by a numeric identifier |
Observation value information | units | int | Observations | The units associated with the observed variable |
Observation value information | observed_value | numeric | Observations | The observed value |
Quality information | quality_flag | int | Observations | The quality flag for the observation |
Source information | source_id | varchar (pk) | Source configuration | Data source identifier – for provenance. If mixed source collection must be a data column. |
Source information | data_policy_licence | int | Source configuration | Data policy per observation. |
Type of observation | report_type | int | Header | Type of observing platform, report or instrument for applications with mixed holdings where in CADS user subsetting may be advantageous |
Type of observation | value_significance | int | Observations | An indicator of what the value signifies (mean, median, max, min etc.) |
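The single-table layout of Table 1 maps naturally onto a typed record. The following is an illustrative sketch rather than an official schema: the class and function names are hypothetical, the mapping of varchar, int and numeric to Python str, int and float is a simplification, the example values are placeholders, and treating a timestamp without an offset as UTC is an assumption based on the stated default.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

INT_FIELDS = ("report_meaning_of_time_stamp", "report_duration",
              "observed_variable", "units", "quality_flag",
              "data_policy_licence", "report_type", "value_significance")
FLOAT_FIELDS = ("longitude", "latitude",
                "height_of_station_above_sea_level", "observed_value")


@dataclass(frozen=True)
class CoreObservation:
    """One row of the single-table CDM-OBS-CORE layout listed in Table 1."""
    station_name: str
    primary_id: str
    report_id: str
    observation_id: str
    longitude: float
    latitude: float
    height_of_station_above_sea_level: float
    report_timestamp: datetime
    report_meaning_of_time_stamp: int
    report_duration: int
    observed_variable: int
    units: int
    observed_value: float
    quality_flag: int
    source_id: str
    data_policy_licence: int
    report_type: int
    value_significance: int


def parse_row(row: dict[str, str]) -> CoreObservation:
    """Coerce the text fields of one row to the types declared above."""
    values: dict[str, object] = dict(row)
    for name in INT_FIELDS:
        values[name] = int(row[name])
    for name in FLOAT_FIELDS:
        values[name] = float(row[name])
    stamp = datetime.fromisoformat(row["report_timestamp"])
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    values["report_timestamp"] = stamp.astimezone(timezone.utc)
    return CoreObservation(**values)


# Placeholder row: every value here is invented for illustration.
example = {
    "station_name": "STATION_A", "primary_id": "ID-A", "report_id": "R-1",
    "observation_id": "O-1", "longitude": "9.19", "latitude": "45.47",
    "height_of_station_above_sea_level": "150", "report_timestamp":
    "1782-01-01T12:00:00+01:00", "report_meaning_of_time_stamp": "1",
    "report_duration": "1", "observed_variable": "1", "units": "1",
    "observed_value": "275.3", "quality_flag": "0", "source_id": "SRC-1",
    "data_policy_licence": "0", "report_type": "1", "value_significance": "1",
}
obs = parse_row(example)
print(obs.report_timestamp.isoformat(), obs.observed_value)
```

Normalising every timestamp to UTC at parse time keeps later comparisons between reports free of mixed offsets.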
2.3. Details of current version, details of planned schedule of updates and what is new in this version.
The current release is the full seventh data release and was completed in November 2024. The following section provides a summary of information on the current data release and section 9 provides more in-depth details.
2.3.1. Sub-daily data
The current public data release consists of 29,385 unique stations, an increase of 4,260 unique stations over the prior data release (r6) in June 2024. Figure 6 presents the locations of all sub-daily stations in the current release, November 2024.
The change in temporal coverage from the previous public data release to the current public data release per variable type is as follows:
Variable | Previous R6 | Current R7 |
Pressure | 1790-2023 | 1790-2024 |
Temperature | 1790-2023 | 1790-2024 |
Water Vapour | 1892-2023 | 1847-2024 |
Wind | 1821-2023 | 1863-2024 |
Note that changes in start dates may relate to release-to-release differences in the merging of stations or in their quality control, which can lead to differences in the stations served under distinct ids. This particularly affects early station record inclusion / exclusion decisions. All such decisions are revisited at each new release to ensure that the longest records that can be served with confidence, based upon state-of-the-art understanding at the time, are made available. The period of record start dates is driven by decisions regarding a handful of stations. Meaningful global coverage is only attained much later.
The stations in the current release are distributed by ECVs as follows:
- 26,320 sub-daily stations consisting of Temperature observations (increase of 6,226 stations on previous release).
- 17,998 sub-daily stations consisting of Wind direction and Wind speed observations (increase of 2,146 stations on previous release).
- 22,179 sub-daily stations consisting of Water vapour observations (increase of 4,462 stations on previous release).
- 19,818 sub-daily stations consisting of Sea Level Pressure observations (increase of 1,582 stations on previous release).
- 18,567 sub-daily stations consisting of Station level Pressure observations (increase of 3,131 stations on previous release).
The stations in the current release are distributed by WMO Region as follows:
- 1,795 sub-daily stations located in WMO Region 1 (Africa).
- 3,953 sub-daily stations located in WMO Region 2 (Asia).
- 1,781 sub-daily stations located in WMO Region 3 (South America).
- 7,167 sub-daily stations located in WMO Region 4 (North America, Central America and the Caribbean).
- 1,378 sub-daily stations located in WMO Region 5 (South-West Pacific).
- 13,112 sub-daily stations located in WMO Region 6 (Europe).
- 195 sub-daily stations located in WMO Region 7 (Antarctica).
Figure 6: Map showing locations of sub-daily stations in the current release
The sub-daily data for the current data release consist of data arising from 118 different sources which have been merged and reconciled. So far, we have been able to verify that 82 sources have an open data policy (data in the public domain and freely available at no cost and without restriction) and 23 sources have an open Creative Commons (CCBY) License (https://creativecommons.org/licenses/). A further 28 sources are deemed to be WMO Unified Data Policy Additional Data: these data are openly available from publicly facing repositories, but their data policy could not be verified at this time. Of these sources, 15 have mixed data policies due to known national data policies being applied. The data policy breakdown by stations for the current data release is as follows:
- 19,465 stations with observations under open data policy / open CCBY license.
- 12,450 stations with observations under WMO Unified Data Policy.
Of these stations 2,530 have a mixed data policy due to being merged from different sources with varying data policies.
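The counts above can be cross-checked with simple arithmetic. The sketch below rests on one reading of the figures, which is an assumption rather than something stated by the service team: that stations and sources with a mixed data policy are counted under more than one heading, so they must be subtracted once to recover the number of unique items.

```python
# Sub-daily station counts by data policy, as quoted in the text above.
stations_open_or_ccby = 19_465
stations_wmo_policy = 12_450
stations_mixed = 2_530  # assumed to appear under both headings

unique_stations = stations_open_or_ccby + stations_wmo_policy - stations_mixed
print("unique sub-daily stations:", unique_stations)

# The same check applied to the sub-daily source counts.
sources_open = 82
sources_ccby = 23
sources_wmo_additional = 28
sources_mixed = 15  # assumed to appear under two headings

unique_sources = sources_open + sources_ccby + sources_wmo_additional - sources_mixed
print("unique sub-daily sources:", unique_sources)
```

Under this reading the results agree with the 29,385 unique stations and the 118 sources quoted for the sub-daily release.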
2.3.2. Daily data
The daily data for the current release consist of a subset of stations extracted from the Global Historical Climatology Network daily (GHCN-D) dataset of NOAA's National Centers for Environmental Information (NCEI). The GHCN-D database currently consists of in excess of 120,000 stations, although many of these are precipitation-only stations. Given the stated aim for a multivariate set of holdings, the current release does not include the precipitation-only stations. The subset of 86,513 global daily stations was selected from the GHCN-D dataset based on the availability of at least two of our target ECVs. This represents a slight decrease of 32 stations on the previous release owing to decisions made on station merging and processing by NCEI in the interim. Figure 7 presents the locations of all daily stations in the current release.
The daily station temporal coverage per variable is as follows:
- Precipitation 1763-2024
- Snow 1840-2024
- Temperature 1763-2024
- Wind 1881-2024
The daily stations in the current release are distributed by ECVs as follows:
- 86,458 daily stations consisting of Precipitation observations.
- 70,998 daily stations consisting of Snowfall observations.
- 62,436 daily stations consisting of Snow Depth observations.
- 50,754 daily stations consisting of Temperature observations.
- 20,768 daily stations consisting of Snow Water Equivalent observations.
- 1,197 daily stations consisting of Wind Speed observations.
- 82 daily stations consisting of Wind Direction observations.
The daily stations in the current release are distributed by WMO Region as follows:
- 689 daily stations located in WMO Region 1 (Africa).
- 1,299 daily stations located in WMO Region 2 (Asia).
- 446 daily stations located in WMO Region 3 (South America).
- 76,087 daily stations located in WMO Region 4 (North America, Central America and the Caribbean).
- 2,005 daily stations located in WMO Region 5 (South-West Pacific).
- 5,942 daily stations located in WMO Region 6 (Europe).
- 52 daily stations located in WMO Region 7 (Antarctica).
Figure 7: Map showing locations of daily stations in the current release
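The regional counts listed above can be turned into percentage shares to make the uneven spatial distribution of the daily holdings explicit. This is a small illustrative calculation only. The shares are computed relative to the sum of the regional counts as listed, which need not equal the headline station total, and the function name is hypothetical.

```python
# Daily station counts per WMO Region, as listed in the text above.
DAILY_STATIONS_BY_REGION = {
    "1 (Africa)": 689,
    "2 (Asia)": 1_299,
    "3 (South America)": 446,
    "4 (North America, Central America and the Caribbean)": 76_087,
    "5 (South-West Pacific)": 2_005,
    "6 (Europe)": 5_942,
    "7 (Antarctica)": 52,
}


def regional_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each region's percentage share of the summed regional counts."""
    total = sum(counts.values())
    return {region: 100.0 * count / total for region, count in counts.items()}


shares = regional_shares(DAILY_STATIONS_BY_REGION)
for region, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"WMO Region {region}: {share:.1f}%")
```

The calculation shows that WMO Region 4 accounts for the large majority of the daily stations, which users should bear in mind when computing global statistics from these holdings.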
The current daily data release consists of 30 different merged sources. So far, we have been able to verify that 19 sources have an open data policy or CCBY data policy (data in the public domain and freely available at no cost and without restriction), and 11 sources are deemed to be WMO Resolution 40 Additional Data (openly available from publicly facing repositories, but with a data policy that could not be verified at this time). Five of the daily data sources have a mixed data policy.
The data policy breakdown by stations for the current data release is as follows:
- 74,926 stations with observations under open data policy / open CCBY license.
- 11,629 stations with observations under WMO Resolution 40.
- 40 of these stations have mixed data policy.
2.3.3. Monthly data
The same selected subset of GHCN-D daily stations was extracted from NCEI's Global Summary of the Month (GSOM) dataset. There are 83,128 monthly stations in the current release (Figure 8), which represents a small increase of 37 stations on the previous data release.
The monthly station temporal coverage per variable is as follows:
- Precipitation 1763-2024
- Snow 1840-2024
- Temperature 1763-2024
- Wind 1881-2024
The stations are distributed by ECVs as follows:
- 82,936 monthly stations consisting of Precipitation observations.
- 37,148 monthly stations consisting of Temperature observations.
- 65,886 monthly stations consisting of Snow observations.
- 1,202 monthly stations consisting of Wind Speed observations.
The stations are distributed by WMO Region as follows:
- 658 monthly stations located in WMO Region 1 (Africa).
- 1,294 monthly stations located in WMO Region 2 (Asia).
- 423 monthly stations located in WMO Region 3 (South America).
- 72,774 monthly stations located in WMO Region 4 (North America, Central America and the Caribbean).
- 1,996 monthly stations located in WMO Region 5 (South-West Pacific).
- 5,933 monthly stations located in WMO Region 6 (Europe).
- 50 monthly stations located in WMO Region 7 (Antarctica).
The breakdown for monthly stations data policy is similar to the daily stations.
Figure 8: Map showing locations of monthly stations in the current release
3. Comprehensive user guidance
3.1. Overview of historical and present methods of observation
Observations of core meteorological variables such as air temperature, surface pressure and rainfall have been made routinely since around 1850, with some records going as far back as the 1600s. Initially, such observations were made for such purposes as general scientific or medicinal interest, farming, and shipping. By the late 1800s, facilitated by the International Meteorological Organization (now the World Meteorological Organization, WMO), meteorological observations were made and exchanged across all inhabited continents. These networks have grown considerably over time to the thousands of stations that we have today, transmitting a wide variety of variables on hourly or even sub-hourly timescales across the Global Telecommunication System (GTS). The following sections provide a broad-scale and non-exhaustive overview of changes to orient the user.
3.2. Changes to the land meteorological observing system
Despite a guide to standards being in place (WMO, 2014a and several antecedents since the early 20th Century) there has been, and continues to be, a wide variety of instrumentation, instrument housing, station environments and siting, and observing methods employed across the world and even within countries. The vast majority of observations have been made at stations where the key interest is forecasting the near-term weather. Hence, by necessity, there have been frequent and widespread changes to these observing methods, timing, instruments and their housing, and station environments and locations over time, as opposed to a focus on long-term continuity and stability.
Instruments have evolved considerably over time with improvements in performance, reliability, convenience and cost of operation. Mercury-in-glass thermometers were commonplace but have now largely been replaced both for health reasons and because of the ready availability of cheap and reliable electronic sensors. Similarly, the psychrometers used to measure humidity were originally principally a paired dry-bulb and wet-bulb thermometer with a reservoir of water to maintain a wetted wick. These have largely been replaced with cheaper and lower maintenance resistance or capacitance sensors where no reservoir or wick needs to be managed. The design of rain gauges has developed considerably, with the current generation of gauges using buckets that automatically tip at designated intervals. Cloud cover can now be measured electronically, but the snapshot view from ceilometers is inconsistent with the full-sky view of a manual observer. Similar examples of changes in instrumentation exist for all other observed parameters.
There have also been substantive changes in instrument housing. Early thermometers were often housed in semi-open boxes or cages attached to poleward facing walls or screens, under verandas, or thatched canopies in the tropics, to shield from direct radiation. By the early 20th Century stand-alone screens with louvred sides (e.g., Stevenson Screens) were generally used (Parker, 1994), reducing the influence of the thermal mass of the attached wall or stand or surrounding ground, and hence providing a better measure of the 'true' ambient air temperature. More latterly with automation these in turn have tended to be replaced with smaller housing.
Early records were manually recorded observations. As technology has evolved automated observations that do not require human intervention have become possible and broadly adopted. Automatic weather stations (AWSs) provide the convenience of higher frequency observations and can also lead to increases in observation density, as costs are lower than for manual observations. The widespread introduction of AWSs took place largely during the 1990s, although this differs considerably from country to country. This change often meant simultaneous changes in instrumentation, observing times and frequencies, instrument shelter and sometimes instrument location across large swathes of observing networks. For example, in the USA there was a large-scale change in the 1980s from the liquid-in-glass thermometers in wooden screens (Cotton Region Shelters) to semi-automated thermistor based Maximum Minimum Temperature Systems (MMTSs) which utilised smaller plastic shelters (Quayle et al., 1991; Lawrimore et al., 2011).
Times of observations have changed both systematically and periodically for a wide variety of reasons. The frequency or hours of readings may change over time to meet evolving user needs. For example, in the USA, there has been a move towards morning readings of the maximum and minimum thermometers during the 20th Century as opposed to sunset readings. Such changes can significantly affect the daily and monthly statistics of the maximum, minimum and mean temperatures and the continuity of hourly observations. This US example has been found to have introduced a small cool bias (Jones, 2016).
Arguably the most ubiquitous changes to observation records are the location of the instruments within a station, the location of that station, or the environment surrounding and within the station. A move of only a few metres is sufficient to lead to differences in ambient climate, as are changes in nearby vegetation, ground surface, buildings, water bodies or irrigation practices. Systematic moves and changes have taken place throughout the historical record. The most commonplace of these are the move of stations to airports and the growth of urban areas to surround once rural stations. However, there is a broad range of such changes, the effects of which vary greatly on a station-by-station basis.
3.3. Types of observing networks over land
There are many different types of observing sites and networks across the globe with varying standards and levels of formality and coordination. The highest quality sites are those maintained specifically for long-term records, with the standard requirements defined in WMO (2014a) met, frequently serviced and calibrated instruments, duplication of instruments to ensure resilience, well maintained documentation (metadata) and parallel records during any period of change. The lowest quality of sites may have poorly maintained instruments, poorly sited instruments close to artificial sources of heat or moisture, no redundancy or parallel records and little or no documentation. Unfortunately, there exist very few of the former sites and very many of the latter. In data dense regions the data issues in poorer quality records can easily be identified, quality controlled or even adjusted by comparison to nearby neighbouring records so long as there are not network-wide contemporaneous changes of a similar nature. In data poor regions it is much harder to assess and address issues of data quality. Many stations have short records due to changes in station locations or network resourcing and may prioritise real-time accuracy over long-term continuity for weather forecasting purposes.
A range of different network types aim to ensure that the global observing system meets the myriad of needs put upon it (GCOS, 2015; Thorne et al., 2018). There exists a comprehensive network of ~11,000 WMO reporting stations across the globe in addition to networks run under other realms such as agriculture or transport (most notably civil aviation (ICAO) sites). The emphasis of the comprehensive network is on quick access, high spatial and temporal resolution and general quality rather than long-term continuity. The WMO stations also contribute to smaller baseline networks with emphasis on long-term operational commitment and global distribution. The Regional Basic Synoptic Network (RBSN) contains ~4,000 stations and the Regional Basic Climatological Network (RBCN) contains ~3,000 stations, both of which are subsets of the WMO network. There are substantial overlaps between these two networks. The RBCN network stations include ~1,000 that are also part of the GCOS surface network (GSN; GCOS, 2010), which prioritises uniform spatial coverage and longevity. It is the RBCN stations that are requested to provide monthly, and soon daily, CLIMAT messages of averages, extremes and threshold exceedances for temperature, precipitation and sunshine duration (WMO, 2014b). Finally, there are reference networks which are the sparsest in terms of coverage but make the highest-quality observations with metrological traceability. The US Climate Reference Network (USCRN; https://www.ncdc.noaa.gov/crn/) is the most notable such network at the present time. These USCRN data will be made available via the CDS under a dedicated catalogue entry including a full metrological uncertainty characterisation by C3S2 311 Lot 2. GCOS is actively scoping the potential development and deployment of a global surface reference network at this time and C3S2 311 Lot 1 participants are involved in this effort.
3.4. Observing key variables of the atmosphere
Each meteorological variable observed has its own suite of instrumentation, standard units, dependencies and vulnerabilities to error. Calibration errors and mistakes made when recording, converting, reporting or digitising measurements are commonplace. There are various methods used to detect such random errors. Each variable, by its nature, can be affected by different features of the measurement environment and simultaneous weather.
Air temperature measurements can be affected by both reflected and emitted radiation from surrounding surfaces, even within a screen. The level of nearby vegetation and water (bodies, irrigation) can also affect the local temperature through changes in latent and sensible exchanges. Hence, year-to-year changes in the character of the local environment can lead to local changes in temperature that are also non-climatic in origin but may be correlated with covariates such as rainfall or snowfall.
Surface pressure is largely robust to errors, but conversion to sea level pressure can be problematic at high altitudes and conversion algorithms differ, often quite substantively. Although calibration is easy, sometimes it is not undertaken.
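The sensitivity of sea level pressure to the conversion method at high altitude can be illustrated with a simple hypsometric reduction. This is a generic textbook sketch and not the algorithm used by any particular data provider or by this service. The fixed lapse rate, the use of dry air temperature in place of virtual temperature, and the function name are all simplifying assumptions.

```python
import math

G = 9.80665        # standard gravity, m s-2
R_DRY = 287.05     # specific gas constant of dry air, J kg-1 K-1
LAPSE_RATE = 0.0065  # assumed lapse rate of the fictitious air column, K m-1


def reduce_to_sea_level(p_station_hpa: float, height_m: float,
                        temp_k: float, lapse_rate: float = LAPSE_RATE) -> float:
    """Reduce station pressure to sea level using the hypsometric equation."""
    # Mean temperature of the fictitious column between station and sea level.
    mean_temp_k = temp_k + 0.5 * lapse_rate * height_m
    return p_station_hpa * math.exp(G * height_m / (R_DRY * mean_temp_k))


def spread(p_station_hpa: float, height_m: float) -> float:
    """Difference in reduced pressure between a cold and a warm temperature."""
    cold = reduce_to_sea_level(p_station_hpa, height_m, 278.15)
    warm = reduce_to_sea_level(p_station_hpa, height_m, 298.15)
    return cold - warm


print(f"100 m station:  {spread(1000.0, 100.0):.2f} hPa")
print(f"2000 m station: {spread(800.0, 2000.0):.2f} hPa")
```

The same 20 K difference in assumed temperature changes the reduced pressure far more for the high station than for the low one, which is why the choice of conversion algorithm matters most at altitude.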
Measurement of precipitation, be it rain, hail or snow, can be affected by the wind speed, with strong winds leading to under- or over-catch within the gauge depending upon precipitation phase and gauge design. Evaporation can also be an issue, and this is enhanced by wind and sunshine. The duration of each measurement is critical additional information. Measurement of solid phase precipitation is particularly challenging. The spatial representativeness of precipitation measurements is relatively small, particularly in regions or seasons dominated by convective precipitation.
Wind speed is very strongly affected by surrounding obstacles and, like precipitation, is strongly dependent on the duration of the measurement. The maximum gust speed can be very different to a period average speed. Similarly, instantaneous direction can differ substantively from time-averaged direction. Wind speed close to the surface is highly influenced by the local surface characteristics that may vary both seasonally and in the long-term.
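The difference between gust and period-average speed, and between instantaneous and time-averaged direction, can be illustrated with a minimal sketch. The sample values and the function name `summarise_wind` are hypothetical, and the speed-weighted vector mean used here is only one of several averaging conventions in use.

```python
import math

def summarise_wind(speeds_ms, directions_deg):
    """Return (mean speed, peak speed, vector-mean direction) for one period."""
    mean_speed = sum(speeds_ms) / len(speeds_ms)
    peak_speed = max(speeds_ms)
    # Average the direction through its vector components so that samples
    # either side of north (e.g. 350 and 10 degrees) do not average to 180.
    # Each sample is weighted by its speed (an assumption of this sketch).
    u = sum(s * math.sin(math.radians(d)) for s, d in zip(speeds_ms, directions_deg))
    v = sum(s * math.cos(math.radians(d)) for s, d in zip(speeds_ms, directions_deg))
    mean_direction = math.degrees(math.atan2(u, v)) % 360.0
    return mean_speed, peak_speed, mean_direction

# Hypothetical samples within one averaging period.
speeds = [4.0, 6.0, 12.0, 5.0, 3.0]
directions = [350.0, 10.0, 20.0, 355.0, 5.0]
mean_speed, peak_speed, mean_direction = summarise_wind(speeds, directions)
print(mean_speed, peak_speed, round(mean_direction, 1))
```

In this example the peak sample is twice the period mean, which is why the averaging period needs to accompany any reported wind value.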
Humidity is often a derived quantity, and methods differ depending on the instrument type or local practice. The initial measurement may be from a paired dry- and wet-bulb thermometer (psychrometer), a dewcell which measures dew point temperature directly, or a resistance or capacitance sensor which provides a measure of relative humidity. It is often the dew point temperature or relative humidity that is reported and derivations are dependent on simultaneous measurements of temperature and pressure. Ventilation is very important for measuring humidity. In very still conditions a humidity sensor that is not artificially ventilated can be biased high. Very cold conditions also pose problems for measuring humidity, particularly if using a psychrometer. Icing of the screen can inhibit air flow to the instrument. Freezing of the water reservoir may mean that the wick around the wet-bulb dries out, becoming effectively a dry bulb. The relationship between humidity quantities derived from the wet-bulb or dew point temperature (e.g., vapour pressure, specific humidity etc.) differs depending on whether the wick is wet (wet-bulb) or frozen (ice-bulb), and so it is important that the correct algorithm is used.
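The dependence on the state of the wick can be sketched as follows. The saturation vapour pressure expressions and psychrometer coefficients below are those commonly quoted for an aspirated psychrometer in the WMO Guide to Instruments and Methods of Observation (WMO-No. 8); they are assumptions for illustration and are not necessarily the algorithm applied by any given data source or by the service. Function names are hypothetical.

```python
import math

def sat_vapour_pressure_water(t_c):
    """Saturation vapour pressure over water (hPa), temperature in deg C."""
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def sat_vapour_pressure_ice(t_c):
    """Saturation vapour pressure over ice (hPa), temperature in deg C."""
    return 6.112 * math.exp(22.46 * t_c / (272.62 + t_c))

def vapour_pressure(dry_c, bulb_c, pressure_hpa, bulb_frozen):
    """Vapour pressure (hPa) from a psychrometer reading.

    bulb_frozen selects the ice-bulb form; using the wrong branch
    changes the derived humidity for the same pair of readings.
    """
    depression = dry_c - bulb_c
    if bulb_frozen:
        return sat_vapour_pressure_ice(bulb_c) - 5.75e-4 * pressure_hpa * depression
    coefficient = 6.53e-4 * (1.0 + 0.000944 * bulb_c)
    return sat_vapour_pressure_water(bulb_c) - coefficient * pressure_hpa * depression

# The same readings interpreted as wet-bulb and as ice-bulb.
e_wet = vapour_pressure(-2.0, -3.0, 1000.0, bulb_frozen=False)
e_ice = vapour_pressure(-2.0, -3.0, 1000.0, bulb_frozen=True)
print(round(e_wet, 3), round(e_ice, 3))
```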
3.5. Historical data rescue
It is estimated that there are as many, if not more, data remaining undigitized in the period prior to 1950 as exist in digital archives; and in many cases these remain uncatalogued. In some instances, documents are lying in long forgotten archives or personal collections, in varying states of physical decay. These data would make an invaluable contribution to our understanding of the world's climate and how it has changed over the last 200+ years. Periodically, as resources become available, some records are digitised but this is a time consuming and expensive process. Much of this work takes place ad-hoc through regional projects. The overarching body of ACRE (Atmospheric Circulation Reconstructions over the Earth; Allan et al., 2011; http://www.met-acre.net/) have made huge contributions to this effort, making vast amounts of land and marine observations digitally accessible for climate research. The Copernicus Climate Change Service (C3S) is building a Data Rescue Service under this C3S2 311 Lot 1 contract to facilitate and coordinate global data rescue activities and ensure flow through to global archives such as the C3S Global Land and Marine Observations Database. The data rescue portal can be found at https://datarescue.climate.copernicus.eu/.
4. Data sources
The land data holdings have historically been exchanged for any given station as one or more of sub-daily (synoptic) observations, daily values or monthly averages. Typically, the basic measurements have been made at sub-daily or daily timesteps and then aggregated to coarser temporal resolutions.
Historical data management practice has been highly fractured by one or more of: Essential Climate Variable (ECV), timescale, and often nationally / regionally. Numerous stations have been shared multiple times often using distinct geolocation data, station names, or processed versions. Data have often been shared for a specific timescale and / or variable, leading to a fracturing of data holdings that requires substantive work to unpick and resolve. This can greatly complicate identifying which are truly unique stations. A necessary first step in the creation of an integrated set of historical holdings is thus simply to collate and catalogue what data are available. The initial cataloguing effort has been undertaken separately for each timestep and is summarised here. We stress that far from all of these data are included within the current release. Over successive iterative releases efforts shall be made to pull through additional sources but, even if successful, very many sources shall remain to be merged and harmonized at contract finalisation. This is due to both the complexity of the challenge and the fact that new sources continue to be acquired on a continuous basis as data are rescued and data policies continue to evolve, generally, to become more open.
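To illustrate why identifying truly unique stations is non-trivial, the following minimal sketch flags candidate duplicate pairs using proximity and name similarity. It is not the service's reconciliation method; the thresholds, station records, identifiers and function names are all hypothetical.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def candidate_duplicates(stations, max_km=5.0, min_name_similarity=0.8):
    """Return id pairs that are both close together and similarly named."""
    pairs = []
    for i, a in enumerate(stations):
        for b in stations[i + 1:]:
            close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
            ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if close and ratio >= min_name_similarity:
                pairs.append((a["id"], b["id"]))
    return pairs

# Hypothetical records for the same site shared through two sources.
stations = [
    {"id": "srcA-001", "name": "Example Airport", "lat": 53.000, "lon": -6.000},
    {"id": "srcB-017", "name": "EXAMPLE AIRPRT", "lat": 53.002, "lon": -6.009},
    {"id": "srcA-002", "name": "Hilltop Observatory", "lat": 51.000, "lon": -10.000},
]
print(candidate_duplicates(stations))
```

Candidate pairs found this way would still need checking against the data themselves, since distinct stations can be close together and share similar names.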
While every effort has been made to comprehensively collate the available holdings, it is all but guaranteed that potential sources will have been missed. An in-depth inventory of the sources and the detailed station level inventory are made available at the new web-based data deposition service at https://datadeposit.climate.copernicus.eu/home/. This C3S2 311 Lot 1 service currently runs on a number of servers hosted on the Irish national computing infrastructure, the Irish Centre for High End Computing (ICHEC), hosted by the National University of Ireland Galway (https://www.ichec.ie/). ICHEC hosts all of the service's data and all processing will be performed on the KAY supercomputer (kay.ichec.ie). The Data Deposition Service facility is a disk area distinct from the main ICHEC compute resource which enables upload of data via a range of means including user-initiated push and service-initiated pull. Data can be safely checked and inventoried in this area to guard against viruses, malware and data corruption prior to ingestion into the C3S2 311 Lot 1 database on the ICHEC servers. The Data Deposition Service gives data providers a space to share their data with the C3S2 311 Lot 1 service and offers a means of uploading data and metadata to a secure server (see Noone et al., 2020). The C3S2 311 Lot 1 service is looking for all relevant data in any format, and these data can now be uploaded at: https://datadeposit.climate.copernicus.eu/accounts/login/.
Also, if users have knowledge of a data source which they believe is publicly available and is not currently catalogued, please email incoming_data@surfacetemperatures.org or raise a ticket on the C3S service desk to advise the service of the potential new source for inclusion.
The EUMETNET-Copernicus data sharing agreement offers an opportunity to gather additional data arising from European NMHSs. Work has been ongoing across Copernicus Services to secure and process data under this agreement, although significant work remains to be done. We have received responses from several European NMHSs and Institutes indicating a willingness to share data. We also work with Copernicus to secure holdings via various cooperation agreements entered into with international entities. To date these have secured data additions for countries such as Brazil, Chile and Canada.
The following sections provide information on the inventory format and structure and present an overview of the data inventoried so far. We also present inventory graphical summaries for each land data source that can be downloaded from this attachment.
These graphical summary documents provide maps showing station geolocation for each data source by timescale, together with metadata information such as:
- Data source unique identifier
- Dataset name
- Data access policy
- Data start and end years
- Timestep available
- Variables available
4.1. File structure of the land-based source deck inventory
As outlined in this attached report, the land-based meteorological data inventory consists of a single parent file which contains information on all archived data decks and subsequent station level decks (sub-daily, daily and monthly). The following section describes the format of the source deck inventory, which is compliant with the ISO19115, INSPIRE and WMO/WIGOS standards.
The source deck inventory has 29 columns. All column headers and descriptions are presented in Table C in the Appendix. The overall structure is ordered as follows:
- The first column shows the inventory version, e.g. preliminary or V.01.
- The next 12 columns provide general details on the source deck such as unique identifier, source name, data repository and method of data transfer along with the domain data coverage and geo-coordinates bounding box (maximum and minimum Latitude/Longitude). Column 4 is an addition, which provides a dataset name allocated to the data source and is on the recommendation of Gil Compo from the Service's Science advisory panel. This dataset name is a combination of the data source name, the data repository where data was obtained and the domain which the data covers e.g. amrc_palmer_isti_antarctic.
- Column 14 provides information on the source deck data policy.
- Column 15 in the inventory provides an overview of the variables under current consideration by the Service which are available.
- Other variables that are potentially available from the source but not yet considered are presented in column 16.
- Columns 17-20 contain information on the data timestep, data start and end years and mean data years.
- Columns 21 and 22 show details of the source data update status and the processing status of the data within the C3S 311a Lot 2 activities.
- The final 7 columns provide data source point of contact information and are taken from the ISO19115 metadata standards table.
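The column layout described above can be written down as a small lookup table, which also confirms that the groups account for all 29 columns. This is a sketch only: the group labels are paraphrased, and the split of the example dataset name into source, repository and domain parts is an assumption made for illustration.

```python
# (label, first column, last column), following the list above.
SOURCE_DECK_COLUMN_GROUPS = [
    ("inventory version", 1, 1),
    ("general source details and bounding box", 2, 13),
    ("data policy", 14, 14),
    ("variables under current consideration", 15, 15),
    ("other potentially available variables", 16, 16),
    ("timestep, start/end years and mean data years", 17, 20),
    ("update status and processing status", 21, 22),
    ("point of contact", 23, 29),
]

def total_columns(groups):
    """Count the columns covered by a list of (label, first, last) groups."""
    return sum(last - first + 1 for _, first, last in groups)

def build_dataset_name(source, repository, domain):
    """Join source name, repository and domain into a lower-case dataset name."""
    parts = (source, repository, domain)
    return "_".join(p.strip().lower().replace(" ", "_") for p in parts)

print(total_columns(SOURCE_DECK_COLUMN_GROUPS))
# Assumed decomposition of the example name given in the text.
print(build_dataset_name("AMRC Palmer", "ISTI", "Antarctic"))
```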
4.2. Station deck inventory format
We have produced a station deck inventory format that is compliant with international discovery metadata standards. The design principle of this inventory is such that it deliberately contains a "super-set" of a range of international metadata standard compliant formats such that it can be extracted in a range of standard-compliant formats. The station deck (sub-daily, daily and monthly) inventories thus provide detailed information on a per timestep basis. The station deck inventories each consist of 116 columns that can potentially contain comprehensive metadata information. These columns follow a logical structure:
- The first 10 columns provide core information per station such as the station identification number, original data source name and unique identifier, WMO (WIGOS) station identifier, station latitude/longitude and elevation in metres above sea level. Column 5 is an additional column that contains the dataset name as described in section 4.1.
- The next 3 columns show the Federal Information Processing Standards (FIPS) country code, country and continent location of the station data.
- The station data access and usage policy information is shown in column 14. Note that data policy may vary by station within each source deck.
- Information is next provided on the data type which may be raw data, quality checked or homogenised by original source.
- Columns 16 to 18 contain the station data update status, links to the station metadata (if available) and the data timestep.
- Other information such as the frequency of the observation (for sub-daily data) and the time (UTC) at which the observation was recorded (for daily data) is shown (if available). The variables available for each station have separate columns (22-53) which show the specific variable data start and end years.
- Columns 54-116 contain additional comprehensive metadata which creates a "super-set" inventory of required information based on ISO19115 and WMO core profile (WIGOS) metadata standards.
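As a minimal sketch, the structure above can be expressed as a lookup from column number to column group. The text does not state the column number for the data type or for the observation frequency and time fields; columns 15 and 19-21 are inferred here from the ordering of the list and should be treated as assumptions, as should the function name.

```python
# (first column, last column, label), following the list above.
STATION_DECK_COLUMN_GROUPS = [
    (1, 10, "core station information"),
    (11, 13, "FIPS country code, country and continent"),
    (14, 14, "data access and usage policy"),
    (15, 15, "data type"),  # inferred position
    (16, 18, "update status, metadata links and timestep"),
    (19, 21, "observation frequency, observation time and related fields"),  # inferred
    (22, 53, "per-variable data start and end years"),
    (54, 116, "additional ISO19115 and WIGOS metadata"),
]

def describe_column(number):
    """Return the group label for a 1-based station deck column number."""
    for first, last, label in STATION_DECK_COLUMN_GROUPS:
        if first <= number <= last:
            return label
    raise ValueError(f"column {number} is outside the 116-column inventory")

print(describe_column(14))
print(describe_column(60))
```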
The ISO19115 standard is part of the ISO geographic information suite of standards (19100 series). ISO19115 provides information about the identification, extent, quality, spatial and temporal aspects, content, spatial reference, portrayal, distribution, and other properties of digital geographic data and Services. ISO19115 Geographic Information Metadata is the most widely used standard for geospatial metadata. The ISO19115 format is flexible, which allows organizations to customize the metadata profile for any specific application, but in doing so still retain the main features and advantages.
The following categories make up a core ISO 19115 profile:
- Title
- Reference date
- Responsible Party
- Geographic location
- Language
- Character set
- Topic Category
- Scale
- Abstract
- Format
- Extent
- Representation Type
- Reference System
- Lineage
- On-line Resource
The C3S 311a Lot3 activity chose the ISO19115 metadata standard, as have other major EU and international projects. For example, the NASA Earth Science Division (ESD) Base Metadata Requirements make use of an ISO19115 metadata profile for NASA Earth science data, the Infrastructure for Spatial Information in the European Community (INSPIRE) metadata profile is built upon EN ISO19115, and the WMO Core profile for the WMO Information System (WIS), formerly known as the Global Telecommunications System (GTS), is also a profile of ISO19115. This "super-set" inventory approach avoids picking a "winner" from a variety of standards and instead allows a user to extract metadata information in a variety of standard-compliant formats that meet the variety of user requirements. In future, the capability to do so may be integrated into CDS functionality. Table D in the Appendix shows the station deck inventory format. The first column shows the column number for reference and the second column shows the station deck "super-set" inventory table fields. The third column shows the corresponding ISO19115 standard-compliant table fields, which are extracted from the corresponding second-column field. The WIGOS standard-compliant fields are shown in the fourth column and are also extracted from the corresponding second-column field.
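The "super-set" idea can be sketched as a single record from which different standard-specific views are extracted. The field names, the WIGOS-style labels and the example record below are hypothetical; the actual field correspondences are those given in Table D. The ISO19115 targets reuse category names from the core profile listed above.

```python
# Hypothetical super-set field -> ISO19115 core profile category.
ISO19115_VIEW = {
    "dataset_name": "Title",
    "latitude_longitude": "Geographic location",
    "source_history": "Lineage",
    "data_url": "On-line Resource",
}

# Hypothetical super-set field -> WIGOS-style label (illustrative names only).
WIGOS_VIEW = {
    "wigos_station_id": "station identifier",
    "latitude_longitude": "geospatial location",
    "elevation_m": "elevation",
}

def extract_view(record, view):
    """Project a super-set record onto one standard, skipping empty fields."""
    return {
        target: record[field]
        for field, target in view.items()
        if record.get(field) not in (None, "")
    }

# Hypothetical super-set record with some fields left empty.
record = {
    "dataset_name": "example_source_repo_region",
    "latitude_longitude": (10.0, 20.0),
    "elevation_m": 100.0,
    "wigos_station_id": "",
}
print(extract_view(record, ISO19115_VIEW))
print(extract_view(record, WIGOS_VIEW))
```

Holding one record and deriving each view from it means a correction made once in the super-set inventory propagates to every extracted format.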
4.3. Sub-daily data inventory
There has been considerable progress made in inventorying new data sources at the sub-daily timestep (Table 2). The current sub-daily station inventory as of 30th June 2024 consists of 198 data sources and 151,881 stations in 200 different countries, territories and dominions, although very many of these stations are duplicated (often multiple times) across one or more sources such that the true station count will be considerably lower following reconciliation.
Table 2: Number of sub-daily stations and sources inventoried for each year since 2017.
Inventory date | Total number of stations | Increase in stations since previous year | Total number of sources | Increase in sources since previous year |
31/08/2017 | 814 | - | 13 | - |
31/08/2018 | 23,619 | 22,805 | 51 | 38 |
31/08/2019 | 98,683 | 75,064 | 123 | 72 |
31/08/2020 | 127,681 | 28,998 | 133 | 10 |
30/06/2021 | 132,655 | 4,974 | 149 | 16 |
30/06/2022 | 143,608 | 10,953 | 175 | 26 |
30/06/2023 | 144,119 | 3,030 | 186 | 11 |
30/06/2024 | 151,881 | 5,688 | 198 | 12 |
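The year-on-year increase columns are simple differences of consecutive totals. A minimal sketch using the source totals from Table 2 (the function name is hypothetical):

```python
# Total number of sub-daily sources at each inventory date (Table 2).
source_totals = [
    ("31/08/2017", 13),
    ("31/08/2018", 51),
    ("31/08/2019", 123),
    ("31/08/2020", 133),
    ("30/06/2021", 149),
    ("30/06/2022", 175),
    ("30/06/2023", 186),
    ("30/06/2024", 198),
]

def increases(totals):
    """Return (date, increase since the previous entry) for each later entry."""
    return [
        (date, current - previous)
        for (_, previous), (date, current) in zip(totals, totals[1:])
    ]

for date, increase in increases(source_totals):
    print(date, increase)
```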
Figure 9 presents a map showing locations of the sub-daily stations inventoried as of 30th June 2024. Figure 9 also shows the number of sub-daily stations currently inventoried that are operational by year over the period 1750-2023.
As noted above, many of these sub-daily stations are duplicated across multiple data sources. The following climate variables are generally available at the sub-daily timestep:
- Temperature
- Precipitation
- Mean Sea Level Pressure
- Water Vapour (generally as dewpoint temperature)
- Wind
- Snow
Figure 9: Map shows locations of the sub-daily stations inventoried as at 30/06/2024. Plots show the log number of apparent stations (prior to identification of duplicates) operational by year for four main climate variables 1750-2023.
Several datasets have been deposited via the Data Deposition Portal (https://datadeposit.climate.copernicus.eu/accounts/login/) since it became operational in 2019. The majority of these datasets derive from data rescue projects and contain historical observations usually located in data sparse regions. Oftentimes they add to the temporal coverage of existing stations; therefore we have been prioritising these datasets for inclusion in each new data release.
Thanks to the Copernicus International Exchange Agreement (https://www.copernicus.eu/en/international-cooperation-area-data-exchange) we have acquired hourly data from the Chilean Met Services for 70 stations with temperature, precipitation, wet bulb temperature, sea level pressure, relative humidity, wind direction and wind speed from 1952-2021. In addition, with help from Henrik Steen Andersen of the EEA we have had several discussions with Environment and Climate Change Canada (ECCC) and have received an initial delivery of hourly data consisting of over 2000 stations with temperature, precipitation, wet bulb temperature, sea level pressure, relative humidity, wind direction and wind speed from 1952-2021. More recently our colleagues at NCEI have set up an update mechanism for the hourly ECCC data, so the data were up to date as of June 2024. Since 2017 we have also acquired and inventoried new data from the Brazilian Met Service, again brokered via EEA under a Copernicus agreement. This source consists of hourly data for 802 stations from 1904 to 2021 containing observations for pressure, temperature, precipitation, wind and humidity.
Since June 2023 we have added a further 12 data sources to the sub-daily inventory. With assistance from our partners at KNMI we have acquired historical hourly data for Israel, consisting of 8 variables spanning the 1960s to the 2000s. We also acquired important sub-daily observations for Greenland from the PROMICE and THREDDS projects consisting of temperature, pressure, wind and humidity from 1990-2024. We also received some rescued data for Congo spanning 1908 to 1965 from the University of Giessen. We also acquired and inventoried data from the Icelandic Met Office, CARRA-Iceland project data consisting of 401 stations with observations of pressure, temperature, precipitation, humidity, wind, cloud and visibility from 1997 to 2023.
A major addition to the inventory was the entire Meteo France hourly archive, consisting of 3394 stations across France and its Pacific and Atlantic islands and territories. The data set contains observations of pressure, temperature, precipitation, humidity, wind, cloud and visibility spanning 1777 to 2024. We also received some more historical rescued data for the UK from Ed Hawkins along with some existing source data updates. We added the entire hourly data archive from the Polish Met Service (recently made open access) consisting of 67 stations in Poland across 8 variables from 1966-2023. In addition, the hourly data from the Belgian Met Service (16 stations, 1952-2023) have been acquired and added to the inventory. Two major Nordic country sources have also been acquired. Firstly, we acquired the entire Met NO (Norway Met Service) hourly archive with 709 stations and 8 variables from 1780-2024. Secondly, we acquired and inventoried the entire SMHI (Swedish Met Service) hourly archive, consisting of 1026 stations and 8 variables spanning 1780-2024. We also added some pressure and temperature data for a station in Murchison Bay, Svalbard, 1957-1959, which were acquired from the Royal Swedish Academy of Sciences and digitized by B.-M. Sinnhuber.
We also ran another iteration of the CliDaR project with the 2023/2024 cohort of 2nd year Geography students at Maynooth University. These students successfully digitized over 30 years of unique data (approx. 360,000 observations across 8 variables) for one station in Guinea and 4 stations in Madagascar over the 1950s-1960s. The team has also run another data rescue project with some Maynooth 3rd year undergraduate geography students who participated in the pilot CliDaR-Africa project in 2022. The C3S2-311 service received some funding from the World Meteorological Organisation (WMO) for data rescue. Hence, we employed 26 of these students, who digitised over 46 years (approx. 550,000 observations) of unique sub-daily observations consisting of temperature, precipitation, humidity, pressure, wind, and cloud variables across 6 stations spanning the years 1948-1960. The newly digitised data for this existing source have also been added to the current inventory. We also published a paper in the Geoscience Data Journal to introduce the Climate Data Rescue Africa (CliDaR-Africa) project and encourage other institutes to look at incorporating the CliDaR project into their curriculum. The paper can be found at: http://doi.org/10.1002/gdj3.248.
The 198 original data sources in the sub-daily station inventory currently have the following data usage policy:
- Sub-daily data from 137 original sources have an open data access policy.
- Sub-daily data from 11 original sources have a Creative Commons Attribution licence data policy.
- Sub-daily data from 50 original sources have stated or implied WMO Resolution 40 data policy. Resolution 40 permits personal or research usage but explicitly forbids commercial exploitation. Note that Resolution 40 has been superseded by the new WMO unified data policy.
- Sub-daily data from 6 original sources have a mixed data policy due to having stations located in many different countries across the world each with their own National data policies.
Out of an abundance of caution where the data policy is unclear a Resolution 40 status has been assumed. It will be possible to verify open policies on at least a subset of data presently assumed to only be able to be shared under resolution 40 auspices and work is ongoing to ascertain this at this time. In addition, for countries which have explicitly declared open data policies it is possible to supersede the source-level IPR with the stated national policy and this is done where the policy is unambiguous.
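The precedence described above can be sketched as a small decision function. The policy labels and the function name are hypothetical; the logic follows the text: an unambiguous national open policy supersedes the source-level policy, and an unclear source policy defaults to Resolution 40.

```python
def effective_policy(source_policy, national_policy=None):
    """Return the policy applied to a station's data.

    source_policy:   policy stated by the source, or None/"unclear".
    national_policy: "open" only where a national open data policy has
                     been explicitly and unambiguously declared.
    """
    if national_policy == "open":
        # A declared national open policy supersedes source-level IPR.
        return "open"
    if source_policy in (None, "", "unclear"):
        # Out of caution, unclear policies are treated as Resolution 40.
        return "wmo_resolution_40"
    return source_policy

print(effective_policy("unclear"))
print(effective_policy("wmo_resolution_40", national_policy="open"))
print(effective_policy("cc_by"))
```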
4.4. Daily data inventory
There has not been much progress on the daily data inventory over the past year, as the team has been focusing on acquiring new sub-daily data sources (Table 3). The current daily inventory has 137 sources (an increase of 1 source since 2023) and 177,253 stations (which, as for sub-daily, do not all constitute unique stations). Note that 212 of the increase in station count since 2023 derive from additional stations acquired from the China Meteorological Administration, which was an existing source.
Table 3: Number of daily stations and sources inventoried for each year since 2017.
Inventory date | Total number of stations | Increase in stations since previous year | Total number of sources | Increase in sources since previous year |
31/08/2017 | 162,892 | - | 91 | - |
31/08/2018 | 173,782 | 10,890 | 122 | 31 |
31/08/2019 | 173,782 | 0 | 122 | 0 |
31/08/2020 | 173,879 | 97 | 125 | 3 |
30/06/2021 | 175,824 | 1,945 | 127 | 2 |
30/06/2022 | 176,788 | 964 | 133 | 6 |
30/06/2023 | 176,980 | 192 | 136 | 3 |
30/06/2024 | 177,253 | 275 | 137 | 1 |
Figure 10 presents a map showing locations of the daily stations inventoried as of 30th June 2024. The accompanying plots show the number of currently inventoried daily stations operational by year for four main climate variables over the period 1750-2024.
The following climate variables are generally available at the daily timestep:
- Temperature
- Precipitation
- Sunshine Hours
- Mean Sea Level Pressure
- Wind
- Water Vapour (as dewpoint temperature or RH)
- Snow
The data sources in the daily station inventory currently have the following data usage policy:
- Daily data from 37 original sources have an open data access policy.
- Daily data from 11 original sources have a Creative Commons Attribution licence data policy.
- Daily data from 80 original sources have stated or implied WMO Resolution 40 data policy.
- Daily data from 25 original sources have a mixed data policy due to having stations located in many different countries across the world each with their own National data policies.
As is the case for sub-daily data, where a national open policy can be ascertained this can be used to overrule a source level IPR stipulation, allowing all data from that country to be made open access.
Figure 10: Map shows locations of the daily stations inventoried as at 30/06/2024. Plots show the log number of stations (prior to identification and reconciling of duplicates) operational by year for four main climate variables 1750-2024.
4.5. Monthly data inventory
As stated previously, we have been focusing on adding sub-daily sources. As of 31st August 2017, we had inventoried 53 monthly data sources (84,668 stations). We added 1 source (518 stations) in 2018 and 31 sources in 2020 (Table 4). The current monthly station inventory consists of 55 data sources and 186,015 station records. All sources known to consist of aggregated underlying sources have been disaggregated where possible. These include the Global Summary of the Month (GSOM) dataset, which is derived from the GHCN-D dataset. The GSOM dataset has been part of the monthly inventory since 31/08/2019 and contains 100,829 stations. No new sources have been added since 2020. Figure 11 presents a map showing locations of the monthly stations inventoried as at 30/06/2024, and plots show the number of stations operational by year for four main climate variables (temperature, precipitation, mean sea level pressure and snow measurements) over the period 1750-2024.
Table 4: Number of monthly stations and sources inventoried for each year since 2017.
Inventory date | Total number of stations | Increase in stations since previous year | Total number of sources | Increase in sources since previous year |
31/08/2017 | 84,668 | - | 53 | - |
31/08/2018 | 85,186 | 518 | 54 | 1 |
31/08/2019 | 85,186 | 0 | 54 | 0 |
31/08/2020 | 186,015 | 100,829 | 85 | 31 |
30/06/2021 | 186,015 | 0 | 85 | 0 |
30/06/2022 | 186,015 | 0 | 85 | 0 |
30/06/2023 | 186,015 | 0 | 85 | 0 |
30/06/2024 | 186,015 | 0 | 85 | 0 |
The following climate variables are available at the monthly timestep:
- Temperature
- Precipitation
- Sunshine Hours
- Mean Sea Level Pressure
- Water Vapour (as RH)
The 85 original data sources in the monthly station inventory currently have the following data usage policy:
- Monthly data from 8 original sources have open data access policy.
- Monthly data from 1 original source has a Creative Commons Attribution licence data policy.
- Monthly data from 28 original sources have stated or implied WMO Resolution 40 data policy.
- Monthly data from 18 original sources have a mixed data policy due to having stations located in many different countries across the world each with their own National data policies.
Again, declared national data policies that can be verified can be used to override source IPR restrictions in specific cases.
Daily data sources shall be averaged to monthly values and used in preference to monthly sources. Monthly values derived in this way are aggregated in a consistent manner, whereas sources at monthly resolution may have had their values calculated in several distinct ways, which can have both random and systematic effects on the resulting long-term series.
Figure 11: Map shows locations of the monthly stations inventoried as at 30/06/2024. Plots show the log number of stations (prior to identification and reconciling of duplicates) operational by year for four main climate variables 1750-2024.
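A minimal sketch of a consistent daily-to-monthly aggregation is given below. It is illustrative only and not the service's processing code; the `min_days` completeness threshold, the function name and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_means(daily, min_days=1):
    """Average (date, value) pairs to {(year, month): mean}.

    Missing values (None) are skipped. Months with fewer than min_days
    valid values are dropped; min_days is a hypothetical completeness rule.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is not None:
            buckets[(day.year, day.month)].append(value)
    return {
        month: mean(values)
        for month, values in sorted(buckets.items())
        if len(values) >= min_days
    }

# Hypothetical daily mean temperatures (deg C) with one missing day.
daily = [
    (date(2020, 1, 1), 2.0),
    (date(2020, 1, 2), 4.0),
    (date(2020, 1, 3), None),
    (date(2020, 2, 1), 10.0),
]
print(monthly_means(daily))
```

Applying the same rule to every source is what makes the resulting monthly series comparable; sources delivered only at monthly resolution may have used different averaging or completeness rules.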
4.6. Data Intellectual Property Rights
The data intellectual property rights (IPR) tables have been compiled for all 371 data sources inventoried to date, and the data policy verified where possible. These IPR traces shall be made available online in the future. However, in some cases the information on IPR of a source is minimal and the data access policy could not be verified. In cases where the original source data policy could not be verified, the data can be assumed to be (at a minimum) available to users on a non-commercial basis consistent with the now superseded WMO Resolution 40, as to date all sources have either been obtained from a public facing repository or directly from the source with explicit agreement. It is possible that a source may have been made visible publicly without the permission of the rights holder as, in some cases, it is difficult to trace the original data policy. Historical bilateral agreements with data repository holders and data projects may exist that are not publicly visible and may have been misplaced. Users who have additional information on source IPR may contact us (see Section 10).
For sources where provenance is not presently satisfactorily established, further investigations will be conducted to ascertain whether bilateral agreements have been brokered, and attempts will be made to obtain documentation. Where inadequate information from original data sources was available in the metadata, we have sent information requests and in many cases are awaiting a response.
Concurrently, there is an increasing movement toward open national data policies. The Service has made significant efforts to ascertain national level policies from national meteorological and hydrological services. In many cases these can be used to open up access without usage restrictions on a national basis. The stated national policy in such cases, which is generally more recent, is assumed to supersede any source level data policy. The national IPR policies as presently ascertained are outlined in the service data policy documentation.
We are providing an IPR table in an ISO19115 standard-compliant format. The ISO19115 metadata schema provides fields that fully describe all relevant aspects of IPR and additional aspects of discovery metadata at several levels. The ISO19115 standard allows users to operate seamlessly and is consistent with INSPIRE and emerging WIGOS metadata standards. Table E in the Appendix presents the ISO19115 metadata standard format with an example of data source IPR information entered in the relevant fields. Some fields will remain empty in almost all cases, as the information is usually not available. However, this table provides the option to enter and edit all necessary metadata if the information is available either now or in the future.
Data policy is further articulated in the service data policy documentation available via the CDS catalogue entry. It is important that users respect data source usage citation and acknowledgement requirements as these citations and acknowledgements are often tied to continued programmatic support for observations, data rescue activities, database management etc. Data usage acknowledgments and citations relevant to each unique data download will always be provided as a stand-alone text file. Users must abide by these conditions.
5. ECVs selected for inclusion in the service and justification
The aim of the C3S2 311 Lot1 (Global Land and Marine Observations Database) service is to produce a comprehensive global set of data holdings, of known provenance, that is truly integrated both across Essential Climate Variables (ECVs) (Bojinski et al., 2014) and across timescales, as envisaged in Thorne et al. (2017). Therefore, the main consideration when selecting the variables for inclusion was to try to ensure that the selected ECVs are consistently present across timescales and frequently reported. In all data releases, to ensure this consistency across timescales, the daily observations will be averaged up to the monthly scale and, eventually when resources permit, supplemented/backfilled/infilled with any unique monthly data. Therefore, the ECV selection was primarily based on the volume of station data per ECV available at the daily timescale (Table 5). The selected daily timescale ECVs were then cross checked with the availability at the sub-daily timescale.
Table 5: Information taken from the inventory as at 2018 summarizing ECV availability for ECVs considered for initial inclusion. Note that at this stage not all of the USAF data was inventoried and is an extracted sub-set as described in Section 3.3. Counts are total counts across all sources and thus represent an over-estimate of the true number of unique stations, the degree of which will only become apparent following complete reconciliation of the underlying sources.
Essential Climate Variable | Number of sub-daily stations | Number of daily stations | Number of monthly stations |
Precipitation | 23,103 | 146,251 | 19,210 |
Air Temperature | 23,133 | 80,158 | 81,407 |
Snowfall | 22,812 | 67,945 | - |
Snow depth | 22,812 | 52,587 | 4,050 |
Wind speed | 23,185 | 13,637 | 14,912 |
Wind direction | 23,185 | 12,121 | 9,397 |
Dew Point Temperature | 23,302 | 10,942 | - |
Sea Level Pressure | 23,503 | 8,364 | 17,238 |
Relative Humidity | 182 | 2,419 | 4,099 |
Sunshine | 364 | 2,373 | 10,922 |
Cloud cover | 295 | 1,091 | 5,392 |
Cloud ceiling | 355 | 46 | - |
Snow cover | Unknown | - | - |
Soil Temperature | - | - | - |
Grass temperature | - | - | - |
Evaporation | - | - | - |
ECVs have been sorted at each timescale and the number of stations counted for each ECV available. Here, availability is based upon presence anywhere within the station record and doesn't infer continuous availability for any given ECV. There are several other variables recorded in several sources of sub-daily files, such as numerous precipitation types, visibility, wind gusts, wind conditions, hail size and isobaric surface. But the extent of reporting of these variables is not clear at this point and these data are the subject of ongoing investigations.
The ECVs identified consistently across timescales were also cross checked with the initial user survey ECV priorities to try to ensure that identified data user needs are met to the extent possible and practicable (C3S_D311a_Lot2.3.6.1_Initial_User_Survey_v1). Results from the user survey showed that ECVs such as temperature, precipitation, surface pressure, humidity, wind measurements and soil moisture are most important to those data users surveyed, and users consistently felt that more focus is needed to make them more available at a global level. Many of the interviewees also noted a current lack of cloud measurements and stated that these are an essential variable for climate change assessment and require some focus. Ideally, users would like to have these ECVs available at the sub-daily timescale or, if not, then at the daily timescale.
Selection of ECVs also requires a consideration of the maturity of knowledge in how to undertake the subsequent harmonization activities. Here, harmonization includes:
- Reconciling station records from different sources
- Reconciling station records across the different time slices of sub-daily, daily and monthly
- Merging records
- Quality assurance
- Quality control
All the above-mentioned steps require a degree of knowledge of the ECV and how, generally, it has been measured historically. Within a finite duration and resource contract, such as C3S2 311 Lot 1, a necessary balance needs to be struck between ambition and quality. Picking variables for which there is imperfect knowledge or for which there are known to be substantive potential issues increases the risks of sinking resources into areas that prove unproductive with little return for the C3S service as a whole. It is better to pick a restricted set of ECVs and do it well than to spread effort too thinly. Finding that 'Goldilocks point' is an ongoing challenge.
Table 5 shows that at the daily timescale there are eight ECVs with greater than 5,000 stations with data:
- Precipitation
- Air temperature
- Snowfall
- Snow depth
- Wind speed
- Wind direction
- Dew point temperature/relative humidity
- Sea Level Pressure
Based upon their prevalence, historical and present methods of observation (Section 2), stated user needs, and knowledge of data quality, these eight ECVs have been selected for inclusion in the current release and as a minimum set for subsequent releases. Note that owing to remaining data issues or planned work not yet completed not all these variables are present for all timesteps in the present release.
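The daily-volume threshold can be reproduced directly from Table 5. The short sketch below is illustrative only: the dictionary simply copies the daily station counts from Table 5 (with "-" entries represented as None), and the variable names are our own. Note that dew point temperature and relative humidity are listed separately in Table 5 but treated as a single humidity ECV in the list above.

```python
# Daily station counts copied from Table 5; None marks entries shown as "-".
daily_station_counts = {
    "Precipitation": 146251,
    "Air Temperature": 80158,
    "Snowfall": 67945,
    "Snow depth": 52587,
    "Wind speed": 13637,
    "Wind direction": 12121,
    "Dew Point Temperature": 10942,
    "Sea Level Pressure": 8364,
    "Relative Humidity": 2419,
    "Sunshine": 2373,
    "Cloud cover": 1091,
    "Cloud ceiling": 46,
    "Snow cover": None,
    "Soil Temperature": None,
    "Grass temperature": None,
    "Evaporation": None,
}

DAILY_STATION_THRESHOLD = 5000  # threshold quoted in the text

# Keep only ECVs with more than 5,000 stations at the daily timescale.
selected = [
    ecv
    for ecv, count in daily_station_counts.items()
    if count is not None and count > DAILY_STATION_THRESHOLD
]

for ecv in selected:
    print(ecv, daily_station_counts[ecv])
```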
Table 5 illustrates that there are some omitted ECVs that have a significant number of stations with data. For sunshine, data is only prevalent at the monthly scale, relating to its historical inclusion in CLIMAT messages. Transmission has generally not occurred at daily and sub-daily scales and this is reflected in the data counts (Table 5). This means it does not meet our criteria of timescale consistency for inclusion in the current data release. In addition, the user survey results did not indicate that sunshine measurements are among the most important variables for those users surveyed.
Table 5 also shows that there are some cloud data inventoried, but the station volume is generally low. At the daily timescale there are only 1,091 stations with cloud cover and 46 with cloud ceiling. At the sub-daily timescale there are 295 stations with cloud cover and 355 stations with cloud ceiling (although this excludes USAF, where disentangling the various cloud observations is an ongoing activity). There are 5,392 stations at the monthly timescale with cloud cover data. Cloud cover is an important ECV and has been identified by data users as being a variable that requires more focus to increase coverage. However, it is known that there are heterogeneities in cloud cover observations across national jurisdictions and through time (Section 2). In particular, modern observations tend to be automated using ceilometers which lack full-sky view and cannot detect high clouds. Due to a combination of these issues and the low daily data volumes, we will be omitting cloud variables from releases until these issues are satisfactorily dealt with.
We have identified some availability of snow cover, hail size, visibility, soil temperature, grass temperature and evaporation data at some timescales. However, we will be omitting these variables from the current data release due to the low volumes of data available and lack of consistency across timescales. It should be noted that we will be revisiting all the omitted variables regularly over the service contract to reassess their availability and, given sufficient scientific and user-demand-driven justification, a variable will be added into the data releases. Users who wish to see additional ECVs added should raise that interest (see Section 10).
6. Common Data Model (CDM)
The harmonised data holdings need to be presented in a usable and consistent manner. It is the desire of C3S that this delivery be consistent across the suite of in-situ data activities to the extent possible and practicable. To that end C3S 311a Lot 2 have led the development of a common data model (distinct from a data format) that allows a consistent representation of the data and associated metadata. This Common Data Model is described in the deliverable C3S_D311a_Lot2.2.1.1_201708_Initial_specification_for_CDM_v1 and regularly updated at https://dast.copernicus-climate.eu/documents/in-situ/cdm_latest.pdf, and a copy of this is made available via the CDS catalogue entry. Figure 12 shows a schematic overview of the Common Data Model. Similar to the ECMWF ODB data model, the reports are partitioned into a header record and individual observation records. Additional metadata are available through linked tables.
The records in the header table (Table 6) contain the date / time, location and station identifier for each land report. These may be reports containing instantaneous (or synoptic) observations or reports containing longer duration statistics such as daily or monthly summaries.
The records from the observations table contain the actual observed data or summary statistics. The fields / elements contained in the observation records are identical for all ECVs and observation types/statistics, with the observed variable and statistic type indicated by flags. These are the "observed_variable" and "value_significance" elements respectively. The date / time and duration of the observations are also included in the observation records, recognizing that these may be different to the time the land report was recorded. Similarly, the observation records include the spatial location of the observations. Table 7 lists the elements of the observation records.
Data in the header_table and observations_table are linked through the "report_id" elements.
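As an illustration of this link, the following minimal sketch joins a handful of observation records to their header records using "report_id". It assumes the two tables have already been read into lists of dictionaries; the record contents, the integer codes used for "observed_variable" and the function name join_on_report_id are hypothetical and are not part of the CDM or of any service tooling.

```python
# Hypothetical, heavily reduced records; real tables carry many more elements.
header_table = [
    {"report_id": "R1", "report_timestamp": "1991-01-01 12:00:00+00"},
    {"report_id": "R2", "report_timestamp": "1991-01-01 18:00:00+00"},
]
observations_table = [
    # observed_variable codes 1 and 2 are placeholders, not real CDM codes.
    {"observation_id": "O1", "report_id": "R1", "observed_variable": 1, "observation_value": 275.4},
    {"observation_id": "O2", "report_id": "R1", "observed_variable": 2, "observation_value": 1013.2},
    {"observation_id": "O3", "report_id": "R2", "observed_variable": 1, "observation_value": 274.9},
]


def join_on_report_id(headers, observations):
    """Attach the matching header elements to every observation record."""
    headers_by_id = {header["report_id"]: header for header in headers}
    joined = []
    for observation in observations:
        header = headers_by_id.get(observation["report_id"])
        if header is None:
            continue  # observation without a header record: skipped in this sketch
        row = dict(header)        # start from the header elements
        row.update(observation)   # add (and keep) the observation elements
        joined.append(row)
    return joined


joined = join_on_report_id(header_table, observations_table)
for row in joined:
    print(row["observation_id"], row["report_timestamp"], row["observation_value"])
```

One header record can be shared by several observation records, which is why the join is driven from the observations side.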
Figure 12: Schematic overview of the Common Data Model.
Table 6: Name, data type and description of the different elements in the header_table from the CDM. External references have been excluded but are listed in the CDM documentation2.
Element name | Kind | Description |
station_heading | Numeric | Station heading if mobile |
height_of_station_above_local_ground | Numeric | Height of station above local ground (m) |
height_of_station_above_sea_level | Numeric | Height of station above mean sea level (m), |
height_of_station_above_sea_level_accuracy | Numeric | Accuracy to which height of station known (m) |
sea_level_datum | Integer | Datum used for sea level |
report_meaning_of_time_stamp | Integer | Report time - beginning, middle or end of |
report_timestamp | Timestamp with time zone | e.g. 1991-01-01 12:00:0.0+0 |
report_duration | Integer | Duration/period over which observation was made (s) 0 = instantaneous (less than 2 seconds) 1 = 2 seconds 2 = 5 seconds 3 = 10 seconds 4 = 30 seconds 5 = 1 minute 6 = 2 minute 7 = 5 minute 8 = 10 minute 9 = 1 hour 10 = 3 hour 11 = 6 hour 12 = 9 hour 12 = 12 hour 13 = 1 day 14 = monthly 15 = mixed frequency |
report_time_accuracy | Numeric | Precision to which time was recorded (s) |
report_time_quality | Integer | Quality flag for report_timestamp |
report_time_reference | Integer | Reference Time (e.g. referenced to time server, atomic clock, radio clock etc) |
profile_id | Character | Information on profile (atmospheric / oceanographic) configuration. Set to Record ID for profile data or missing (NULL) otherwise. |
events_at_station | Integer array | e.g. ship hove to, crop burning etc. |
report_quality | Integer | Overall quality of report |
duplicate_status | Integer | E.g. no duplicates, best duplicate, duplicate, not checked. |
duplicates | Integer array | Array of report_id's for duplicates |
record_timestamp | Timestamp with time zone | Timestamp of revision for this record |
history | Character | Sequence of processing steps. Free text with timestamp 1 : history 1; timestamp 2 : history 2 etc. |
processing_level | Integer | Level of processing applied to this report |
processing_codes | Integer array | Processing applied to this report |
Table 7. As Table 6 but for the observations_table.
Element name | Kind | Description |
observation_id | Character | unique ID for observation |
report_id | Character | Link to header information |
data_policy_licence | Integer | data usage policy |
date_time | Timestamp with time zone | timestamp for observation |
observation_duration | Integer | Duration/period over which observation was made (s) 0 = instantaneous (less than 2 seconds) 1 = 2 seconds 2 = 5 seconds 3 = 10 seconds 4 = 30 seconds 5 = 1 minute 6 = 2 minute 7 = 5 minute 8 = 10 minute 9 = 1 hour 10 = 3 hour 11 = 6 hour 12 = 9 hour 12 = 12 hour 13 = 1 day 14 = monthly 15 = mixed frequency |
longitude | Numeric | Longitude of the observed value, -180 to 180 (or other as defined by CRS). This may or may not be the same as the report location. |
latitude | Numeric | Latitude of the observed value, -90 to 90 (or other as defined by CRS) |
crs | Integer | Coordinate reference scheme used to encode location |
z_coordinate | Numeric | z coordinate of observation |
z_coordinate_type | Integer | Type of z coordinate |
observation_height_above_station_surface | Numeric | Height of sensor above local ground or sea |
observed_variable | Integer | The variable being observed/measured |
secondary_variable | Integer | Secondary variable required to understand observation, e.g. chemical constituent. Set to NA/missing if not applicable. |
observation_value | Numeric | The observed value |
value_significance | Integer | e.g. min, max, mean, sum |
secondary_value | Integer | value for the secondary variable. Set to NA or missing if not applicable. |
units | Integer | Units for the observed variable |
code_table | Integer | Encode/decode table for variable (if encoded) |
conversion_flag | Integer | Flag indicating whether original, converted or both values are available. |
location_method | Integer | Method of determining location |
location_precision | Numeric | Precision to which location is reported (radius km) |
z_coordinate_method | Integer | Method of determining z coordinate |
bbox_min_longitude | Numeric | Bounding box for observation, valid range given by CRS |
bbox_max_longitude | Numeric | Bounding box for observation, valid range given by CRS |
bbox_min_latitude | Numeric | Bounding box for observation, valid range given by CRS |
bbox_max_latitude | Numeric | Bounding box for observation, valid range given by CRS |
spatial_representativeness | Integer | Spatial representativeness of observation |
quality_flag | Integer | Quality flag for observation |
numerical_precision | Integer | Reporting precision of observation in units given by 'units' variable. Equivalent to BUFR scale factor |
sensor_id | Character | Link to sensor_configuration table. |
sensor_automation_status | Integer | Automated, manual, mixed or visual observation |
exposure_of_sensor | Integer | Whether the exposure of the instrument will impact on the quality of the measurement |
original_precision | Integer | Original reporting precision in units given by 'original_units' |
original_units | Integer | Original units |
original_code_table | Integer | Encode / decode table for variable (if encoded) |
original_value | Numeric | Original value as reported or recorded in log book. |
conversion_method | Integer | Link to table describing conversion process |
processing_code | Integer array | e.g. TRC (temperature radiation corrections) etc. Encoded in table. |
processing_level | Integer | Level of processing applied to observation. |
adjustment_id | Character | Total adjustment applied to observation reported in observation value (observation_value = original + adjustment) |
traceability | Integer | Whether observation can be traced to international standards. |
advanced_qc | Integer | Flag indicating whether advanced qc data are available |
advanced_uncertainty | Integer | Flag indicating whether advanced uncertainty estimates are available |
advanced_homogenisation | Integer | Flag indicating whether advanced homogenisation information is available |
source_id | Character | Original source of data, link to external table |
The CDM is extendable and a governance model is in place. Users who wish to see additional functionality are encouraged first to read the documentation and then raise suggestions (see Section 10). A subset of the CDM is currently realised for end users as a data service and is described in Section 9 following a description of the harmonisation approach and the data release in the intervening sections.
7. Merging methodology
The current release consists of GHCN-D, monthly data derived therefrom, and a combination of a number of sub-daily sources starting from the USAF subset. To the extent that GHCN-D and USAF are themselves amalgamations of multiple sources there is some degree of merging performed upstream of the C3S2 311 Lot 1 activities. The reader is referred to the GHCN-D readme.txt documentation for the information on sources which is available at the following link (https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/) or see (Menne et al. 2012).
For GHCN-D the data consists of a merge of stations from 30 underlying data sources which are shown in Table 8. The merge approach consists of creating a list on a per station basis of contributing stations from the underlying sources on a prioritised basis. The daily data sources are first ordered into a priority based upon quasi-objective criteria around data characteristics (completeness, length, provenance etc.). Next each station in each source is cross-checked with all other sources to identify station matches. This procedure is semi-automated with human intervention to verify matches. Once a definitive mingle list has been ascertained for each station, the merge proceeds by reading in first the lowest priority source followed by progressively higher priority sources until all sources have been added. New sources overwrite existing data such that the eventual version of each station record contains by preference higher priority data sources. Data in any given month must arise from a single source for a given ECV. For full details on the GHCN-D merge process see Menne et al., (2012) or further information on the GHCN-D dataset can be found at: https://www.ncdc.noaa.gov/ghcn-daily-description.
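The overwrite logic can be sketched as follows. This is a simplified illustration written for this guide, not the GHCN-D code: the function name merge_station and the data layout are hypothetical, station matching is assumed to have been completed already, and the single-source-per-month rule is implemented under the assumption that a higher priority source replaces the whole month for a given ECV whenever it holds any data in that month.

```python
def merge_station(sources):
    """Merge the matched records of one station.

    sources: list of (priority, records) pairs, where a larger priority number
             means a more trusted source and records maps
             (ecv, "YYYY-MM-DD") -> value.
    Returns a dict with the same (ecv, date) -> value layout.
    """
    merged_months = {}  # (ecv, "YYYY-MM") -> {date: value}
    # Read the lowest priority source first so that higher priorities overwrite.
    for _, records in sorted(sources, key=lambda source: source[0]):
        by_month = {}
        for (ecv, date), value in records.items():
            by_month.setdefault((ecv, date[:7]), {})[date] = value
        # Assumption: a whole month is replaced, so that each (ECV, month)
        # ends up drawn from a single source.
        merged_months.update(by_month)
    return {
        (ecv, date): value
        for (ecv, _month), days in merged_months.items()
        for date, value in days.items()
    }


low_priority = {
    ("precipitation", "1950-01-01"): 1.0,
    ("precipitation", "1950-01-02"): 2.0,
    ("precipitation", "1950-02-01"): 3.0,
}
high_priority = {("precipitation", "1950-01-02"): 9.0}

merged = merge_station([(2, high_priority), (1, low_priority)])
print(merged)
```

In this toy case January 1950 is taken entirely from the higher priority source, while February 1950, which only the lower priority source holds, is retained.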
It has been agreed that for daily data C3S2 311 Lot 1 shall continue to deploy the GHCN-D merge methodology. Activity within C3S2 311 Lot 1 for the daily timescale will be around preparation and merging of additional sources into GHCN-D and expansion to include those ECVs identified in Section 5 that are not yet included.
The Global Summary of the Month (GSOM) used to create the current monthly release holdings is an extraction of monthly average data from GHCN-D and so no further merge is undertaken for the monthly data. The monthly averaging follows prescribed procedures regarding completeness and tolerance for consecutive missing days. Only those data not quality control flagged at the daily level are used in the construction of the monthly values (see Section 9.1.3 for more details on the GSOM averaging). Later in development, additional monthly only data sources which tend to extend further back in time in many regions of the world shall, as resources permit, be merged into this product. Details of the procedure are to be developed, although it is likely to broadly follow the overall GHCN-D merge approach.
Table 8: The 30 data sources used to produce the GHCN-D dataset.
U.S. Cooperative Summary of the Day (NCDC DSI-3200) |
CDMP Cooperative Summary of the Day (NCDC DSI-3206) |
U.S. Cooperative Summary of the Day -- Transmitted via WxCoder3 (NCDC DSI-3207) |
U.S. Automated Surface Observing System (ASOS) real-time data (since January 1, 2006) |
Australian data from the Australian Bureau of Meteorology |
U.S. ASOS data for October 2000-December 2005 (NCDC DSI-3211) |
Belarus update |
Environment Canada |
Short time delay US National Weather Service CF6 daily summaries provided by the High Plains Regional Climate Center |
European Climate Assessment and Dataset ECA&D (Klein Tank et al., 2002) |
U.S. Fort data |
Official Global Climate Observing System (GCOS) or other government-supplied data |
High Plains Regional Climate Center real-time data |
International collection (non-U.S. data received through personal contacts) |
U.S. Cooperative Summary of the Day data digitized from paper observer forms (from 2011 to present) |
Monthly METAR Extract (additional ASOS data) |
Community Collaborative Rain, Hail, and Snow (CoCoRaHS) |
Data from several African countries that had been "quarantined", that is, withheld from public release until permission was granted from the respective meteorological services |
NCEI Reference Network Database (Climate Reference Network and Regional Climate Reference Network) |
All-Russian Research Institute of Hydrometeorological Information-World Data Center |
Global Summary of the Day (NCDC DSI-9618) NOTE: "S" values are derived from hourly synoptic reports exchanged on the Global Telecommunications System (GTS). Daily values derived in this fashion may differ significantly from "true" daily data, particularly for precipitation (i.e., use with caution). |
China Meteorological Administration/National Meteorological Information Center/ Climatic Data Center (http://cdc.cma.gov.cn) |
SNOwpack TELemetry (SNOTEL) data obtained from the U.S. Department of Agriculture's Natural Resources Conservation Service |
Remote Automatic Weather Station (RAWS) data obtained from the Western Regional Climate Center |
Ukraine update |
WBAN/ASOS Summary of the Day from NCDC's Integrated Surface Data (ISD). |
U.S. First-Order Summary of the Day (NCDC DSI-3210) |
Datzilla official additions or replacements |
Uzbekistan update |
Conagua Mexican Water Commission |
Details of Sub-daily data merge.
For the sub-daily data, the USAF reissue is more opaque, being based upon grey literature shared with NOAA NCEI. The USAF source is an amalgamation of various early NCEI sources augmented by data receipts via the Washington node of the GTS. Unfortunately, how these sources have been merged is not entirely clear. The closest available proxy for a source level is the platform type, which relates to the mode of transmission and identifies broad network types; but for any given platform type, stations may arise from multiple contributory data sources. For the present release, ICAO airport reports, WMO station identifier reports, and air force assigned identifiers which relate to now closed WMO stations have been extracted. Most remaining data are from various mesonets (networks of typically automated weather and environmental monitoring stations designed to observe mesoscale meteorological phenomena) and arise almost entirely since the late 1990s. USAF will thus be afforded the lowest priority in the merge. The merge approach largely mirrors that applied to GHCN-D.
For the current data release the USAF reissued holdings were merged with other sub-daily sources of better provenance. These other sub-daily data that are included in the current data merge come from the following sources:
- The International Surface Pressure Databank (ISPD), which contains 70 underlying sources and consists almost exclusively of sea level pressure and station level pressure data.
- The "TD-13" dataset, which came from the NCAR data archive and consists of 10,851 global stations with temperature, precipitation, humidity, wind and pressure observations.
- 500 stations from the NOAA CDMP located across the US with observations of wind, snow, temperature, water vapour and pressure from 1892-1997.
- 14 underlying data sources to the UERRA regional reanalysis for Europe.
- Data from the UK Met Office Daily Weather Reports, data for Europe.
- Data from the University of Giessen containing data for India.
- University of Bern, (CHIMES project) data for Switzerland.
- The Central Institution for Meteorology and Geodynamics (ZAMG) data for Austria.
- University of Witwatersrand, data for South Africa.
- Chile Met Service, data for Chile.
- Data from the University of Giessen containing data for Australia.
- National Climate Centre (CMA ISPD) data for China.
- Climate Science for Service Partnership China (CSSP) data for India and Sri Lanka.
- Data from ACRE African stations late 19th Century.
- Meteo-Lux data for Luxembourg.
- Data from ACRE for the Solomon Islands.
- Data from Environment & Climate Change Canada (ECCC).
- Data from (NCAR/RDA) for Greenland and Iceland.
- Data from (NCAR/RDA) Brazilian Air Ministry for Brazil.
- Data from (NCAR/RDA) Australia Summary of Day and Surface Observations for Australia.
- Data from (NCAR/RDA) for Mexico.
- Data from (NCAR/RDA) DSI-3280 for United States and others.
- NOAA/NCEI - The Coastal-Marine Automated Network (CMAN) USAF data.
- INMET Brazilian met service
- Israel Met Services
- Met Eireann (Irish Met Service)
- Deutscher Wetterdienst (DWD) (German Met Service)
- UK data rescue (Ed Hawkins, University of Reading)
- Icelandic Met Office: CARRA-Iceland_project_data
- Met NO (Norway Met Service)
- SMHI Swedish Met Service
- Belgium Met Service
- Polish Met Service
- Meteo France
- AEMET Spanish met service
It is envisaged that the merge in subsequent releases will also iteratively attempt to unify the station identifier schema used across timescales to allow users to seamlessly navigate between monthly, daily and sub-daily resolution records. This has not been attempted to date due to a combination of: i) the technical challenges to assuring correct matches which requires development and deployment of an entirely new suite of processing capabilities; and ii) required progress in ID-schema governance. The C3S2 311 Lot 1 team have been working with colleagues in WMO to explore whether the WIGOS station identifier schema can be used to assure international interoperability. This work remains in progress at the current time with the full proposal to be considered by the WMO Infrastructure Commission in late 2022. For the current release, no attempt has been made to merge station identifiers across from sub-daily to daily and monthly (the latter by construction at this juncture being consistent as outlined earlier in this section).
8. Quality Assurance / Quality Control
For the current release, we are using the quality control (QC) and quality assurance data associated with the GHCN-D and GSOM sources without modification. Further details for each source are available via the journal papers and grey literature describing the processing as discussed below. However, for the current data release we are applying our own initial QC checks to the sub-daily data, and details are presented in this section.
In subsequent work there may be a requirement for developing and deploying new or additional QC/QA which may serve to augment and / or replace the initial quality control summarised herein. It is of critical importance that all users are aware that the database is constructed in such a manner that data is flagged but is never removed. Quality control decisions contain irreducible ambiguity and it is important that original data be retained to allow subsequent revisiting of data flagging decisions when new insights or improved techniques accrue.
Daily data QC
The GHCN-D quality control procedure is documented extensively in Menne et al. (2012) with any updates noted on the GHCN-D website (https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn). The quality control (QC) consists of a mix of record exceedance checks, climatological checks, distributional checks and neighbour-based checks (see Table 9). The GHCN-D quality control flags provide information on the tests failed by an observation. We have also enacted a QC pivot table which is populated with added information about each QC check method conducted for each failed QC observation. However, this table is not available to users in the present release, which provides solely information on whether all tests were passed or at least one test failed. We are simply flagging an observation by converting the GHCN-D QC source flags to either passed QC flag (0) or failed QC flag (1). This information should be sufficient to gain user feedback on adequacy of the current release as a whole. Most users will probably never make use of more than this simple flag approach encoded in the observations table. Expert users should, however, be aware that considerable improvements to the handling of the QC flags could accrue in subsequent releases, allowing these users to fully explore which tests failed.
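The flag conversion amounts to collapsing the single-character GHCN-D source flag into a binary value. A minimal sketch is given below, assuming that a blank source flag denotes an observation that did not fail any check (as in the GHCN-D documentation); the function name to_simple_qc_flag is hypothetical and this is not the service's production code.

```python
def to_simple_qc_flag(source_flag):
    """Collapse a GHCN-D QC source flag to 0 (passed) or 1 (failed).

    A missing or blank flag is taken to mean that no check failed; any other
    character (e.g. "S" for the spatial consistency check) is a failure.
    """
    if source_flag is None or source_flag.strip() == "":
        return 0
    return 1


example_flags = ["", " ", "S", "X", None]
simple_flags = [to_simple_qc_flag(flag) for flag in example_flags]
print(simple_flags)
```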
Monthly data QC
For GSOM the quality control flags of GHCN-D are used to exclude suspect daily values from being used to calculate a monthly average. Some QC flags are carried into the GSOM and we are also simply flagging an observation by converting the GHCN-D QC source flags in the GSOM to either passed QC flag (0), failed QC flag (1), missing (3) or observed value updated or changed (4). As with the daily data a QC pivot table populated with added information about each QC check method conducted for each failed QC observation will be available to users in future releases. Section 8.1.3 provides more details of the criteria for calculating the GSOM from the GHCN-D for each variable. These criteria are consistent with WMO guidance. At this juncture no further quality control is applied to the monthly data.
Table 9: List of GHCN-Daily Dataset Quality Checks and flags relevant to the GSOM data
Source QC Flag | Description of Quality Control Check |
D | Duplicate check |
G | Gap check |
I | Internal consistency check |
K | Streak/frequent-value check |
L | Check on length of multiday period |
M | Mega consistency check |
N | Naught check |
O | Climatological outlier check |
R | Lagged range check |
S | Spatial consistency check |
T | Temporal consistency check |
W | Temperature too warm for snow |
X | Failed Bounds check |
Z | Flagged as a result of an official datzilla investigation |
Sub-daily data QC
Sub-daily data have not undergone any consistent QC upstream of our process. Some QC flags are now available from the upstream data sources, but have not as yet been included in the final output files. Users wishing to access these data should use the GHCNh data files available from https://www.ncei.noaa.gov/products/global-historical-climatology-network-hourly. The current set of sub-daily QC tests works on individual stations and also includes neighbour (buddy) checks. As yet there are no checks working across time periods (ensuring that monthly, daily and sub-daily data are consistent with each other). These QC tests have been based on the suite of checks from the HadISD dataset (Dunn, 2019; Dunn et al., 2016; Dunn et al., 2012), which themselves were inspired by some of the checks applied to the GHCN-D dataset (Durre et al., 2010). For further details and examples of these tests, please refer to those publications. Herein we give outlines of the tests in this implementation, and the code is also made available via GitHub (https://github.com/glamod/glamod_landQC), which contains all the settings. In the current implementation, each test works independently, with no account taken of flags set by tests earlier in the sequence. Hence a single observation can be flagged by many tests, and thresholds derived from the data themselves may be influenced by potentially bad data.
Where thresholds are set from the characteristics of the data themselves, these are stored in separate files. These allow the QC tests to be run with stable thresholds where data have been appended since the last run. If the thresholds were recalculated every time, then it would be possible that additional data could change the thresholds, either unflagging previously flagged observations, or flagging ones that previously would not have been flagged. Currently this option is not used given the nature of the releases to date. It would be used in any future near-real-time updating capacity which is presently under development.
Logic Checks
These tests check both the station metadata and also reasonable limits for the observed values.
The station itself is flagged if:
- Latitude == longitude == 0
- Latitude > 90, latitude < -90
- Longitude > 180, longitude < -180
- Elevation < -432.65, elevation > 8850 (missing elevation encoded as e.g. -999 or -9999 is allowed)
- Timeseries starting before 1700
- Timeseries ending after the present (date of QC run)
- Timestamps not in ascending order (e.g. from a repeated block)
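The station-level conditions listed above can be expressed compactly. The sketch below is an illustration written for this guide rather than an extract from the glamod_landQC code: the function name, the failure labels and the treatment of -999 and -9999 as the only missing-elevation codes are assumptions, and timestamps are assumed to be timezone-aware datetime objects in file order.

```python
from datetime import datetime, timezone

MISSING_ELEVATION_CODES = {-999, -9999}  # assumption based on the examples in the text


def station_logic_failures(latitude, longitude, elevation, timestamps, now=None):
    """Return the labels of the station-level logic checks that fail."""
    now = now or datetime.now(timezone.utc)  # date of the QC run
    failures = []
    if latitude == 0 and longitude == 0:
        failures.append("lat_lon_zero")
    if latitude > 90 or latitude < -90:
        failures.append("latitude_out_of_range")
    if longitude > 180 or longitude < -180:
        failures.append("longitude_out_of_range")
    if elevation not in MISSING_ELEVATION_CODES and (elevation < -432.65 or elevation > 8850):
        failures.append("elevation_out_of_range")
    if timestamps:
        if min(timestamps).year < 1700:
            failures.append("starts_before_1700")
        if max(timestamps) > now:
            failures.append("ends_in_future")
        # Any step backwards in time, e.g. from a repeated block of records.
        if any(later < earlier for earlier, later in zip(timestamps, timestamps[1:])):
            failures.append("timestamps_not_ascending")
    return failures


utc = timezone.utc
run_date = datetime(2024, 1, 1, tzinfo=utc)
good_station = station_logic_failures(
    53.4, -6.6, 60.0,
    [datetime(1990, 1, 1, tzinfo=utc), datetime(1990, 1, 2, tzinfo=utc)],
    now=run_date,
)
bad_station = station_logic_failures(
    0.0, 0.0, -999,
    [datetime(1690, 1, 2, tzinfo=utc), datetime(1690, 1, 1, tzinfo=utc)],
    now=run_date,
)
print(good_station, bad_station)
```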
Each meteorological variable is checked to ensure that it also falls within reasonable limits. These checks should be redundant, as subsequent checks (e.g. world records) would capture the same values; however, the intention is to move them further upstream in the processing chain, to the point where the ingested data are converted to our internal format. Retaining them here should identify systematic issues in upstream processes. The test checks that at least 99.5% of the observed values fall within the bounds listed below. If this is not the case (i.e. more than 0.5% fall outside of the bounds), then all times where observations fall outside of these bounds are flagged.
- -75 <= Temperature <= 75 C
- -75 <= dew point temperature <= 75 C
- 300 <= station level pressure <= 1200 hPa
- 870 <= sea level pressure <= 1090 hPa
- 0 <= wind speed <= 50 m/s
- 0 <= wind direction <= 360 degrees
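The 99.5% rule described above can be illustrated with a short Python sketch. The bounds are those listed above; the dictionary keys, the function name and the use of None for missing values are assumptions made for this example and do not reflect the operational code.

```python
# Bounds from the list above; the variable names are illustrative only.
BOUNDS = {
    "temperature": (-75.0, 75.0),               # C
    "dew_point_temperature": (-75.0, 75.0),     # C
    "station_level_pressure": (300.0, 1200.0),  # hPa
    "sea_level_pressure": (870.0, 1090.0),      # hPa
    "wind_speed": (0.0, 50.0),                  # m/s
    "wind_direction": (0.0, 360.0),             # degrees
}

def logic_check(values, variable, min_good_fraction=0.995):
    """Return one boolean per value (True = flagged).

    Missing values are passed as None and are never flagged. Flags are
    only set if more than 0.5% of the present values are out of bounds.
    """
    low, high = BOUNDS[variable]
    present = [v for v in values if v is not None]
    if not present:
        return [False] * len(values)
    outside = [v is not None and not (low <= v <= high) for v in values]
    bad_fraction = sum(outside) / len(present)
    if bad_fraction > 1.0 - min_good_fraction:
        return outside
    return [False] * len(values)
```

With this rule a single out-of-bounds value in a long record is left unflagged here (later tests such as the world records check would be expected to catch it), whereas a systematic problem affecting more than 0.5% of values is flagged.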
Odd Cluster [temperature, dew point, station level pressure, sea level pressure, wind speed]
This test looks for temporally isolated observations. Without nearby observations to compare to, it is difficult to determine whether these observations are good. Small clusters of up to 6 observations in a period of up to 12 hours separated from other observations by 28 days or more are flagged. This test will also remove observations where the year has been mis-transmitted – e.g. transmitting 1973 as 1963 which appears to be a prevalent issue in many USAF stations whereby in (at least) the decade prior to station commencement (and the decade after for closed stations) ghosted observations exist.
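A minimal Python sketch of the cluster rule is given below for illustration. The function name is hypothetical, the timestamps are assumed to be sorted, and the start and end of the record are assumed to count as a separation; the operational code may treat these edge cases differently.

```python
from datetime import datetime, timedelta

def odd_cluster_flags(times, max_obs=6, max_span=timedelta(hours=12),
                      min_gap=timedelta(days=28)):
    """Flag small, short clusters that are isolated by long gaps.

    `times` is a sorted list of datetimes; returns one boolean per entry.
    """
    flags = [False] * len(times)
    start = 0
    for i in range(1, len(times) + 1):
        # A cluster ends at the end of the record or before a gap >= min_gap.
        if i == len(times) or times[i] - times[i - 1] >= min_gap:
            n_obs = i - start
            span = times[i - 1] - times[start]
            if n_obs <= max_obs and span <= max_span:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags
```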
Frequent Values [temperature, dew point, station level pressure, sea level pressure]
This test identifies values which occur more frequently than would be expected. The variables that this test runs on have a roughly Gaussian distribution. The expectation of a smoothly varying distribution is used to identify bins which are much larger than expected. For data from all years for each calendar month (e.g. data from all Januaries), a distribution is calculated using a bin width corresponding to the reporting resolution of the observations (typically 1.0 or 0.5 K/hPa/m s-1). Using a rolling set of 7 bins, the central one is tested to see if it is the locally largest, contains more than 50% of the data and also more than the data count threshold (120 observations, uniform across all tests). If these requirements are met, this bin is noted. Then, for each year within these calendar months, the distribution is assessed again, with the identified bins re-checked on an annual basis. If they still meet the requirements, all observations within this bin are flagged. We note that this will flag some good observations, but does ensure the flagging of many spurious values, especially in the tails.
Diurnal Cycle
This test is run on temperature only as this variable has a robust diurnal cycle for stations not in polar regions. Hence it is only run for stations with latitudes below 60°N/S.
Firstly, a diurnal cycle is calculated for each day with at least four observations spread across at least three quartiles of the day by fitting a sine curve with amplitude equal to half the range of the reported temperatures. The phase of the sine curve is determined to the nearest hour by minimising a cost function, namely the mean squared deviations of the observations from the curve. The climatologically expected phase for a given calendar month is that with which the largest number of individual days' phases agree. If a day’s temperature range is less than 5 K, no attempt is made to determine the diurnal cycle for that day.
It is then assessed whether a given day’s fitted phase matches the expected phase within an uncertainty estimate. This uncertainty estimate is the larger of the number of hours by which the day’s phase must be advanced or retarded for the cost function to cross into the middle tercile of its distribution over all 24 possible phase-hours for that day. The uncertainty is assigned as symmetric. Any periods longer than 30 days where the diurnal cycle deviates from the expected phase by more than this uncertainty, without three consecutive good or missing days or six consecutive days consisting of a mix of only good or missing values, are deemed dubious and the temperature elements are flagged.
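The per-day phase fit described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the sine curve is centred on the mid-range of the day's temperatures, the phase is expressed as the hour at which the sine curve crosses its centre on the way up, and the function name is hypothetical. The uncertainty estimate and the 30-day assessment are not included.

```python
import math

def fit_diurnal_phase(hours, temps, min_range=5.0):
    """Return the best-fitting phase (0-23 h) for one day, or None.

    None is returned if the day has fewer than four observations, if they
    cover fewer than three quartiles of the day, or if the range is < 5 K.
    """
    if len(hours) < 4 or len({int(h) // 6 for h in hours}) < 3:
        return None
    t_range = max(temps) - min(temps)
    if t_range < min_range:
        return None
    amplitude = t_range / 2.0
    centre = (max(temps) + min(temps)) / 2.0  # assumption: mid-range centring
    best_phase, best_cost = None, float("inf")
    for phase in range(24):
        # Cost function: mean squared deviation of observations from the curve.
        cost = sum(
            (t - (centre + amplitude * math.sin(2 * math.pi * (h - phase) / 24.0))) ** 2
            for h, t in zip(hours, temps)
        ) / len(temps)
        if cost < best_cost:
            best_phase, best_cost = phase, cost
    return best_phase
```

The expected phase for a calendar month would then be the most common value returned over all days in that month.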
Distribution [temperature, dew point, station level pressure, sea level pressure]
This test performs two checks. One uses the distribution of monthly anomalies to look for asymmetries in this distribution. The second uses all observations, converted to hourly anomalies and assesses if there are secondary populations separated from the main distribution.
For each calendar month, monthly average values are calculated, and then standardised using their average and spread. Months which have a large offset from zero (>5) are flagged. Then the months are sorted and, proceeding outwards from the median, the pair of months either side are compared. Flagging is triggered if the anomaly in one month is at least twice as distant from the median as its pair. This and all months further from zero on the relevant tail of the distribution are flagged.
The second part of this test compares all the normalised anomalies for a given calendar month over all years and looks for outliers or secondary distributions. A Gaussian (allowing for non-zero skew) is fitted to the distribution of the normalised anomalies, and threshold values are set at the next empty bin outward from zero after this fitted curve passes y=0.13. Any values further from zero than these thresholds are flagged (Figure 13). As the signals from intense storms are likely to be flagged in the pressure data by this test, the wind speed is used to cross check, looking for high values. If the wind speed anomaly is more than 5 times the spread at the same time as a pressure flag has been set, then this could be a storm and should be kept. The flags are removed for these periods.
World Records [temperature, dew point, sea level pressure, wind speed]
We check that each observation falls within the bounds for the global records as held by the WMO at https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive.
We use the station ID country codes to look up which continent the station’s country sits in. In cases (e.g. mid-oceanic islands) where this does not resolve to one of the standard continents, we use the global values. Using the continent assignment, we then take the values available at the Arizona State University archive to do the record check. The global values, which are used as a default in case no region assignment was possible, are:
- -89.2 to 56.7 C temperature
- -100 to 56.7 C dewpoint
- 0 to 113.2 m/s wind speed
- 870 to 1083.3 hPa sea level pressure
In future data releases, we hope to use regional (or national, where available) records for this check.
We note that some recent regional records have not yet been fully validated by the WMO, e.g. the 2021 European maximum temperature record at Syracuse of 48.8 C, and as such the record values do not yet use this as the regional maximum. However, as no data values are removed by these QC checks, only flagged, users can ignore the flag in cases where they have other information to corroborate values.
Figure 13: Station CHA00515670 for Temperature (K) in December and Station level pressure (hPa) in February. The observations (black), fitted curve (blue) and threshold values (red) show that both individual values and whole clusters can be flagged by the distribution test. Note logarithmic y-axis.
Streaks [temperature, dew point, station level pressure, sea level pressure, wind speed, wind direction]
This set of tests looks for various forms of repeated data. The first test looks for streaks of single values which occur for longer than expected. Using the first-differences between neighbouring observations, the distribution of repeated values is calculated (number of times 2, 3, 4 etc. identical observations in a row occurs). This distribution is fitted with a 1/x curve. To determine the threshold where longer streaks are flagged, we use the first empty bin after the point where this fitted curve falls below 0.1. Any streaks which are longer than this threshold are flagged (Figure 14). Calm periods are excluded from this test.
The second looks for excess numbers of shorter streaks of repeated values. In some stations, there are many shorter streaks of data that are individually not long enough to trigger the first test, but whose abundance is potentially spurious. Using a similar approach to that above, the test flags years where there is a greater than expected fraction of data in streaks of 10 or more consecutive repeating values. The distribution is fitted using a 1/x curve to determine what proportion is unusual, and years with this proportion or above are flagged. Calm periods are excluded from this test.
Finally, a test looks for streaks where a whole day’s values are repeated. Again, all streaks and their lengths are collated, and a critical value found using a 1/x fit. Those streaks longer than this critical value are flagged.
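For illustration, the flagging step of the first streak test can be sketched in Python as below. The sketch assumes that the threshold has already been determined (the 1/x fit is not reproduced here), that there are no missing values in the input, and that calm periods are identified by a single value passed by the caller; the function name is hypothetical.

```python
def streak_flags(values, threshold, calm_value=None):
    """Flag every observation in a run of identical values longer than `threshold`.

    Runs whose value equals `calm_value` (e.g. 0.0 for wind speed) are skipped.
    """
    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            length = i - start
            if length > threshold and values[start] != calm_value:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags
```

Collecting the run lengths found in the same loop would give the distribution to which the 1/x curve is fitted.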
Figure 14: Station USA00700636 for temperature, showing the distribution of the repeating string lengths (note logarithmic y-axis), the 1/x fit (blue) and the threshold set (red). In this case, one string of 25 repeating values has been flagged.
Climatological [temperature, dew point]
This test identifies individual values which fall beyond the climatologically expected distribution. Monthly climatologies are calculated for each hour of the day using winsorised data (removing the effects of outliers) when there are data for at least 5 unique years. The raw observations are then standardised using these climatologies and the spread of the observations for that month and hour. To protect low variance stations, the minimum value for the spread is 1.5. The final values are then low-pass filtered to remove any long-timescale signal from anthropogenic climate change that could affect the removals at the beginning and end of the timeseries. The distribution of these standardised anomalies is fitted with a Gaussian (allowing for a non-zero skew) and threshold values set at the next empty bin outward from zero after the fitted line crosses y=0.1. Any anomalies further from zero than these thresholds are flagged. As this test would frequently flag storm signals in pressure data, it is not applied to either pressure measure.
Timestamp [temperature, dew point, station level pressure, sea level pressure, wind speed]
This test identifies locations where there are two entries for a single timestamp in the files and the values for the observations are not identical and the test flags both entries.
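A minimal Python sketch of this rule, with a hypothetical function name and simple lists as inputs, is:

```python
from collections import defaultdict

def duplicate_timestamp_flags(times, values):
    """Flag all entries that share a timestamp but carry differing values."""
    seen = defaultdict(set)
    for t, v in zip(times, values):
        seen[t].add(v)
    # More than one distinct value at a timestamp means every entry is flagged.
    return [len(seen[t]) > 1 for t in times]
```

Entries that are repeated with identical values are left unflagged by this sketch, in line with the description above.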
Precision check [(temperature, dewpoint)]
This test identifies timestamps where the reporting precision of paired variables (e.g. temperature and dew point temperature) are not equal. This test was added to highlight potential issues when there is apparent supersaturation. A lower precision dew point appears to be at a higher temperature than a higher precision dry-bulb temperature measurement because of rounding (for example, T=10.8°C but Td=11°C to single degree [but could be in the range 10.5-10.8°C and so not supersaturated]). These flags would help users by being a qualifier for e.g. the supersaturation flag. Currently only temperature and dew point are compared, but other paired variables will be considered in the future.
Spike [temperature, dew point, station level pressure, sea level pressure, wind speed]
This test looks for short term departures from a smoothly varying timeseries. The first differences between observation values for the appropriate temporal difference are used for spikes of length one, two or three observations. As in the streak check, the distribution of first differences is used to determine above what level the jumps between values are unreasonable. The threshold is set using the same fitting procedure (1/x curve) and determination of the threshold (next empty bin after the curve drops below 0.1). Any values which have a first difference larger than this threshold on the entry and exit of a spike are flagged.
Humidity [dewpoint temperature]
These tests check for supersaturation (dewpoint temperature greater than temperature) and long streaks of zero dew point depression (dewpoint temperature equal to the temperature). The former flags any location where this occurs. The latter looks at the streaks where these two parameters are equal. Using the same approach as the streak check, only unreasonably long periods of repeated instances are flagged, as some climates may experience this situation relatively regularly.
Variance [temperature, dewpoint temperature, station level pressure, sea level pressure, wind speed]
This test identifies months where the within month variance of normalised anomalies is sufficiently greater than the median for that calendar month over the period of record, using winsorised data (5%, Afifi & Azen, 1979). For each calendar month (120 obs per hour-of-day within each calendar month [e.g. all 12:00s for all Januarys]), we remove the diurnal cycle to make anomalies using an hourly average value. These anomalies are then normalised by their spread. Then for each year in this calendar month, the variance is calculated.
Months where the variance differs from the average by more than 8 times the spread of the monthly variances are flagged. Pressure and wind data are afforded extra checks to ensure that they are not erroneously removed due to tropical storms being present that month. Any month where the largest number of consecutive positive or negative changes in the values exceeds 10 points is not flagged given the progressive nature of the change. Also, months where high wind speeds and low-pressure values (difference from the average is greater than 4 times the spread) are concurrent are not flagged.
Pressure [sea level pressure, station level pressure]
This set of tests compares the station and sea level pressure data. Firstly, the differences between the two measures are calculated, and the average and spread of these determined. The spread is limited to fall within 1 and 5 hPa (the upper limit is required in case of bimodal distributions e.g. arising from an undetected change in elevation of the station). Then any difference which falls outside of 4 times the spread is flagged.
The second part uses the station pressure and elevation to calculate the theoretical sea level pressure expected to have been reported. If the station elevation is missing, then this test is not run. These are then compared to the values which are in the data file. If the difference is larger than 15 hPa, these sea level pressure and station level pressure values are flagged. We note that we cannot be sure if there has been a failure in the conversion used, an error in the elevation or an error in the recorded pressure values.
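The first part of the pressure test (the comparison of the differences between the two measures) can be sketched in Python as follows. The text above does not specify how the average and spread are computed, so the mean and the population standard deviation are used here purely as assumptions; the function name is also hypothetical. The second part (theoretical sea level pressure) is not included.

```python
import statistics

def pressure_difference_flags(slp, stnlp, n_spreads=4.0):
    """Flag times where (sea level - station level) pressure is unusual.

    Assumption: 'average' is the mean and 'spread' the population standard
    deviation of the differences, with the spread limited to 1-5 hPa.
    """
    diffs = [a - b for a, b in zip(slp, stnlp)]
    centre = statistics.mean(diffs)
    spread = min(max(statistics.pstdev(diffs), 1.0), 5.0)
    return [abs(d - centre) > n_spreads * spread for d in diffs]
```

The upper limit on the spread matters in cases such as an undetected change in station elevation, where the raw spread would otherwise be inflated and too few values flagged.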
Winds [wind speed, wind direction]
These tests check that a number of conventions when recording wind data are adhered to, following DeGaetano (1997).
- Observations where there is no wind direction, but a wind speed of zero (calm) are checked. Convention is for the wind direction to be given as 0° in these cases. When this convention is not adhered to, the direction is flagged.
- Negative wind speeds are flagged.
- Negative wind directions are flagged.
- Wind directions > 360° are flagged.
- Non-zero wind directions where the wind speed is 0 (calm) are flagged.
- Northerly wind direction is by convention given as 360°, so directions of 0° with a non-zero wind speed are flagged.
In the future, a number of these tests may try to fix the recorded values to match the conventions, but so far this has not been implemented.
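The conventions listed above can be summarised in a short Python sketch that returns a pair of flags for a single observation. The function name and the use of None for a missing value are assumptions for this example; the separate flag letter used for the northerly convention is not reproduced.

```python
def wind_flags(speed, direction):
    """Return (speed_flag, direction_flag) for one observation."""
    speed_flag = speed is not None and speed < 0  # negative wind speed
    direction_flag = False
    if direction is None:
        # Calm with no direction breaks the 0-degree convention.
        direction_flag = speed == 0
    elif direction < 0 or direction > 360:
        direction_flag = True
    elif speed is not None:
        if speed == 0 and direction != 0:
            direction_flag = True  # calm should be reported as 0 degrees
        elif speed > 0 and direction == 0:
            direction_flag = True  # northerly should be reported as 360 degrees
    return speed_flag, direction_flag
```

For example, a 5 m/s wind reported with a direction of 0° is flagged on direction only, whereas the same wind reported at 360° passes.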
Neighbour (Buddy) Checks [temperature, dew point temperature, sea level pressure, station level pressure, wind speed]
The closest 20 neighbours which are within 500 km distance and 200 m elevation are identified. In regions of the world with sparse station coverage, there are instances where fewer than 20 neighbours are identified.
For each variable, the difference series between the target station and each neighbour station is constructed. Each calendar month is then processed in turn, and the spread of the differences calculated over all years for this target-neighbour pair (e.g. all Januaries in the record). Locations where the difference is greater than 5 times the spread are identified. A value is then flagged if at least 2/3 of the neighbours have identified the difference to be greater than 5 times the spread.
For the pressure variables, we account for the passage of deep low-pressure systems (e.g. tropical cyclones or extra tropical depressions) by counting the number of positive and negative differences when the neighbours are further than 100 km away. This is to prevent flagging of extreme low-pressure observations in the core of these systems if they pass close to the target station. If the majority of differences (>2/3) are negative, then only the positive differences are flagged and the negatives (presumed to be storm signals) retained.
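The basic voting rule of the neighbour check, leaving aside the pressure-specific storm handling, can be sketched in Python as below. This is a simplified illustration: the input is assumed to hold a single calendar month's data over all years, the spread is taken to be the population standard deviation of the difference series, and the 2/3 fraction is taken relative to the neighbours that have data at that time. All names are hypothetical.

```python
import statistics

def buddy_check(target, neighbours, n_spreads=5.0, min_fraction=2.0 / 3.0):
    """Return one boolean per target value (True = flagged).

    `target` is a list of values and `neighbours` a list of equally long
    lists; missing values are given as None.
    """
    n_times = len(target)
    votes = [0] * n_times
    available = [0] * n_times
    for series in neighbours:
        pairs = [(i, target[i] - series[i]) for i in range(n_times)
                 if target[i] is not None and series[i] is not None]
        if len(pairs) < 2:
            continue
        spread = statistics.pstdev([d for _, d in pairs])
        if spread == 0:
            continue  # no variability in the differences; skip this neighbour
        for i, d in pairs:
            available[i] += 1
            if abs(d) > n_spreads * spread:
                votes[i] += 1
    return [available[i] > 0 and votes[i] >= min_fraction * available[i]
            for i in range(n_times)]
```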
Clean Up [temperature, dew point temperature, station level pressure, sea level pressure, wind speed, wind direction]
This test identifies months where more than 60% of the observations have been flagged by other tests and flags the remainder. It is likely that there are still undetected issues in these remaining observations which could not be picked up because of the existing pervasive issues.
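As an illustration, the clean-up rule can be sketched in Python as follows. The function name and the representation of each month as a (year, month) key are assumptions for this example.

```python
from collections import defaultdict

def clean_up_flags(months, flagged, max_fraction=0.6):
    """Return the additional flags set by the clean-up test.

    `months` holds a (year, month) key per observation and `flagged` is
    True where any earlier test has flagged that observation.
    """
    total = defaultdict(int)
    bad = defaultdict(int)
    for key, is_flagged in zip(months, flagged):
        total[key] += 1
        bad[key] += int(is_flagged)
    # Flag the remaining observations in months with > 60% already flagged.
    return [(not is_flagged) and bad[key] > max_fraction * total[key]
            for key, is_flagged in zip(months, flagged)]
```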
High flag [all]
This test identifies variables where > 20% of values have been flagged by the other tests. As it is likely there is a pervasive issue with this variable for this station, the remaining observations are flagged. If two or more variables trigger this test, then the entire station is withheld from further processing. For the sea level pressure/station level pressure and wind speed/wind direction pairs, flags set by this test in one are set in the other, but these do not count towards the withholding criterion.
We have provided all QC plots for the current data release in the following attachment.
These plots show the number of flags set by each test for each variable and also the proportion these comprise of the total number of sub-daily observations. We show both these as the length of record varies across the stations, and so a large number of flags set may indicate more or less of a pervasive problem.
For temperature, stations with the largest numbers of flagged observations (counts) are in the USA and Europe (Figure 15). These also correspond to the stations with the greatest number of observations, and so the relative flagging rate is not that exceptional. The tests which flagged the most were the streak check, the buddy check and for some stations the clean-up. However, there was no indication in most of these summary plots that any of the tests were finding issues specific to geo-political regions (which could indicate a translation error) or that they were flagging too many observations. Stations over Germany do stand out for some tests (distribution, world records), and further investigations are needed to understand why this is the case, which may also be down to the large number of data-rich stations now included in this release.
For the dew point temperature, a similar picture emerges to that for temperature (including the cluster over Germany), but there are extra flags set on this variable for super-saturation and dew point depression, which flag a large number of observations in the USA and Europe. Although these flag more observations than e.g. the streak check, they are again linked to the long station records in these regions. However, it may be necessary to adjust the thresholds in these tests in future processing runs. The sea-level pressure shows less flagging overall, and here it is the northern hemisphere more widely which has a high set of flag counts. Station level pressure has a larger flagging rate, with some stations in South America, Africa and China standing out, as well as Germany and parts of Africa for the logic checks. Streaks in the wind speed over India and Bangladesh, as well as parts of Europe, are flagged more highly, as is wind direction over Europe and eastern North America. We note that an issue has been found in the encoding of calm periods in the sub-daily USAF data underlying the ISD, which has arisen as part of their encoding work (see Dunn et al, 2022 for more details). The resulting missing calm periods will also be in these data while we await the correction to the past data by the USAF. The simple correction applied to HadISD as noted in Dunn et al, 2022 is not possible for these data as we do not have the metadata flags to work from, and it also should not be applied to this more fundamental climate record because of the risk of introducing other errors. The wind checks also flag many wind direction values, likely because of the convention applied therein of calm winds having a direction of 0°N and northerlies being ascribed 360°N (which is assigned a separate flag letter to enable these to be separated out from other wind flags). As no data values are removed by these tests, users can choose to ignore the wind flags if this is appropriate for their application.
Figure 15: Top: total flag counts for temperature. Bottom: flag counts for streak check.
9. Details of the Seventh full data release
The seventh full data release (r7) was completed in November 2024. Releases build over time by adding both data volume and data quality indicators. They are envisaged to conform to the current data format specification. However, should user feedback necessitate it, revisions to the data format will be undertaken. Users are strongly encouraged to test this data release and provide constructive feedback so any changes can be incorporated and communicated in the subsequent releases. Feedback is encouraged as outlined in Section 10.
9.1. Sub daily data
For the current data release we have merged the USAF sub-daily data with the following data sources:
- The International Surface Pressure Databank (ISPD)
- The "TD-13" dataset
- NOAA CDMP dataset
- Underlying data sources to the UERRA regional reanalysis for Europe.
- Data from the UK Met Office Daily Weather Reports, data for Europe.
- Data from the University of Giessen containing data for India.
- University of Bern, (CHIMES project) data for Switzerland.
- The Central Institution for Meteorology and Geodynamics (ZAMG) data for Austria.
- University of Witwatersrand, data for South Africa.
- Chile Met Service, data for Chile.
- Data from the University of Giessen containing data for Australia
- National Climate Centre (CMA ISPD) data for China.
- Climate Science for Service Partnership China (CCSP) data for India and Sri Lanka.
- Data from ACRE African stations late 19th Century.
- Meteo-Lux data for Luxembourg.
- Data from ACRE for the Solomon Islands.
- Data from Environment & Climate Change Canada (ECCC).
- Data from (NCAR/RDA) for Greenland and Iceland.
- Data from (NCAR/RDA) Brazilian Air Ministry for Brazil.
- Data from (NCAR/RDA) Australia Summary of Day and Surface Observations for Australia.
- Data from (NCAR/RDA) for Mexico.
- Data from (NCAR/RDA) DSI-3280 for United States and others.
- NOAA/NCEI The Coastal-Marine Automated Network (CMAN) USAF data.
- INMET Brazilian met service
- Israel Met Services
- Met Eireann
- Deutscher Wetterdienst (DWD)
- Met NO (Norway Met Service)
- UKMO Climatological Stations weather rescue
- Finland met service
- PROMICE and GC-Net automated weather station data in Greenland
- Icelandic Met Office: CARRA-Iceland project data
- Stephen Burt Durham Observatory weather rescue
- Edward Hanna SCILLY WEATHER RESCUE
- UKMO MIDAS weather rescue
- UK-DATA-RESCUE-STATIONS
- PALAEO-RA_PALANTINA
- SMHI Swedish Met Service
- Belgium Met Service
- Polish Met Service
- Meteo France
- AEMET Spanish Met Service
The USAF sub-daily stations have retained the transmitted location metadata associated with each individual observation report. However, the USAF recommend use of their master chronology. For many stations, there exist multiple variations in location and elevation information in the observation report entries throughout the operational period. It is unclear whether these variations relate to real geophysical location moves or rather to changes in surveyed location, typographical ("fat finger") errors, changes in reporting of location precision, etc. There are also stations that have very few reports or contain obviously inconsistent data. Therefore, it was deemed essential to filter out stations with bad data and suspicious within-report location information so as to minimise the chances of having to remove stations from subsequent releases.
Retaining solely stations that had no within-report geolocation ambiguity resulted in an overly restrictive selection. Instead, the subset of stations with reasonable consistency in geolocation within the reports has been retained at the present time. The subset of USAF stations retained for the present release passed the following criteria:
- The maximum range in all the latitude and longitude listings of the stations had to be less than or equal to 40 km. This means the station representativity aspects of record homogeneity, even if it has truly moved one or more times, should be relatively manageable for most applications.
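One possible reading of this criterion is that the largest distance between any two reported positions of a station must not exceed 40 km. Under that assumption, the filter can be sketched in Python as follows. The haversine formula with a mean Earth radius of 6371 km and the function names are choices made for this illustration, not a description of the processing actually applied.

```python
import math
from itertools import combinations

def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

def passes_location_filter(positions, max_range_km=40.0):
    """True if no two reported positions are more than `max_range_km` apart."""
    unique = set(positions)
    return all(great_circle_km(p, q) <= max_range_km
               for p, q in combinations(unique, 2))
```

A station reported at two positions 0.1° of latitude apart (roughly 11 km) would pass, whereas one with positions a full degree of latitude apart would be withheld.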
Table 10 shows the potential USAF sub-daily stations by platform types that are available from the extracted sub-set as mentioned in section 4.3. However, based on the selection criteria outlined previously 17,563 sub-daily stations of the 24,425 have been selected for the current full data release.
The team are currently working on methods to try and reconcile some of these location issues within the currently withheld USAF stations.
Table 10: Number of potential USAF sub-daily stations and the number of stations selected for the current full data release based on the criteria mentioned in this section.
Platform type | No. of potential USAF stations available | No. of stations selected for current full data release |
AFWA | 2,707 | 2,313 |
ICAO | 7,730 | 5,870 |
WMO | 13,405 | 8,875 |
CMAN | 573 | 504 |
The current full public data release consists of 29,385 stations, an increase of 4,260 stations on the previous public data release, using modified methods as described in Menne et al. (2012). The previous full data release had a temporal coverage of 1790-2023. The current full data release covers 1790-2024.
The stations in the current release are distributed by ECVs as follows:
- 26,320 sub-daily stations consisting of Temperature observations (increase of 6,226 stations on previous release).
- 17,998 sub-daily stations consisting of Wind direction and Wind speed observations (increase of 2,146 stations on previous release).
- 22,179 sub-daily stations consisting of Water vapour observations (increase of 4,462 stations on previous release).
- 19,818 sub-daily stations consisting of Sea Level Pressure observations (increase of 1,582 stations on previous release).
- 18,567 sub-daily stations consisting of Station level Pressure observations (increase of 3,131 stations on previous release).
We have chosen to exclude accumulated precipitation observations from the current release until we learn more about how this variable has been reported in the USAF data files. The USAF data files have up to 9 different fields pertaining to precipitation and our knowledge of these fields is limited at this time. In addition, there are considerable quality control issues with sub-hourly and sub-daily precipitation observations that will need to be checked and understood. We are prioritizing precipitation for inclusion in subsequent data releases but will only do so when the scientific understanding permits and the QC routine has been produced.
9.2. Sub daily station data policy
We have been working to ascertain and verify data policy for the sub-daily data sources. The sub-daily data for the current data release consist of 118 different merged sources. So far, we have been able to verify that 82 sources have an open data policy (data in the public domain and freely available, at no cost and unrestricted), 23 of the sources have an open Creative Commons (CCBY) License (https://creativecommons.org/licenses/) and 28 sources are deemed to be WMO Resolution 40 Additional Data, as these data are openly available from publicly facing repositories but their data policy could not be verified at this time. Of these sources, 15 have mixed data policies due to known national data policies being applied.
Figure 16a shows the location of the sub-daily stations with observations that are under an open access or open CCBY data policy. Figure 16b shows all sub-daily stations that are deemed to have observations under WMO unified data policy. There are 19,465 stations with observations under open data policy/open CCBY license and 12,450 stations with observations under WMO unified data policy in the current sub-daily data release. Some sources also have data from multiple countries and, although the data source may have a WMO unified data policy, the national data policy will supersede this policy. We have therefore changed the data policy to open access/open CCBY license for those stations located in the countries listed in Table 11.
Table 11: National data policy for stations included in the seventh data release.
Country | Institute | Data policy link | Data policy |
Finland | Finnish Met Institute | | Creative Commons Licence |
Germany | DWD | https://opendata.dwd.de/climate_environment/CDC/Terms_of_use.pdf | Open Access |
Ireland | Met Éireann | | Creative Commons Licence |
Luxembourg | Meteolux | https://www.meteolux.lu/fr/aide/aspects-legaux/?lang=fr, no additional data - https://community.wmo.int/notifications | Creative Commons Licence |
Netherlands | KNMI | | Creative Commons Licence |
Norway | MET Norway | https://www.met.no/en/free-meteorological-data/licensing-and-crediting | Creative Commons Licence |
Sweden | SMHI | https://www.smhi.se/omsmhi/policys/datapolicy/mer-och-mer-oppna-data-1.8138 | Creative Commons Licence |
United States | NOAA NCEI | https://www.ncdc.noaa.gov/wdcmet, https://obamawhitehouse.archives.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf | Open Access |
Canada | Environmental Canada | | Open Access |
Iceland | Icelandic Met Office | | Open Access |
Hungary | Hungarian Met Office | | Open Access |
France | Meteo France | https://meteo.data.gouv.fr/datasets/6569b4473bedf2e7abad3b72 | Open Access |
Spain | AEMET | | Open Access |
UK | Met Office | | Open Access |
Figure 16: (a) Location of sub-daily stations with observations under open data policy and CCBY licence and (b) stations with observations under WMO unified data policy.
9.3. Daily data
The daily data for the present release consist of a subset of stations extracted from the Global Historical Climatology Network Daily (GHCN-D) dataset of NOAA's National Centers for Environmental Information (NCEI). The GHCN-D database currently consists of in excess of 120 thousand stations, although many of these are precipitation-only stations. Given the stated aim for a multivariate set of holdings, the current release does not include the precipitation-only stations. A subset of 86,513 global daily stations was selected from the GHCN-D dataset for the current public data release based on the availability of at least two of our target ECVs. This represents a slight decrease of 32 stations on the previous public data release. Figure 17 shows the location of all the daily stations in the current release.
The daily stations in the current release are distributed by ECVs as follows:
- 86,458 daily stations consisting of Precipitation observations.
- 70,998 daily stations consisting of Snowfall observations.
- 62,436 daily stations consisting of Snow Depth observations.
- 50,786 daily stations consisting of Temperature observations.
- 20,768 daily stations consisting of Snow Water Equivalent observations.
- 1,197 daily stations consisting of Wind Speed observations.
- 82 daily stations consisting of Wind Direction observations.
The daily stations in the current release are distributed by WMO Region as follows:
- 689 daily stations located in WMO Region 1 (Africa).
- 1,299 daily stations located in WMO Region 2 (Asia).
- 446 daily stations located in WMO Region 3 (South America).
- 76,087 daily stations located in WMO Region 4 (North America, Central America and the Caribbean).
- 2,005 daily stations located in WMO Region 5 (South-West Pacific).
- 5,942 daily stations located in WMO Region 6 (Europe).
- 52 daily stations located in WMO Region 7 (Antarctica).
The subset of GHCN-D stations selected for the present release contains stations that have data merged from multiple sources (Section 6).
Figure 17: Map showing the locations of daily stations for the current release.
9.4. Daily data policy
The daily data for the current data release consist of 30 different merged sources. So far, we have been able to verify that 19 sources have an open or CCBY data policy (data in the public domain and freely available at no cost and without restriction). The remaining 11 sources are deemed to be WMO Unified Data Policy Additional Data, as these data are openly available from publicly facing repositories but their data policy could not be verified at this time. In addition, 5 of the daily sources have a mixed data policy, with some stations under an open access / open CCBY license and others under the WMO Unified Data Policy. Figure 18a shows the location of the daily stations with observations under an open or CCBY data policy. Figure 18b shows all daily stations deemed to have observations under the WMO Unified Data Policy. In the current daily data release there are 74,926 stations with observations under an open access / open CCBY license data policy and 11,629 stations under the WMO Unified Data Policy. Due to the station merge/mingle process, some stations in the current release have a mixed data policy, with segments of observations that may be open access / open CCBY license and other segments that are under the WMO Unified Data Policy. As stated previously, some daily sources also hold data from multiple countries. Although such a source may have a WMO Unified Data Policy, the national data policy supersedes it; we have therefore changed the data policy to open access / open CCBY license for stations located in the countries listed in Table 11.
Figure 18: (a) Location of daily stations with observations under open data policy and CCBY licence and (b) stations with observations under WMO unified data policy.
9.5. Monthly data
The same selected subset of GHCN-D daily stations was extracted from NCEI's Global Summary of the Month (GSOM) dataset. There are 118,276 monthly stations in the current GSOM release. The GSOM and Global Summary of the Year (GSOY) datasets consist of 55 climatological variables (many being derived quantities such as heating degree days and growing degree days) computed from the summary-of-the-day observations of the Global Historical Climatology Network Daily dataset. Full documentation can be found at: https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00946.
Data which are flagged as part of GHCN-Daily quality control processes (Durre et al., 2010) are excluded from summary of the month computations. Thresholds were established for the number of missing or flagged values allowed in the computation of a monthly value.
Details of the GSOM variables included in the current monthly data release and how the values were computed from the GHCN-D are as follows:
- Monthly maximum Temperature: Average of daily maximum temperature, computed to hundredths of a degree Celsius. Values are set to missing if more than 5 daily values are missing or flagged or if more than 3 consecutive daily values in a given month are missing or flagged.
- Monthly minimum Temperature: Average of daily minimum temperature, computed to hundredths of a degree Celsius. Values are set to missing if more than 5 daily values are missing or flagged or if more than 3 consecutive daily values in a given month are missing or flagged.
- Monthly average Temperature: Computed by adding the unrounded monthly mean TMAX (average of the daily maximum temperatures) and the unrounded monthly mean TMIN (average of the daily minimum temperatures) and dividing by 2, then rounded to hundredths of a degree Celsius. Values are set to missing if either the monthly mean TMAX or TMIN temperature is missing.
- Total Monthly Precipitation: Precipitation totals are based on daily or multi-day (if daily is missing) precipitation reports, in millimeters to tenths. Values are set to missing if more than 5 daily values are missing or flagged or if more than 5 consecutive daily values in a given month are missing or flagged.
- Total Monthly Snowfall: Snowfall totals are based on daily or multi-day (if daily is missing) snowfall reports, in millimeters to tenths. Values are set to missing if more than 5 daily values are missing or flagged or if more than 5 consecutive daily values in a given month are missing or flagged.
- Monthly average Wind Speed: Average of the daily average wind speed values in GHCN-D, in tenths of meters per second. Values are set to missing if more than 5 daily values are missing or flagged or if more than 3 consecutive daily values in a given month are missing or flagged.
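The completeness thresholds listed above can be illustrated with a short sketch. This is not the NCEI implementation: the function names are hypothetical, missing or flagged days are represented by `None` as an assumption, and rounding uses Python's built-in `round`. The defaults correspond to the temperature and wind speed rules (more than 5 missing values, or more than 3 in a row); the consecutive limit can be changed for the precipitation and snowfall rules.

```python
def _mean_if_complete(daily_values, max_missing, max_consecutive_missing):
    """Return the unrounded mean, or None if the month is too incomplete."""
    missing = 0       # total missing or flagged days
    run = 0           # length of the current run of missing days
    longest_run = 0   # longest run of consecutive missing days
    for value in daily_values:
        if value is None:
            missing += 1
            run += 1
            longest_run = max(longest_run, run)
        else:
            run = 0
    valid = [v for v in daily_values if v is not None]
    if not valid or missing > max_missing or longest_run > max_consecutive_missing:
        return None
    return sum(valid) / len(valid)


def monthly_mean(daily_values, max_missing=5, max_consecutive_missing=3):
    """Monthly mean rounded to hundredths, or None if set to missing."""
    mean = _mean_if_complete(daily_values, max_missing, max_consecutive_missing)
    return None if mean is None else round(mean, 2)


def monthly_tavg(daily_tmax, daily_tmin):
    """Average of the unrounded monthly mean TMAX and TMIN, then rounded."""
    tmax_mean = _mean_if_complete(daily_tmax, 5, 3)
    tmin_mean = _mean_if_complete(daily_tmin, 5, 3)
    if tmax_mean is None or tmin_mean is None:
        return None
    return round((tmax_mean + tmin_mean) / 2, 2)
```

For example, a 30-day month with three consecutive missing days still yields a monthly mean, whereas four consecutive missing days (or six missing days in total) cause the value to be set to missing.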
There are 83,128 monthly stations extracted from the GSOM dataset that match the GHCN-D daily stations for the present release. Figure 19 shows the locations of all the monthly stations for the current release.
The stations are distributed by ECVs as follows:
- 82,936 monthly stations with Precipitation observations.
- 37,148 monthly stations with Temperature observations.
- 65,886 monthly stations with Snow observations.
- 1,202 monthly stations with Wind Speed observations.
The stations are distributed by WMO Region as follows:
- 658 monthly stations located in WMO Region 1 (Africa).
- 1,294 monthly stations located in WMO Region 2 (Asia).
- 423 monthly stations located in WMO Region 3 (South America).
- 72,774 monthly stations located in WMO Region 4 (North America, Central America and the Caribbean).
- 1,996 monthly stations located in WMO Region 5 (South-West Pacific).
- 5,933 monthly stations located in WMO Region 6 (Europe).
- 50 monthly stations located in WMO Region 7 (Antarctica).
Figure 19: Map showing the locations of monthly stations for the current release.
9.6. Monthly data policy
The breakdown of data policy for the monthly stations is similar to that of the daily stations outlined in Section 9.4.
9.7. Details of dealing with additional information on station merges from multiple sources and multiple station locations
As mentioned previously some stations in GHCN-D and the current sub-daily data release are made up of data from the same station but from multiple sources. In addition, some of the sub-daily stations have multiple independent location and / or elevation information in their associated reports over their operational lifetime. This information needs to be retained and made available to users. We are currently only serving the CDM_Lite version to users but in the future the full CDM table will be made publicly available via the CDS. The following section describes how the full CDM provides this additional information. Additional information can be provided to users upon request (see Section 10).
Figure 20 shows a snapshot of the station_configuration table for a sub-daily station that has varying station location metadata over its operational period. In this case we have entered the primary_id and allocated a record_number to each station occurrence associated with a different location/elevation. The most likely station latitude/longitude/elevation information derived from the USAF or source master metadata chronology file is allocated to all the station occurrences with the same primary_id. A value of 1 in the optional_data field indicates that there is additional station metadata that may be important to users, such as varying historical location information (station moves). In that case a user can go to the station_configuration_optional table to view the additional metadata for each station configuration (see Figure 21). In that table the primary_id links to the station to which the entry corresponds, the record_number links to the station configuration to which the entry corresponds, and the kind field indicates the enumerated data type (e.g. 0=integer, 1=numeric, 2=varchar and 3=timestamp with timezone). The field entry code describes the additional data entered into the value field (e.g. 26=alternative latitude, 25=alternative longitude and 29=alternative elevation). The comments field can contain additional information about the entry (see Figure 21).
In the example given in Figures 20 and 21 the interpretation would be that there are 3 unique locations associated with station identifier WMO_01003 within the individual reports. The first location, and the one directly associated with all occurrences of an observation, is 77 degrees North, 15.5 degrees East. However, only the set of observations associated with record_number 1 truly arises at that location. The second location differs in that the longitude is 15.55 degrees East (varying by less than the 0.2 degrees threshold). All observations associated with record_number 2 have this value in the original associated report. The final version differs in both latitude (77.0017) and longitude (15.5348), again with both variations lying well within the stated tolerance for inclusion in the present release. All occurrences with record_number 3 share this set of coordinates in the original reports. All three versions agree that the elevation is 12 m a.s.l. Note that the occurrences of locations sometimes overlap (record numbers 1 and 3) and are sometimes distinct (2 occurs in a block distinct from 1 and 3). This may point toward real relocations or simply corrections / adjustments to the locations used in message transmission. Considerable further work is still required to disentangle these issues, which is why the selection criteria outlined in Section 9.1 were applied.
Figure 20: Snapshot of station configuration table for WMO USAF sub-daily stations.
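The lookup described above can be sketched in a few lines. This is an illustrative reconstruction only: the rows below are built from the worked example in the text rather than copied from Figures 20 and 21, the column name `field_code` is a hypothetical stand-in for the field entry code, and the function `reported_location` is not part of any service API.

```python
# Codes quoted in the text for entries in the optional table.
ALTERNATIVE_FIELDS = {26: "latitude", 25: "longitude", 29: "elevation"}

# Illustrative station_configuration rows: every record carries the
# primary (most likely) location; optional_data=1 flags extra metadata.
station_configuration = [
    {"primary_id": "WMO_01003", "record_number": 1, "latitude": 77.0,
     "longitude": 15.5, "elevation": 12.0, "optional_data": 0},
    {"primary_id": "WMO_01003", "record_number": 2, "latitude": 77.0,
     "longitude": 15.5, "elevation": 12.0, "optional_data": 1},
    {"primary_id": "WMO_01003", "record_number": 3, "latitude": 77.0,
     "longitude": 15.5, "elevation": 12.0, "optional_data": 1},
]

# Illustrative station_configuration_optional rows (kind 1 = numeric).
station_configuration_optional = [
    {"primary_id": "WMO_01003", "record_number": 2, "field_code": 25,
     "kind": 1, "value": "15.55"},
    {"primary_id": "WMO_01003", "record_number": 3, "field_code": 26,
     "kind": 1, "value": "77.0017"},
    {"primary_id": "WMO_01003", "record_number": 3, "field_code": 25,
     "kind": 1, "value": "15.5348"},
]


def reported_location(primary_id, record_number):
    """Return the location found in the original reports for one record."""
    config = next(
        row for row in station_configuration
        if row["primary_id"] == primary_id
        and row["record_number"] == record_number
    )
    location = {key: config[key] for key in ("latitude", "longitude", "elevation")}
    if config["optional_data"] == 1:
        for row in station_configuration_optional:
            same_record = (row["primary_id"], row["record_number"]) == (
                primary_id, record_number)
            if same_record and row["field_code"] in ALTERNATIVE_FIELDS:
                # Alternative values replace the primary location fields.
                location[ALTERNATIVE_FIELDS[row["field_code"]]] = float(row["value"])
    return location
```

With these illustrative rows, record_number 1 returns the primary location, record_number 2 differs only in longitude, and record_number 3 differs in both latitude and longitude, matching the interpretation given in the text.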
Figure 21: Snapshot of station configuration optional table
Figure 22 shows a snapshot of the station_configuration table for a station that consists of station data merged from distinct primary sources. In this case we have entered the primary identifier (primary_id) and allocated a record number (record_number) to each different station source used in the data merge. The primary station location information is used for all the different station record numbers regardless of whether there are differences between sources. The different record numbers correspond to the different station configurations and their data sources. This allows the different data sources for each station to be defined in the source_id field. We have also retained the original identifier (secondary_id) allocated by each data source. If additional metadata were available for any station, a value of 1 would be entered in the optional_data field. A user could then look up the station_configuration_optional table to view the additional metadata as seen in Figure 22.
A user can thus associate each observation with the primary source and if that primary source metadata differs from the applied geolocation ascertain the nature of the differences.
Figure 22: Snapshot of station configuration table showing the different configurations for a station that consists of merges from multiple sources.
Some GHCN-D merge decisions involve merges of stations within a source. In such a case (most prevalently GSOD) there will exist multiple versions of the station arising from the single source in question. These are differentiated by the secondary_id, which points to the mingling of two records from within a single source. In addition, for the ECA&D source (and presently uniquely for that source) each element is associated with a unique identifier for historical reasons. An ECA&D station reporting, for example, maximum temperature, minimum temperature and precipitation will therefore have three identifiers in that source.
9.8. Details on data numerical precision
We are currently only serving the CDM_Lite version to users, but in the future the full CDM table will be made publicly available via the CDS. It is worth noting that the full CDM observations table contains fields that record the numerical precision of the observed value, both original and converted (original_precision, numerical_precision). The original precision value entered into the CDM is the actual numerical precision of the observed value. When we convert a value of, say, 13 degrees Celsius to Kelvin we add 273.15 (the internationally adopted conversion constant fixed by the world governing body of metrology, the Bureau International des Poids et Mesures (BIPM)), so the observed value becomes 286.15. The original value has a precision of 1 and the converted value has an apparent precision of 0.01. It may appear that we are increasing the implied precision, but we are reporting the precision of the data, not that of the instrument or measurement. The full details of the conversions applied are encoded within the CDM and associated on an observation-by-observation basis, so that the user can appropriately convert for their application as required.
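The precision bookkeeping described above can be illustrated with a minimal sketch. It assumes that precision is inferred from the number of decimal places in the written value, which is an illustrative simplification rather than the service's conversion code. The field names numerical_precision and original_precision come from the text; the other dictionary keys and the function names are hypothetical.

```python
from decimal import Decimal

KELVIN_OFFSET = Decimal("273.15")


def precision_of(value_text):
    """Precision implied by a written decimal value, e.g. '13' -> 1, '13.5' -> 0.1."""
    exponent = Decimal(value_text).as_tuple().exponent
    # Negative exponents count decimal places; whole numbers have precision 1.
    return Decimal(1).scaleb(exponent) if exponent < 0 else Decimal(1)


def celsius_to_kelvin(value_text):
    """Convert a Celsius value to Kelvin, keeping both precision fields."""
    original = Decimal(value_text)
    converted = original + KELVIN_OFFSET
    return {
        "original_value": original,                        # hypothetical key
        "original_precision": precision_of(value_text),
        "observation_value": converted,                    # hypothetical key
        "numerical_precision": precision_of(str(converted)),
    }


record = celsius_to_kelvin("13")
print(record)
```

Keeping the two precision fields side by side lets a user see that the extra decimal places in the converted value come from the conversion constant and not from the original measurement.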
9.9. Details on timely updates to current data release
We update the daily data in the current release by downloading the Superghcnd_diff_yyyymmdd_to_yyyymmdd.tar.gz file every morning at 03:00 from the GHCN-D FTP data access site. Once the file is downloaded, code runs automatically to process it into a CDM-formatted file, and the daily update values are appended to the current daily data release.
The .tar.gz file consists of three .csv files, where yyyymmdd1 and yyyymmdd2 denote the first and second dates in the file name:
- delete.csv file which contains daily values that were present on yyyymmdd1, but not on yyyymmdd2 (i.e., have been removed from the yyyymmdd1 version of the dataset as of yyyymmdd2).
- insert.csv file that contains values that were new on yyyymmdd2 (i.e., that were not yet available on yyyymmdd1, but were newly available on yyyymmdd2).
- update.csv which contains changes to values or flags that were present on yyyymmdd1 (i.e., these values or flags have been altered between yyyymmdd1 and yyyymmdd2).
The daily updated values are appended from insert.csv to the current data release, so that deep-past values are not changed in the current version of the data release. Once a full new version is processed and released, all the changes present in the update and delete files will be applied. The timely updates are scheduled for each day, but sometimes there may be a delay of 1-5 days in the GHCN-D feed. The updates are appended on a best endeavours basis and should not be considered an operational component of the service. In the event of a failure of the daily updates, all affected data can be retrospectively processed and updated.
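The append-only handling of the diff archive can be sketched as follows. This is a minimal illustration under stated assumptions and not the service's processing code: the archive member names follow the file names given above, the function names are hypothetical, and the rows in the in-memory demonstration archive are placeholders rather than real GHCN-D records.

```python
import csv
import io
import tarfile


def read_insert_rows(archive):
    """Return the rows of insert.csv from a diff archive given as a file object."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Match on the base name so a leading directory does not matter.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == "insert.csv":
                raw = tar.extractfile(member).read().decode("utf-8")
                return list(csv.reader(io.StringIO(raw)))
    return []


def append_new_values(current_rows, insert_rows):
    """Append newly inserted values; update.csv and delete.csv are not applied."""
    return list(current_rows) + list(insert_rows)


def _demo_archive():
    """Build a small in-memory archive with placeholder rows for illustration."""
    files = {
        "insert.csv": "STATION_A,20240102,PRCP,5\n",
        "update.csv": "STATION_A,20240101,PRCP,13\n",
        "delete.csv": "STATION_B,20240101,PRCP,0\n",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


rows = read_insert_rows(_demo_archive())
release = append_new_values([["STATION_A", "20240101", "PRCP", "12"]], rows)
print(release)
```

In this sketch the earlier value for the first date is left unchanged even though update.csv holds a revised value, mirroring the statement that updates and deletions are only applied when a full new version is released.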
The monthly data is currently updated each month and the previous month's values will be appended to the current data release on or around the middle of each month. There are plans to implement sub-daily timely updates over the current contract, but the details are still being finalised.
10. User feedback
Users should provide feedback to ECMWF Support to enable tracking of all user queries and suggestions. This helps the C3S service to comprehensively understand both usage and user requirements, and hence helps shape the future priorities for the C3S service as a whole. Users are encouraged to raise both data queries and service feedback via this mechanism. Any query should clearly identify the name of the dataset ("Global land surface atmospheric variables from comprehensive in-situ observations") to ensure timely routing of queries to the team.
References
{C3S_D311a_Lot2.2.1.1_201708_Initial_specification_for_CDM_v1}. Available upon request (see Section 10)
{C3S_D311a_Lot2.3.2.1_201712_Specification_of_test_data_delivery_service.v1} Available upon request (see Section 10)
Afifi, A. A. and Azen, S. P.: Statistical Analysis: A Computer Oriented Approach, 2nd Edn., Academic Press, Inc., Orlando, FL, USA, 1979.
Allan, R., P. Brohan, G. P. Compo, R. Stone, J. Luterbacher and S. Brönnimann, 2011: The international Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative. Bulletin of the American Meteorological Society, 92, 1421–1425.
Bojinski, S., M. Verstraete, T.C. Peterson, C. Richter, A. Simmons, and M. Zemp, 2014: The Concept of Essential Climate Variables in Support of Climate Research, Applications, and Policy. Bull. Amer. Meteor. Soc., 95, 1431–1443, https://doi.org/10.1175/BAMS-D-13-00047.1
DeGaetano, A. T.: A quality-control routine for hourly wind observations, J. Atmos. Ocean. Tech., 14, 308–317, 1997.
Dunn, R. J. H., et al. (2022), Reduction in reversal of global stilling arising from correction to encoding of calm periods, Environ. Res. Commun., 4, 061003.
Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Climate of the Past, 8, 1649-1679.
Dunn, R. J. H., Willett, K. M., Morice, C. P., and Parker, D. E. (2014), Pairwise homogeneity assessment of HadISD, Clim. Past, 10, 1501-1522, https://doi.org/10.5194/cp-10-1501-2014
Durre, I., M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, 2010: Comprehensive automated quality assurance of daily surface observations. J. Appl. Meteor. Climatol., 49, 1615–1633, doi:10.1175/2010JAMC2375.1.
GCOS, 2010: Guide to the GCOS surface network (GSN) and GCOS upper-air network (GUAN): 2010 update of GCOS-73 (WMO/TD No. 1558, GCOS No. 144). Geneva: World Meteorological Organization. Available at: https://library.wmo.int/doc_num.php?explnum_id=3855
GCOS, 2015: Status of the Global Observing System for Climate (GCOS 195). Geneva: World Meteorological Organization (WMO). Available at: https://library.wmo.int/pmb_ged/gcos_195_en.pdf
Jones, P. D., 2016: The reliability of global and hemispheric surface temperature records. Advances in Atmospheric Sciences, 33 (3), 269-282.
Lawrimore, J. H., M. J. Menne, B. E. Gleason, C. N. Williams, D. B. Wuertz, R. S. Vose and J. Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. Journal of Geophysical Research: Atmospheres, 116, D19121.
Menne, M.J., I. Durre, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol., 29, 897–910, https://doi.org/10.1175/JTECH-D-11-00103.1
Noone, S., Atkinson, C., Berry, D. I., et al., 2021: Progress towards a holistic land and marine surface meteorological database and a call for additional contributions. Geosci. Data J., 8, 103–120, https://doi.org/10.1002/gdj3.109
Parker, D. E., 1994: Effects of changing exposure of thermometers at land stations. International Journal of Climatology, 14 (1), 1-31.
Quayle, R. G., D. R. Easterling, T. R. Karl and P. Y. Hughes, 1991: Effects of recent thermometer changes in the Cooperative Station Network. Bulletin of the American Meteorological Society, 72 (11), 1718-1724.
Thorne, P. W., H. J. Diamond, B. Goodison, S. Harrigan, Z. Hausfather, N. B. Ingleby, P. D. Jones, J. H. Lawrimore, D. H. Lister, A. Merlone, T. Oakley, M. Palecki, T. C. Peterson, M. de Podesta, C. Tassone, V. Venema and K. M. Willett, 2018: Towards a global land surface climate fiducial reference measurements network. International Journal of Climatology, 38, 2760-2774.
Thorne, P.W., R.J. Allan, L. Ashcroft, P. Brohan, R.J. Dunn, M.J. Menne, P.R. Pearce, J. Picas, K.M. Willett, M. Benoy, S. Bronnimann, P.O. Canziani, J. Coll, R. Crouthamel, G.P. Compo, D. Cuppett, M. Curley, C. Duffy, I. Gillespie, J. Guijarro, S. Jourdain, E.C. Kent, H. Kubota, T.P. Legg, Q. Li, J. Matsumoto, C. Murphy, N.A. Rayner, J.J. Rennie, E. Rustemeier, L.C. Slivinski, V. Slonosky, A. Squintu, B. Tinz, M.A. Valente, S. Walsh, X.L. Wang, N. Westcott, K. Wood, S.D. Woodruff, and S.J. Worley, 2017: Toward an Integrated Set of Surface Meteorological Observations for Climate Science and Applications. Bull. Amer. Meteor. Soc., 98, 2689–2702, https://doi.org/10.1175/BAMS-D-16-0165.1
WMO, 1996: Exchanging Meteorological Data: Guidelines on Relationships in Commercial Meteorological Activities – WMO Policy and Practice (WMO Publication No. 837). Geneva: World Meteorological Organization (WMO). Available at: https://www.wmo.int/pages/about/documents/WMO837.pdf
WMO, 2014a: Guide to Meteorological Instruments and Methods of Observation (WMO Publication No. 8). Geneva: World Meteorological Organization (WMO). Published in 2014, updated in 2017. Available at: http://www.wmo.int/pages/prog/www/IMOP/CIMO-Guide.html
WMO, 2014b: Manual on Codes, Volume I.1 - Part A - Alphanumeric Codes. 2011 edition, updated in
Yin X, Gleason BE, Compo GP, Matsui N, Vose RS, 2008: The International Surface Pressure Databank (ISPD) land component version 2.2. National Climatic Data Center, Asheville, NC. Available from ftp://ftp.ncdc.noaa.gov/pub/data/ispd/doc/ISPD2_2.pdf
Appendix
Table A. Links to comma-separated value station inventory files featuring station availability.
Data release | CDS dataset version | Temporal aggregation | Link |
---|---|---|---|
R2 | 1.0.0 | Sub-daily | Link |
R2 | 1.0.0 | Daily | Link |
R2 | 1.0.0 | Monthly | Link |
R7 | 2.0.0 | Sub-daily | Link |
R7 | 2.0.0 | Daily | Link |
R7 | 2.0.0 | Monthly | Link |
Table B. Links to pipe-separated value files showing the acknowledgement to be used for each source.
Dataset release | CDS dataset version | Link |
---|---|---|
R7 | 2.0.0 | Link |
Table C: Column headers for the source deck inventory and the corresponding explanation.
Column number | Column Header | Description | Notes |
---|---|---|---|
1 | ver | Inventory version | Preliminary |
2 | source_uid | Allocated Unique source identifier | |
3 | data_repository | Name of data set repository | |
4 | Originators_dataset_name | Originator's data set name | e.g. antarctic_palmer |
5 | dataset_name | Dataset name given by C3S311a_Lot2 team. | Combination of data source name, data repository and domain, e.g. amrc_palmer_isti_antarctic |
6 | method of data transfer | Details of method of data transfer into C3S311a Lot 2 via FTP, disk, or email etc. | |
7 | source_name | Original Data source name and address | |
8 | domain | Data domain coverage | (Global, European, Asia, Region or Country) |
9 | lon_min | minimum longitude | -180 to 180 |
10 | lon_max | maximum longitude | -180 to 180 |
11 | lat_min | minimum latitude | -90 to 90 |
12 | lat_max | maximum latitude | -90 to 90 |
13 | bbox | co-ordinates of bounding box (latitude- longitude): rectangle, line, or point. | POLYGON ((x1 y1, x2 y2, |
14 | source_data_policy | Status of data usage policy | (101=Open access, 102=WMO Resolution 40, 103=Restricted, 999=Unknown/na) |
15 | var_attained | Variables that have been obtained and are | Hourly Temperature observations (sub-daily) |
16 | other_var_not obtained | other variables potential available from this | see above list |
17 | t_step | timestep of data in the source deck | M=monthly, DY=Daily, SDY=Sub-daily |
18 | data_first_year | The first year of available data | |
19 | data_last_year | The last year of available data | |
20 | data_mean_years | Mean number of data years available | |
21 | data_update_status | The frequency of source data updates | 101=annual update, 102=frozen data set, 103=monthly, 104=daily, 105=weekly, 106=real time, NA=not available, unknown |
22 | proc_status | processing status within C3S311a Lot 2 | Preliminary, V.02, V.03 etc |
23 | pointOfContact_ | The data source organization or institute to | |
24 | pointOfContact_ | Name of person to contact | |
25 | pointOfContact_ | Position of point of contact | |
26 | pointOfContact_ | The mail address or if not available an email | |
27 | principalInvestigator | Name of principal investigator of data | |
28 | principalInvestigator_ | Address or email of principal investigator | |
29 | principalInvestigator_ | Any online resource links available for |
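The coded columns in Table C can be decoded with simple lookup tables, as in the sketch below. The code values and meanings are those listed in the Notes column of Table C; the example row, the fallback text and the function name are hypothetical and serve only to illustrate the idea.

```python
# Code tables transcribed from the Notes column of Table C.
SOURCE_DATA_POLICY = {
    "101": "Open access",
    "102": "WMO Resolution 40",
    "103": "Restricted",
    "999": "Unknown/na",
}
DATA_UPDATE_STATUS = {
    "101": "annual update",
    "102": "frozen data set",
    "103": "monthly",
    "104": "daily",
    "105": "weekly",
    "106": "real time",
    "NA": "not available, unknown",
}
T_STEP = {"M": "monthly", "DY": "Daily", "SDY": "Sub-daily"}

CODE_TABLES = {
    "source_data_policy": SOURCE_DATA_POLICY,
    "data_update_status": DATA_UPDATE_STATUS,
    "t_step": T_STEP,
}


def decode_inventory_row(row):
    """Return a copy of an inventory row with coded columns spelled out."""
    decoded = dict(row)
    for column, table in CODE_TABLES.items():
        if column in row:
            decoded[column] = table.get(str(row[column]), "unrecognised code")
    return decoded


# Hypothetical inventory row using the column headers from Table C.
example_row = {
    "source_uid": "EXAMPLE_SOURCE",
    "source_data_policy": 101,
    "data_update_status": "104",
    "t_step": "DY",
}
decoded_row = decode_inventory_row(example_row)
print(decoded_row)
```

Columns that are not coded, such as source_uid, pass through unchanged, and any code not listed in Table C is reported as unrecognised rather than guessed.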
Table D: Station deck inventory and details of mapping onto ISO19115 and WMO/WIGOS compliant standards tables
Column Number | Station Deck inventory | ISO19115 Table | WMO/WIGOS Table |
---|---|---|---|
1 | station_id | identifier | |
2 | source_uid | | |
3 | source_name | | |
4 | descriptionDataset | descriptionDataset | |
5 | data_repository_ftp | | |
6 | wmo_id | | |
7 | station_name | stationPlatformName | |
8 | lat | latitude | siteInformation |
9 | lon | longitude | siteInformation |
10 | elev | | |
11 | fips_code | | |
12 | country | | |
13 | continent | | |
14 | station_data_policy | accessConstraints | dataPolicyUseConstraints |
15 | station_data_type | | |
16 | data_update_status | | |
17 | station_metadata_link | | |
18 | timestep | | |
19 | ver | | |
20 | obs_freq | | |
21 | obs_time_GMT | | |
22 | Hourly_Temperature_start_year | | |
23 | Hourly_Temperature_end_year | | |
24 | Daily_mean_temperature_start_year | | |
25 | Daily_mean_temperature_end_year | | |
26 | Daily_maximum_temperature_start_year | | |
27 | Daily_maximum_temperature_end_year | | |
28 | Daily_minimum_temperature_start_year | | |
29 | Daily_minimum_temperature_end_year | | |
30 | Accumulated_precipitation_start_year | | |
31 | Accumulated_precipitation_end_year | | |
32 | Sunshine_Duration_start_year | | |
33 | Sunshine_Duration_end_year | | |
34 | Wind_speed_start_year | | |
35 | Wind_speed_end_year | | |
36 | Wind_direction_start_year | | |
Table E: Example tabulation of the IPR-related aspects of the ISO 19115 standard, filled in with an example data source.
Citation | |
Title | 1000151_ Porto_Geophysical_Institute_and_Lisbon_Geophysical_Institute_Publications_and_Instituto_de_Meteorologia_digital_database (ERACLIM) |
Alternate title | valente_ERACLIM_portugal |
Date | 17/08/2017 |
Edition | |
Edition date | 17/08/2017 |
Citation identifier | |
Cited responsible party | |
Presentation form | |
Series | |
Other citation details | none |
ISBN | |
ISSN | |
Website | |
Abstract | |
Credit | |
Status | Ongoing |
Point of contact | |
Responsibility | |
Role | Principal investigator |
Party | |
Organisation | |
Name | Antonia Valente Porto_Geophysical_Institute_and_Lisbon_Geophysical_Institute_Publications_and_Instituto_de_Meteorologia_digital_database (ERACLIM) |
Contact Information | |
Contact | mavalente@fc.ul.pt |
Telephone | |
Address | |
Delivery point | |
City | |
Administrative area | |
Postal code | |
Country | 3-02-138–PRT Porto_Geophysical_Institute_and_Lisbon_Geophysical_Institute_Publications_and_Instituto_de_Meteorologia_digital_database |
Electronic mail address | mavalente@fc.ul.pt |
Addressee | |
Online resource | |
Contact instructions | |
Contact type | |
Contact Information | |
Logo | |
Individual | |
Name | |
Contact Information | |
Position name | |
Responsibility | |
Role | |
Party | |
Organisation | |
Name | |
Contact Information | |
Contact | |
Telephone | |
Address | |
Delivery point | |
City | |
Administrative area | |
Postal code | |
Country | |
Electronic mail address | |
Addressee | |
Online resource | |
Hours of service | |
Contact instructions | |
Contact type | |
Contact Information | |
Logo | |
Individual | |
Name | |
Contact Information | |
Position name | |
Resource constraints | |
Legal constraints | |
Use limitation | Open Access |
Constraint application scope | |
Graphic | |
Reference | ECMWF provides open access to ERA-CLIM reanalysis products and other research datasets at http://apps.ecmwf.int/datasets. |
Releasability | |
Party responsible | |
Access constraints (options are: Confidential, Copyright, In-Confidence, Intellectual property rights, Licence Distributor, Licence End User, Licence Unrestricted, License, Other restrictions, Patent, Pending patent, Private, Restricted, Sensitive But Unclassified, Statutory, Trademark, Unrestricted) | Unrestricted |
Use constraints (options are: Confidential, Copyright, In-Confidence, Intellectual property rights, Licence Distributor, Licence End User, Licence Unrestricted, License, Other restrictions, Patent, Pending patent, Private, Restricted, Sensitive But Unclassified, Statutory, Trademark, Unrestricted) | Unrestricted |
Other constraints | Data from this original source were received from Maria Antonia Valente at IDL, University of Lisbon, Portugal (ERA-CLIM). These data are open and freely available. |
Resource constraints | |
Associated resource | |
Distribution Information | |
Distribution | Data were downloaded directly via a link provided by Antonia Valente |
Description | |
Distribution format | CSV |
Distributor | Data were downloaded directly via a link provided by Antonia Valente |
Transfer options | |
Metadata identifier | |
Identifier | |
Version | |
Description | |
Point of contact | |
Responsibility | |
Role | |
Party | |
Organisation | |
Name | |
Contact Information | |
Contact | |
Telephone | |
Address | |
Delivery point | Copernicus Climate Data Store |
City | |
Administrative area | |
Postal code | |
Country | Ireland |
Electronic mail address | |
Addressee | |
Online resource | |
Hours of service | |
Contact instructions | |
Contact type | |
Contact Information | |
Logo | |
Individual | |
Name | Simon Noone, Irish Climate Analysis and Research Units (ICARUS), Room 4.7, Laraghbryan House, North Campus, Maynooth University, Maynooth, Kildare |
Contact Information | simon.noone@mu.ie |
Position name | |
Contact | |
Parent metadata | |
Type of resource | |
Hierarchy level | |
Resource scope | |
Name | |
Type of resource | |
Alternative metadata reference | |
Metadata linkage | |
Metadata linkage | |
Date | |
Date info | |
Metadata standard | ISO19115-3 |
Citation | |
Title | |
Alternate title | |
Date | |
Edition | |
Edition date | |
Citation identifier | |
Cited responsible party | |
Presentation form | |
Series | |
Other citation details | |
ISBN | |
ISSN | |
Website |