
Target audience

Access to ARCO data is programmatic; therefore, users of these resources and this documentation are expected to have some relevant programming experience.

The Data Store Service (DSS) at ECMWF offers a subset of the data available in the Data Stores in an Analysis Ready, Cloud Optimised (ARCO) format. These ARCO data are typically provided as Zarr archives stored in S3-compatible object storage.

DSS catalogue entries with data available in ARCO format provide specific guidance on how to access the associated Zarr assets.

This page provides general documentation on the ARCO resources, together with guidance and best practices for their effective and efficient use.

What is ARCO, and why should I use it?

Analysis Ready, Cloud Optimised (ARCO) encompasses several aspects of modern data management and delivery.

Fundamentally, ARCO is about transforming traditional datasets into formats that work natively in modern cloud-based workflows, including web applications and interactive analysis environments.

ARCO is particularly well suited to interactive analysis and to web applications that read small subsets of a large dataset on demand.

In contrast, applications that require downloading large volumes of data for repeated, offline processing may be better served by traditional data access methods via the relevant Data Store request service (e.g. cdsapi).

Analysis Ready

“Analysis ready” means that the data can be used directly in downstream applications without additional preprocessing.

Specifically, this means that there is:

  • no need to decode packed variables
  • no need to apply scale factors or offsets
  • no need to interpret non-standard or obscure metadata representations
  • no need to merge multiple files before use

This significantly reduces the workload for downstream applications.
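For contrast, a short sketch of the manual decoding step that packed formats typically require and that "analysis ready" data removes (all numbers here are purely illustrative, not taken from any real archive):

```python
# Packed data stores small integers plus metadata; the reader must
# reconstruct physical values itself (illustrative numbers only).
packed = [0, 16384, 32767]  # e.g. 16-bit integers as stored on disk
scale_factor = 0.01
add_offset = 250.0

# The manual decoding step an analysis-ready dataset spares you:
temperature_k = [round(v * scale_factor + add_offset, 2) for v in packed]
print(temperature_k)  # [250.0, 413.84, 577.67]
```

With ARCO data, the array you read is already the physical quantity, so no such step exists in your code.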

For web applications in particular, it can eliminate the need for an intermediate processing layer, reducing system complexity and improving performance and fidelity.

Cloud optimised

“Cloud optimised” means the data is structured to work efficiently over internet connections. Key characteristics include:

  • Support for parallel access: Multiple users or processes can read different parts of the data at the same time.
  • Minimised data transfer: Only the data needed for a task is downloaded, reducing bandwidth usage.
  • Lazy loading: Data is loaded only when it’s actually needed, not all at once.
  • Avoidance of large monolithic files: Data is split into smaller pieces instead of one huge file, making it easier to handle and transfer.

This is typically achieved by chunking the data. Chunking ensures that downstream applications only download the data required for a specific computation.

Data chunking

Data in Zarr archives is stored in chunks. A chunk is the smallest unit of data that can be transferred. Partial chunk downloads are not possible. Therefore, the chunking strategy used has a major impact on performance.

To address these two common, but opposing, usage patterns, the DSS provides two versions of each Zarr archive: geo-chunked and time-chunked.

Geo-chunked

The geo-chunked archive version is chunked in the spatial dimensions and provides optimised access for long time series at a single point or small area.

For example, use it to produce time-series plots of a variable at a fixed location.


Time-chunked

The time-chunked archive version is chunked in the time dimension and provides optimised access to large spatial regions over short time periods.

For example, use it to produce map plots for a single time step.


Counter-intuitive chunking

The appropriate choice may appear counterintuitive:

  • For long time series at a single point or small area → use the geo-chunked Zarr archive.

  • For large spatial regions over short time periods → use the time-chunked Zarr archive.
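The effect is easy to quantify. The sketch below (pure Python; the array and chunk shapes are made-up examples, not the actual shapes used by the DSS archives) counts how many chunks a selection touches, and hence how many objects must be fetched:

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks intersected by a box selection.

    selection:   (start, stop) index range per dimension
    chunk_shape: chunk length per dimension
    """
    n = 1
    for (start, stop), size in zip(selection, chunk_shape):
        n *= (stop - 1) // size - start // size + 1
    return n

# Hypothetical array of shape (time=1000, lat=721, lon=1440)
geo_chunks = (1000, 16, 16)   # geo-chunked: small in space, long in time
time_chunks = (1, 721, 1440)  # time-chunked: one time step per chunk

point_series = [(0, 1000), (100, 101), (200, 201)]  # full series, one point
global_map = [(0, 1), (0, 721), (0, 1440)]          # one step, full globe

print(chunks_touched(point_series, geo_chunks))   # 1 chunk
print(chunks_touched(point_series, time_chunks))  # 1000 chunks
print(chunks_touched(global_map, geo_chunks))     # 4140 chunks
print(chunks_touched(global_map, time_chunks))    # 1 chunk
```

Either archive can serve either request; the chunking only determines how many chunks, and therefore how much data, must be transferred to do so.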

Tokenised Access

Access to ARCO data is controlled via your Data Store Service credentials: your API key is used as your authorisation token.

The token must be included in the Authorization header of your HTTP requests. Examples demonstrating this are provided below for various access methods.
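As a minimal illustration of the header format (Python standard library only; the URL is the geo-chunked ERA5 archive shown later on this page, and `<CDS-API-KEY>` is a placeholder for your own key; the request itself is not executed here):

```python
import urllib.request

cdsapi_key = "<CDS-API-KEY>"  # placeholder: substitute your own API key
url = "https://arco.datastores.ecmwf.int/cadl-arco-geo-002/arco/reanalysis_era5_single_levels/sfc/geoChunked.zarr"

# The token goes in the Authorization header as a Bearer token:
request = urllib.request.Request(
    url, headers={"Authorization": f"Bearer {cdsapi_key}"}
)
print(request.get_header("Authorization"))  # Bearer <CDS-API-KEY>

# urllib.request.urlopen(request) would then perform the authenticated
# fetch; in practice the libraries below attach this header for you.
```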

CDSAPI Key, ECMWF account and licencing

Your API key is available from your profile page on whichever Data Store you have registered with, e.g. the CDS.

If you have not already registered with the Data Store Service, you must register an ECMWF account and use this to log in to one of the Data Store portals, e.g. the CDS or the ADS.

In addition to accepting the general DSS Terms and Conditions, you must also accept the licence associated with the dataset you are using from the relevant portal. Failure to accept the appropriate licence will result in authorisation errors.

Fair usage

Access to ARCO data is subject to fair usage policies.

Your access may be rate limited based on recent download volume. This is intended to prevent excessive or abusive use of the service.

The rate limiting mechanism is designed so that it does not impact typical usage patterns.

The specific parameters of the rate limiting are subject to change and are not publicly disclosed.
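Since the limits are not disclosed, the practical advice is simply to back off and retry when a request is refused. A sketch of an exponential backoff schedule (the use of HTTP 429 for rate limiting is an assumption, and the delays are illustrative):

```python
def backoff_delays(retries, base=1.0, cap=60.0):
    """Exponential backoff delays (seconds) for successive retries."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]

# Sleep this long before each retry after a refused (e.g. HTTP 429) response:
print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

The retry examples under Advanced Usage below implement this pattern for you; a hand-rolled loop like this is only needed if you are making raw HTTP requests.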

Access examples 

This section provides examples of accessing ARCO data through various methods, such as Python and JavaScript.

Python: xarray

Requirements

You will need to install the following packages to access the Zarr archives with xarray; they can be installed from PyPI (pip) or conda.

requirements.txt
xarray
zarr
aiohttp
fsspec

Plug and play

The following example is an xarray getting-started guide; it will not suffice for heavy-duty or operational workflows. Please see Advanced Usage below for more robust workflow mechanisms.

Access ERA5 single levels data with xarray
import os
import xarray as xr

# Get the cdsapi key from an environment variable, or specify explicitly:
cdsapi_key = os.getenv('CDSAPI_KEY', "<CDS-API-KEY>")

# Geo-chunked data for access optimised along the time dimension (e.g. for time-series at a single point)
geochunked_url = "https://arco.datastores.ecmwf.int/cadl-arco-geo-002/arco/reanalysis_era5_single_levels/sfc/geoChunked.zarr"

# Time-chunked data for access optimised across spatial dimensions (e.g. a global map plot for a single time-step)
timechunked_url = "https://arco.datastores.ecmwf.int/cadl-arco-time-002/arco/reanalysis_era5_single_levels/sfc/timeChunked.zarr"

# Open the geochunked_url with xarray, users must insert their CDS API key where indicated.
ds = xr.open_zarr(
    geochunked_url,
    consolidated=True,
    storage_options={
        "headers": {"Authorization": f"Bearer {cdsapi_key}"}
    },
)

# Inspect the variables
print(ds)

Advanced Usage

The xarray interface to Zarr does not offer any retry mechanism by default. Given the remote nature of this data access, larger workflows may well see a data-chunk transfer fail for any of a number of reasons, e.g. a temporary loss of connectivity.

To make your workflows more robust, you can include a retry mechanism as part of your connection to the Zarr archives. Below are two examples using existing open-source libraries. 

obstore

Additional dependencies
pip install obstore
Access the ERA5 zarr data using an obstore custom store
import os

import xarray as xr
from obstore.store import HTTPStore
from zarr.storage import ObjectStore

# Get the cdsapi key from an environment variable, or specify explicitly:
cdsapi_key = os.getenv('CDSAPI_KEY', "<CDS-API-KEY>")

# Geo-chunked data for access optimised along the time dimension (e.g. for time-series at a single point)
geochunked_url = "https://arco.datastores.ecmwf.int/cadl-arco-geo-002/arco/reanalysis_era5_single_levels/sfc/geoChunked.zarr"
 
# Use obstore's HTTPStore to create a store with retry configuration,
# and then wrap it in a zarr ObjectStore to read with xarray.
# See https://github.com/developmentseed/obstore/blob/main/obstore/python/obstore/_store/_retry.pyi
# for more details on the retry configuration options.
http_store = HTTPStore(
    geochunked_url,
    client_options={
        "default_headers": {"Authorization": f"Bearer {cdsapi_key}"},
    },
)
store = ObjectStore(http_store, read_only=True)
ds = xr.open_zarr(store)
print(ds)

aiohttp_retry

Additional dependencies
pip install aiohttp-retry
Access the ERA5 zarr data using an aiohttp retry-client
import os

import xarray as xr
from aiohttp_retry import ExponentialRetry, RetryClient

# Get the cdsapi key from an environment variable, or specify explicitly:
cdsapi_key = os.getenv('CDSAPI_KEY', "<CDS-API-KEY>")

# Geo-chunked data for access optimised along the time dimension (e.g. for time-series at a single point)
geochunked_url = "https://arco.datastores.ecmwf.int/cadl-arco-geo-002/arco/reanalysis_era5_single_levels/sfc/geoChunked.zarr"

# Define a custom get_client function that returns a RetryClient with the
# desired retry configuration. See https://github.com/inyutin/aiohttp_retry
# for more details on the retry configuration options.
async def get_client(**kwargs):
    retry_options = ExponentialRetry()
    retry_client = RetryClient(
        **kwargs, raise_for_status=False, retry_options=retry_options
    )
    return retry_client


ds = xr.open_zarr(
    geochunked_url,
    storage_options={
        "headers": {"Authorization": f"Bearer {cdsapi_key}"},
        "get_client": get_client,
    },
)
print(ds)

JavaScript: zarrita

zarrita is a JavaScript toolkit for working with chunked, compressed, n-dimensional arrays in the Zarr format. It runs natively in the browser, making it possible to stream ARCO data directly to web applications with no intermediate processing layer.

This is the approach used by the Weather Replay application, which renders ERA5 weather data on a 3D globe by reading directly from the Zarr archive on the client side.

Requirements

Install zarrita via npm or yarn etc:

npm install zarrita

Plug and play

The following example is a zarrita getting-started guide; it will not suffice for production web applications. Please see Advanced usage below for more robust mechanisms.

Access ERA5 single levels data with zarrita
import * as zarr from "zarrita";

// Your CDS API key, available from https://cds.climate.copernicus.eu/profile
const cdsapiKey = "<CDS-API-KEY>";

// Time-chunked data: optimised for spatial access at a single timestep
const url = "https://arco.datastores.ecmwf.int/cadl-arco-time-002/arco/reanalysis_era5_single_levels/sfc/timeChunked.zarr";

// FetchStore accepts fetch() overrides as a second argument,
// which is how we attach the Authorization header.
const store = new zarr.FetchStore(url, {
  overrides: { headers: { Authorization: `Bearer ${cdsapiKey}` } },
});

// withConsolidated loads all metadata in a single request,
// avoiding one HTTP request per array/group in the hierarchy.
let root = await zarr.withConsolidated(store);
let group = await zarr.open(root, { kind: "group" });

// Open and read the 2m temperature array
let t2m = await zarr.open(group.resolve("/t2m"), { kind: "array" });
t2m.shape; // e.g. [time, latitude, longitude]
t2m.dtype; // e.g. "float32"

// Fetch a single timestep (index 0) across all lat/lon
let { data } = await zarr.get(t2m, [0, null, null]);
// data is a Float32Array of length latitude * longitude

// Read coordinate arrays
let latitudes = await zarr.get(await zarr.open(group.resolve("/latitude"), { kind: "array" }));
let longitudes = await zarr.get(await zarr.open(group.resolve("/longitude"), { kind: "array" }));

Advanced usage

For production web applications, withConsolidated is particularly important: without it, opening each array in the hierarchy requires a separate metadata fetch over the network. Wrapping the store with withConsolidated loads the entire metadata tree in one request, after which open calls resolve locally.

Consolidated metadata avoids per-array network requests
// Consolidated metadata means these do not incur additional network requests
let root = await zarr.withConsolidated(store);
let group = await zarr.open(root, { kind: "group" });
let t2m = await zarr.open(group.resolve("/t2m"), { kind: "array" });
let u10 = await zarr.open(group.resolve("/u10"), { kind: "array" });
let v10 = await zarr.open(group.resolve("/v10"), { kind: "array" });

See the zarrita cookbook for further examples.


This document has been produced in the context of the Copernicus Atmosphere Monitoring Service (CAMS) and the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of CAMS and C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which merely represents the author's view.
