During this quarter we focused on preparing the software components and technology readiness needed to start deploying the show cases described in the Statement of Work.
In this document we describe the progress and contributions made to the different activities in preparation for the deployment of the show cases in EWC.
In order to kick off the collaboration around data technologies, a workshop between MeteoSwiss and ECMWF was organized at the ECMWF centre in Reading.
Participation: Christian Kanesan (MeteoSwiss), Victoria Cherkas (MeteoSwiss), Petra Baumann (MeteoSwiss), Stefan Friedli (MeteoSwiss - remote), Milos Belic (MeteoSwiss - remote), Emanuele Danovaro (ECMWF), Christopher Bradley (ECMWF), Simon Smart (ECMWF), James Hawkes (ECMWF), Tiago Quintino (ECMWF), Sandor Kertesz (ECMWF)
The goals were:
The topics discussed during the workshop covered, among others:
numpy/xarray: should we embrace the rich functionality of frameworks like xarray, or use pure numpy API solutions for compatibility?
Metadata and grid information in the framework.
Evaluate and learn from typical member state operators and grids for data processing.
Dask and task scheduling: how to parallelize the DAGs.
Performance evaluations: dask, numba, GPU.
Event driven processing.
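To illustrate the Dask question above, a minimal sketch of how a small task DAG can be expressed and executed in parallel with dask.delayed; the functions and the graph are hypothetical placeholders, not workloads from the pilot project:

```python
# Minimal sketch: build a small task DAG lazily with dask.delayed and
# compute it. The functions are placeholders for real processing steps.
from dask import delayed


def preprocess(x):
    # Placeholder for a per-field processing step.
    return x + 1


def combine(values):
    # Placeholder for an aggregation step.
    return sum(values)


# One preprocess task per input, fanning into a single combine task;
# Dask schedules the independent preprocess tasks in parallel.
tasks = [delayed(preprocess)(i) for i in range(5)]
result = delayed(combine)(tasks).compute()  # executes the DAG
print(result)  # 1 + 2 + 3 + 4 + 5 = 15
```

Calling .compute() on the final node materializes the whole graph; until then only the DAG is built, which is what makes scheduling decisions (threads, processes, distributed cluster) a separate concern from the processing code itself.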
Conclusions and results of the different activities are summarized in the following sections of the quarterly report.
FDB is an essential component for the pilot project. FDB is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data.
It will be used in the pilot project to retrieve and access data semantically, instead of the traditional GRIB file-based approach still employed in many operational environments of NMHSs.
FDB implements a field database optimized for HPC data centres with a (Lustre) distributed file system, and adds a Python frontend to facilitate access to meteorological fields in Python.
The following shows an example of how to retrieve a full hypercube of ensemble data for two fields (height and DBZ) from COSMO data:
    request = mars.Request(
        ("HHL", "DBZ"),
        date=ref_time.strftime("%Y%m%d"),
        time=ref_time.strftime("%H00"),
        expver="0001",
        levelist=tuple(range(1, 82)),
        number=tuple(range(11)),
        step=lead_time,
        levtype=mars.LevType.MODEL_LEVEL,
        model=mars.Model.COSMO_1E,
        stream=mars.Stream.ENS_FORECAST,
        type=mars.Type.ENS_MEMBER,
    )
    ds = model_data.get(request, ref_param_for_grid="HHL")
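Under the hood, such a retrieval maps onto the low-level pyfdb API, where a request is a plain dictionary of MARS-style keys. A sketch of that shape is shown below; the key names and values are illustrative, and the commented retrieval call assumes pyfdb is installed and an FDB is configured:

```python
# Sketch of a request expressed against the low-level pyfdb API.
# The keys mirror MARS keywords; values here are illustrative only.
request = {
    "date": "20240115",
    "time": "0000",
    "expver": "0001",
    "levelist": [str(level) for level in range(1, 82)],
    "levtype": "ml",
}

# With pyfdb installed and FDB5_CONFIG_FILE pointing to a valid FDB,
# the retrieval would look like:
#
#   import pyfdb
#   fdb = pyfdb.FDB()
#   reader = fdb.retrieve(request)  # file-like object yielding GRIB data
```

The MeteoSwiss wrapper shown above adds the convenience layer (enums for stream/type/model, decoding into a dataset) on top of this dictionary-based interface.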
An important component of the FDB framework is FDB remote. This new development in the FDB family makes it possible to serve FDB data to clients that do not have access to the Lustre-based deployment of FDB.
The FDB remote is implemented as a set of services, including a catalogue (for querying data) and a store (that serves data requests).
A typical use case that requires the deployment of FDB remote is accessing ECMWF production data from the EWC environment.
One of the architecture designs of the pilot project is based on accessing ECMWF production data directly from EWC (without the need to set up and maintain dissemination streams). However, EWC is not in the same network as the HPC production environment of ECMWF, so applications running on EWC cannot access the main FDB of the HPC centre to retrieve data. FDB remote will be set up so that any application in EWC can request data from the server.
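Conceptually, an application on EWC would point its FDB client at the remote server through its FDB configuration. The fragment below is a hedged sketch only: the key names follow the FDB5 configuration style, but the hostname, port, and exact schema are placeholders that would need to be checked against the deployed FDB version:

```
# Client-side FDB configuration sketch for talking to an FDB remote
# server (illustrative values, not the actual deployment).
type: remote
host: fdb-remote.example    # placeholder hostname of the FDB remote server
port: 7000                  # placeholder port
```

With such a configuration in place, the client-side retrieval code stays unchanged; only the backend behind the FDB API switches from the local Lustre-based store to the remote catalogue and store services.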
A simplified design of this deployment is shown in the following diagram: