Contributors: Harald Schyberg (MET Norway), Per Dahlgren (MET Norway), Eivind Støylen (MET Norway), Åsmund Bakketun (MET Norway), Yurii Batrak (MET Norway), Jostein Blyverket (MET Norway), Ole Vignes (MET Norway), Trygve Aspelien (MET Norway), Xiaohua Yang (DMI), Carlos Peralta Aros (DMI), Bjarne Amstrup (DMI), Pia Englyst (DMI), Kristian Holte Møller (DMI), Ida L Olsen (DMI), Kristian Pagh Nielsen (DMI), Sebastian Pelt (DMI), Søren Borg Thorsen (DMI), Kasper Tølløse (DMI), Patrick Samuelsson (SMHI), Swapan Malick (SMHI), Jelena Bojarova (SMHI), Bolli Palmason (IMO), David Schönach (FMI), Reima Eresmaa (FMI), Mikael Hasu (FMI)

Issued by: DMI / Xiaohua Yang

Date: 18/03/2025

Ref: C3S2_D361a.1.1.3_FullSystemDocumentation_v3.docx

Official reference number service contract: 2022/C3S2_361a_METNorway/SC1

Introduction

Recent decades have seen great success with atmospheric reanalysis at global and regional scales. Within the product portfolio by the Copernicus Climate Change Service (C3S), global ERA5 reanalysis has proven to be a dataset with superior quality for a broad range of user communities and applications, such as those in climate research and recently in data driven weather forecasting. Meanwhile, the Copernicus regional reanalysis on the European Arctic region developed during Copernicus phase 1, CARRA, has also demonstrated clear added value, thanks to the use of a kilometer scale operational Numerical Weather Prediction (NWP) model, especially with adaptation for climate reanalysis for the Arctic region, a geographic area which has been revealed to play an essential role in the ongoing global climate warming. 

Following the success of the CARRA project during the Copernicus phase 1, the present reanalysis service, C3S contract C3S2 361a, aims to extend significantly the domain coverage of the CARRA1 to the entire pan-Arctic region (Figure 1) with an extended reanalysis period to cover 40 years, while maintaining the high horizontal resolution. For convenience, we use in the rest of this document the abbreviation CARRA2 (Copernicus Arctic Regional ReAnalysis phase 2) to refer to the reanalysis system and also the dataset produced in this project, and CARRA1 for the corresponding parts in phase 1. 
In the construction of the CARRA2 reanalysis system, following a similar practice as in CARRA1 (Yang et al, 2019), the strategy has been to inherit as much as possible the model system and related technical infrastructure from the state-of-the-art operational Arctic NWP setups so as to minimize the need for development and tuning. Meanwhile, extension and adaptation efforts have been focused on desirable improvements relevant for the pan-Arctic reanalysis, mainly on activation of a multi-layer surface scheme and associated surface assimilation, see detailed description in the following section 3.1. In addition, CARRA2 contains further improvements on the surface properties involving glacier and permafrost surfaces in Greenland, Iceland, Svalbard and Russian islands.

In this report, we document the main features of the CARRA2 reanalysis system, with special focus on new features implemented specifically for this reanalysis application. In separate reports, we provide evaluation and conclusions from tests of various features, especially those with meteorological significance, one with focus on test and evaluation during the development phase (Yang et al, 2025) and one on evaluation of the produced data sets so far (Køltzow et al, 2025). We also provide a separate user guidance document describing the output data and their access in the CDS.  
This system documentation is organized in three sections after this introductory section. In Section 2, we describe the main characteristics of the CARRA2 reanalysis system and how it was set up applying the HARMONIE-AROME system and its source baseline code. Section 3 gives an overview of various important additions and extensions to the HARMONIE-AROME baseline which was done to prepare the CARRA2 system. This includes a description of the new surface and soil scheme with surface assimilation, system updates done to optimize the system to fit computational resources available, adaptations done in the upper-air data assimilation scheme and adaptation of the system for various tailored types of input data for reanalysis. Finally, Section 4 presents a summary of the implementation.

For topics with separate documentation such as those on input observation data stream, reanalysis data description, summary of test and evaluation on the CARRA2 production system, and first phase evaluation and verification, summaries and references have been provided for the some of the overlapping subjects, but readers are asked to look at the relevant documents for more details.

Application of HARMONIE-AROME system baseline

The CARRA2 reanalysis system is an upgrade to that of the CARRA1, with the foundation of the main assimilation-forecast system built on the mesoscale Numerical Weather Prediction (NWP) model HARMONIE-AROME 46h1. HARMONIE-AROME, a kilometer-scale Numerical Weather Prediction (NWP) system (Bengtsson et al, 2017, Gleeson et al 2024), has been developed with the primary goal for application in operational weather forecasting. This forecast system is based on the shared ACCORD (A Consortium for COnvection-scale modelling Research and Development, see {+}https://www.umr-cnrm.fr/accord/) forecast code framework, also as part of the IFS (Integrated Forecasting System) code ecosystem shared with the ECMWF operational NWP system and thereby the ERA5 reanalysis system. The HARMONIE-AROME system is the common basis behind the routine regional operational weather forecasting at the Nordic weather services (see for instance Müller et al, 2017a and Yang et al 2018), all of them being contributors to this Arctic reanalysis effort. As such, the system is the state of the art of limited area operational NWP systems with service coverage to much of the European part of the continent and Arctic regions, thus well suited to be the system basis for this pan-Arctic reanalysis.


Figure 1: The CARRA2 pan-Arctic domain.

In preparation of the CARRA1 reanalysis system, various adaptations, extensions and additions have been made on top of the operational HARMONIE-AROME NWP systems to facilitate reanalysis with the need to cover a multi-decadal period (see the full system documentation for CARRA1 by Yang et al, 2019). For CARRA2, most of these modifications have been ported. In addition, efforts have been spent to test and implement some of the new features, among others, to take advantage of the recent developments around the ACCORD/HARMONIE-AROME system. Most important changes of all are related to the introduction of multi-layer SURFEX physics scheme and the associated changes in surface assimilation (SURFEX is the surface module of the NWP system which describes the surface fluxes and the evolution of surface properties, for a general overview of this scheme see Masson et al, 2013). From the infrastructure side, the upgrade includes, for input data, additional observation data coverage for the extended reanalysis period both for the in-situ and remote sensing observations and for physiographic databases (see detailed description from the observation data report, Dahlgren et al, 2024). Also, a further adaptation and enhancement of the monitoring and verification utilities for the reanalysis purpose has been made based on those used for CARRA1 (see detailed description from the system monitoring report, Støylen et al, 2024). 

The CARRA2 reanalysis is produced over a domain with 2880 x 2880 grid points at 2.5 km horizontal grid distance, 65 levels in vertical with topmost model level at 10 hPa and lowest level at ca 12 m. The grid mesh is defined on a polar stereographic projection. As in CARRA1, the dynamics part of the computation is with quadratic truncation1

The CARRA2 reanalysis consists of assimilation and short-range forecasts. Observation data assimilation is performed each day every 3 hours at 00, 03, 06, 09, 12, 15, 18 and 21 UTC. Three-dimensional variational data assimilation (3D-VAR) is performed for upper air, assimilating in-situ and remote sensing observations. A short-range forecast is performed following assimilation, with a lead time of 18h for 00 and 12 UTC, and 3h for other times. A 75-second time stepping is used in forecast step, with hourly output for most output parameters. For surface assimilation, a Simplified Extended Kalman Filter (SEKF) is used to assimilate screen level temperature, humidity and snow cover information. In addition, CARRA2 also utilizes tailored high-resolution data for sea surface temperature and sea ice cover, as well as satellite-retrievals for glacier albedo over glaciated Arctic regions.  

The CARRA2 code system is maintained using the source code revision control tool Git on GitHub. During the preparation phase, the CARRA2 reanalysis repository, the collection of all sources, scripts and utilities, went through gradual evolution in form of alpha-, beta-, release candidate and final releases with extensive testing and evaluation throughout the processes (Yang et al 2025). While the development is carried out using the Git development branch, CARRA2 production uses the final released code versions.

1 In the HARMONIE-AROME spectral model system, while physics related computations are done in grid point space, the dynamic (linear) part of the calculations is carried out in spectral space, where meteorological fields are represented by a sum of wave functions. Use of quadratic grid is a cost-efficient measure to ensure a sufficient number of grid points to represent each wavelength in spectral space while keeping the number of waves constant.


CARRA2 extensions and deviations relative to the HARMONIE-AROME reference system

Surface modelling and assimilation

CARRA2 is based on the HARMONIE 46h1 (Gleeson et al 2024), porting relevant adaptations for the purpose of reanalysis as the CARRA1 (see also Yang et al 2019). In addition, CARRA2 has introduced various modifications including a major update on surface physics scheme and the associated updates in surface assimilation, which are detailed in the following. 

Updates also include the inclusion of the FLAKE scheme (see references in Bengtsson et al, 2017), which introduces more advanced parameterisations for fresh-water lakes than in CARRA1 and handles the physics of and interaction with lakes more realistically. Among other features, the FLAKE scheme includes a parametric representation of temperature profile both in water, lake ice, snow on ice and in bottom sediments.

Multi-layer SURFEX scheme

For CARRA2, as compared to CARRA1, the domain coverage has been extended to a much larger pan-Arctic region, and this is only sparsely covered by the SYNOP observation network. Thus, a good model simulation relies more heavily on the land-surface scheme to simulate processes in a realistic manner given reasonable forcing from the atmosphere. In CARRA1 we used a SURFEX version applying the so-called "force-restore" method with a heavily simplified description of the vertical layering of the soil and snowpack. To achieve improved performance in CARRA2, the multi-layer SURFEX physics, a non-default option introduced in the Harmonie CY46h1 by the HIRLAM research program, has been chosen with the use of two patches over land tile, forest and open land. The option includes the 14-layer diffusion soil scheme (Decharme et al., 2011), the 12-layer explicit snow scheme (Decharme et al., 2016) and the multi-energy balance (MEB) for forest canopy (Boone et al., 2017). Although this setup is still not used operationally by any ACCORD service, it has long been under evaluation within HIRLAM research community, it has e.g. been applied by the HARMONIE-Climate community for several years and has also been used by MET Norway for its pre-operational AROME-Arctic setup with positive experiences. 

The explicit snow scheme simulates for each layer the evolution of heat content, density, liquid water content, snow age and surface albedo. It insulates the soil from the atmosphere which allows for a realistic behavior of the soil energy and ice content during winter. The soil column is divided into 14 layers, with thin layers at the top and thicker layers at deeper levels, where the bottom of the deepest layer reaches 12 m. The temperature and soil moisture/ice at the thin top layers can react quickly to changes in energy and water fluxes while the thick deep layers represent a longer and slower memory of these upper forcing quantities (O(decade)). This long memory requires special care with respect to the combination of initial state and spin-up (see below at Section 3.1.3). However, the deep long memory is still not long enough to represent e.g. permafrost dynamics. 

As in CARRA1, the Town-Energy Balance (TEB) model is used to simulate the urban fractions of the domain. The TEB setup is the same as applied operationally for HARMONIE-AROME. In CARRA1 the surface temperature of lakes (LST) was approximated by the nearby deep soil temperature while in CARRA2 the advanced FLAKE model (Mironov et al., 2010) is applied, which besides LST also simulates e.g. ice and snow cover. As for the deep soil, lakes also represent quite a long energy memory and therefore requires careful initialization.

Surface data assimilation

The activation of the multi-layer SURFEX schemes brings major changes in model description of soil processes, requiring adaptation in surface data assimilation as well. This entails a major deviation of the surface reanalysis of the soil states from the force-restore scheme-based reference system HARMONIE-AROME CY46h1.  In the CARRA2 surface analysis, observation increments of screen level temperature, humidity and snow depth are first distributed horizontally in the 2D grid point space using the traditional Optimal Interpolation (OI) scheme. The vertical distribution of the increments for these screen level properties are ensured by the flow dependent Simplified Extended Kalman Filter (SEKF)

In CARRA2, gridpp, an open source software developed in MET Norway, (https://github.com/metno/gridpp),  is used to perform horizontal Optimal Interpolation (OI) for distribution of increments from screen level observations. The OI scheme here uses a Barnes structure function (Barnes, 1973) with horizontal decorrelation length of 35 km and a vertical decorrelation length of 200 m for screen level temperature and relative humidity. For temperature an adiabatic lapse rate correction is also performed.

After interpolating the point observations to a model grid with OI, the innovations between first guess and the interpolated observations are used to update soil variables in each grid point individually. While OI was used to determine the weights between innovation and increment for the control variables in CARRA1, the update of the soil parameters is done using SEKF for a better handling with the new multi-layer physics. The SEKF allows the weights between innovation and increments to be flow dependent based on the linearized observation operator and removes the need for prescribed relationships.

Following Albergel et al. 2017, soil temperature in layer 1 and 2 (0 to 4 cm), and soil moisture content at model layers 2-6 (2-60 cm) are updated with the SEKF. The analysis is given by

Where  is the background state,  the background error covariance matrix,  the linearized observation operator, h the nonlinear observation operator mapping model space to observation space,  the observation error covariance matrix, and  is the observation vector. The filter assumes a constant background error covariance matrix (Table 1). The linearized observation operator Jacobian matrix  is approximated by the finite difference method with perturbed simulations of the surface model.

Here,  is the model observation equivalent for observation m, and  is a perturbation to the jth control variable. To ensure that the linearization approximation holds, perturbations are performed in positive and negative pairs. The positive and negative perturbations are compared and Jacobian values which satisfy the criterion below are set to zero.

The scheme has a number of tunable parameters defined around soil layer temperature (TG) and wetness (WG) for selected top soil layers. Table 1 lists values selected for these control parameters, with some of them taken from literature, some used specifically for the CARRA2 system.

Table 1: The tunable parameters in the CARRA2 surface analysis with SEKF. WG refers to soil moisture, TG to soil temperature and the trailing numbers indicate the model soil layer, numbered from top to bottom.

Control vector

WG6, WG5, WG4, WG3, WG2, TG2, TG1

Background error

0.2, 0.2, 0.2, 0.25, 0.25, 2.0, 2.0

Observation Vector

T2m, Rh2m

Observation error

1.0, 0.1


The quality control tests for the screen level quantities are performed using the open source software titanlib developed at Met Norway (https://github.com/metno/titanlib/wiki/Titanlib-documentation), which consist of checks about domain, blacklist, "Nometa", redundancy, plausibility and fraction. The Nometa check flags stations with invalid latitude, longitude, and/or elevations. The domain check flags stations outside the model domain. The blacklist checks a predefined blacklist. For screen level temperature and relative humidity an additional spatial consistency check is performed.
The CARRA2 snow analysis is done at 06 UTC. SYNOP snow depths are mostly reported at 06 local time, which means that for the CARRA2 pan-Arctic domain we gather all snow observations for that day in one file and assimilate this at 06 UTC. We also assimilate satellite derived snow cover information from CryoClim as in CARRA1 (see description of this input data set in Dahlgren et al, 2024). The CryoClim data consists of satellite information derived during that day, so at 06 UTC we are potentially assimilating information "forward" in time.

SYNOP snow observations in BUFR format and CryoClim snow cover in netCDF format are converted to "json"-format text files. In the conversion from netCDF to json format the CryoClim data is thinned. We do not use CryoClim observations over permanent snow, so where the grid-cell permanent snow fraction is larger than zero, we drop CryoClim observations. We also do not use CryoClim observations where the grid-cell slope is larger than 0.5 and the land area fraction is smaller than 1.0. These two thinning criteria are based on the expected quality (and need) for a product over such grid-cells.  

Finally, we only select every second CryoClim observation within a neighborhood. This is because selecting all CryoClim observations would make the horizontal Optimal Interpolation (OI) slow. The neighborhood selection is based on checking if 3x3 grid-cells agree with respect to snow or no snow. 

The CryoClim snow/no-snow product is converted to observations that can be used in the horizontal OI. The pseudo-observations are created as follows:

  1. if CryoClim has snow:
    1. if model first guess snow depth > 0.0
      1. if model first guess snow depth <= fg_threshold: obs_value is set to model first guess
      2. else: Pseudo-observation is not created or used
    2. else
      1. obs_value is set to 10 cm
  2. elif CryoClim has no-snow and first guess snow depth > 0.0 :
    1. obs_value is set to 0 cm


Where the first guess threshold (fg_threshold) is 40 cm. In practice this means that when the CryoClim observations say there is snow, but the model is missing snow, we add 10 cm of snow for the horizontal OI. And when the CryoClim product says that there is no snow, but the model has snow we set the observed value to 0 cm for the OI. If both CryoClim and the first guess has snow then we use the first guess value with a maximum of 40 cm.
After the thinning step all snow observations go through a quality control (QC). The QC is done with the titanlib package described in more detail in the beginning of this section. The QC is done separately for SYNOP snow depth and CryoClim.

Table 2: Quality control configuration for snow observations


Nometa

Domain

Blacklist

Redundancy

Fract133ion

Firstguess

Plausibility

SigmaO

SYNOP

Yes

Yes

Yes

Yes

0.2

∓ 50 cm

0-100 m

1.0

CryoClim

Yes

Yes

Yes

Yes

0.2

∓ 50 cm

0-100 m

1.6

Also for snow, the nometa checks flags stations with invalid latitude, longitude, and/or elevations. The domain check flags stations outside the model domain. The blacklist checks a predefined blacklist and flags stations or CryoClim observations based on station ID and/or lon/lat values. In CARRA2, Icelandic SYNOP snow depth stations are added to the blacklist and flagged. This is done to avoid the influence of the surface analysis on the Icelandic glaciers. The Icelandic snow depth observations are located close to the coast and are not representative of the snow further inland. The redundancy check flags redundant observations (same position) and it selects the observation that is closest to the analysis time. The fraction test removes observations where the land area fraction is less than 0.2. CryoClim has already gone through a test for land area fraction equal to 1. If the model first guess departure is above 50 cm the observation is discarded in the First Guess check. The plausibility check removes e.g. negative observations. The ratio of observation to model error (SigmaO) is also set in the QC. SYNOP snow depth weights the observed value and first guess equally, while the CryoClim uses a larger observation error than model background error (so the CryoClim observations get less weight than that of the first guess).  

For snow, the horizontal OI is again performed with the gridPP software as for the screen level information. The OI uses a Barnes structure function (Barnes, 1973) with horizontal decorrelation length set to 30 km and vertical decorrelation length set to 300 m. The OI uses QC-d snow depth observations from SYNOP and CryoClim.  


Figure 2: Left: first guess snow depth from SURFEX, middle: snow depth from horizontal OI, right: analysis increments (Analysis-minus-first guess). Date: 01/03/2023 06 UTC.

The new snow depth from the OI analysis (Figure 2 middle panel) is passed to SODA (SURFEX Offline Data Assimilation), which is part of the CARRA2 code. Differently from CARRA1, the update of the snowpack is done in snow depth space in CARRA2. The model prognostic snow water equivalent and snow density are converted to snow depth (Figure 2, left panel) and a rule-based update is then performed. Example snow depth increments are shown in Figure 2 right panel. To try to filter out unphysical analysis increments we set the snow depth to zero where the first guess snow depth from the model was zero and the soil temperature was larger than 273 K. We do not use the analysis increments where the permanent snow cover fraction is larger than 0.75. This is to avoid removing and adding snow on glaciers. When the snow depth is updated, we use the first guess density from the model to convert to snow water equivalent (prognostic variable) for the next model integration step. 

Initialization for soil properties

As mentioned in Section 3.1.1, the long memory of deep soils and lakes has to be considered as part of the spin up procedure. Approximately, it can be assumed that the annual temperature cycle in the soil does not reach the depth of 10 m, which represents the middle of the deepest soil layer in the 14-layers diffusion soil scheme. Therefore, we can assume that the initial temperature of this deepest layer can be approximated by a multi-year mean temperature of the ERA5 deepest soil layer, i.e. layer four in the IFS TESSEL soil scheme. This multi-year mean temperature has been created in a pre-processing step and is provided to the SURFEX initialization step (PREP), in which the initial conditions for all surface prognostic variables are defined. The multi-year mean represents the period 2003-2007 and to good approximation this value is representative of the initial condition for all streams of the CARRA2 setup. In this way we get a deep soil temperature profile which smoothly approaches the 10 m multi-year climatology, and which therefore reduces the need for a very long spin up, which would otherwise be needed for the deepest soil layers. On the other hand, for soil moisture we do not relate it to a deep climatology since the character of soil moisture is much more dependent on the specific soil model. In other words, it is not relevant to impose a deep soil moisture climatology from TESSEL to a corresponding deep layer in SURFEX. Thus, the soil moisture initial state in PREP relies on the four TESSEL layers only.

One of the advances in the CY46h1-based CARRA2 system is the implementation of the lake scheme FLAKE which provides advanced parameterization for fresh-water lakes. For the initial conditions of prognostic variables in FLAKE we utilize the global lake climatology database which represents initial conditions for all FLAKE prognostic variables as annual cycles for a number of lake depth intervals (Kourzeneva et al., 2012).

Sea ice representation

An overview of the daily input data sets used to define the sea ice concentration and sea surface temperature is given in Section 3.4.3 below (also provided in the CARRA2 input data report, Dahlgren et al, 2024). The modelled representation of sea ice properties in the CARRA2 system has been considerably modified compared to CARRA1 in order to alleviate the shortcomings found in the original reanalysis product outlined in Batrak et al., 2024. Sea ice in the CARRA2 system is modelled by the SICE scheme (Batrak et al, 2018) which has been extended with an external component to take into account some effects of the ice dynamics. 

SICE is a one-dimensional parameterization scheme of ice which does not represent the processes of ice drift during the model forecast, however CARRA2 now uses an external free-drift-based dynamic ice model to correct the sea ice state at the analysis time. This correction is applied every 6 hours (i.e. at every second analysis) and uses the model 10m wind from the previous atmospheric forecasts to compute the ice drift vectors and propagate the ice state accordingly. Ocean currents are not taken into account in the applied free-drift model. Additionally, sea ice concentration is not modified by the dynamic model, which is still, similarly to CARRA1, fully defined by the external input data sets described in Dahlgren et al, 2024. Thus, only prognostic variables of the SICE scheme such as ice thickness, ice temperature and snow state are updated within the dynamic model.

Due to the large size of the CARRA2 model domain, inflow of new sea ice through the model boundaries is of limited importance and therefore no additional provisions were made for initializing the state of inflowing ice from external data sets. Instead, an 8 grid-cell-wide boundary area is introduced around the model domain with the mirrored sea ice state, as a pragmatic and simple procedure. Since a free-drift based model cannot properly represent the spatial distribution of sea ice, it was complemented by an external landfast ice (ice that is immobile due to its attachment to a coast) data set allowing for more realistic sea ice cover. Landfast ice masks are used to define 'false land' areas where ice advection is restricted. The landfast ice masks applied for constraining the sea ice drift differ from the ones included in the sea ice concentration data, owing to insufficient spatial resolution of the latter masks. More details on the data sets used for the masks are provided in section 3.4.5. 

To further increase consistency between the production streams and impose an additional constraint on the sea ice state CARRA2 relaxes the modelled ice thickness towards the values reported by the monthly fields of the ECMWF's ORAS5 ocean reanalysis system (for details on ORAS5, see its CDS entry at https://doi.org/10.24381/cds.67e8eeb7). The relaxation procedure is applied on the 1st, 11th and 21st of each month by means of a simple two-dimensional optimal interpolation (2D-OI) procedure. ORAS5 fields are thinned using 25km thinning radius to avoid uneven density of the native grid of ORAS5. Then the 2D-OI is applied to correct the model ice thickness. Since the same field is used three times in the correction procedure, observation error covariances are increased to avoid over-constraining the system.

The thermodynamic core of the SICE scheme is similar in CARRA1 and CARRA2 with some minor adaptations applied in CARRA2 owing to a more recent HARMONIE-AROME version used. Specifically, snow cover on top of the sea ice is represented with 12 snow layers, the same as for snow over land. The snow albedo scheme, compared to the reference implementation of HARMONIE-AROME, uses modified snow-density/grain-size relations more suitable for snow over sea ice. When computing the grid-cell average turbulent fluxes of heat and momentum between surface and the lowest model level, CARRA2 extends formulations of CARRA1 by applying a stability-dependent form drag parameterization (Lüpkes and Gryanik, 2015). Computation of the grid-cell averaged turbulent flux of moisture is not modified in CARRA2.

Model updates to boost computational efficiency

Multi-decadal reanalysis is a computation-intensive project. With the 30-year CARRA1 reanalysis, the unprecedentedly high resolution of 2.5 km enhances greatly the quality of reanalysis for essential climate variables in comparison to the global reanalysis ERA5, the latter with a much coarser resolution of around 31 km in grid distance. For the CARRA2 reanalysis, one of the critical requirements is to substantially enlarge the aerial coverage from the European Arctic-only CARRA1 to the whole Arctic region, while maintaining comparable quality as achieved in CARRA1. In order to fulfill this goal, the chosen pan-Arctic domain, as shown in Figure 1, has a grid size of 2880x2880 points at 2.5 km in horizontal and 65 levels in vertical, thus a total of 539 million grid points. Compared to the grid size of a total of 142 million grid points in CARRA1 combining CARRA-East and CARRA-West domains, the CARRA2 grid is 2.8 times larger than that of CARRA1. Further, the reanalysis period is 40 years, thus, in terms of computation tasks, CARRA2 is about 4 times larger than that of the CARRA1. On the other hand, the available computational budget for CARRA2 on the assigned HPC platform (ECMWF ATOS) is 2.4 times more than that of CARRA1. Therefore, greater cost efficiency was a necessary step to fit the CARRA2 production into these resources. In CARRA2, various major cost-saving measures have been sought and successfully tested, and this includes cost cutting in by use of hydrostatic option, use of single precision in integration, as well as optimization in recurring cycling configuration.

Hydrostatic mode

From scientific literature, non-hydrostatic effects are perceived to become significant when vertical acceleration becomes non-negligible in comparison to gravity and vertical pressure gradient forces. However, it is unclear as to how far the validity of the hydrostatic approximation holds as a function of model grid scale. When HARMONIE-AROME is used for operational forecasting, the default option is the non-hydrostatic one. In terms of computational cost, the non-hydrostatic option in HARMONIE-AROME is about 20% more costly than the hydrostatic option. This significant impact on computational cost makes it natural to consider either to have less high resolution or more preferably the use of the hydrostatic option. In the separate test & evaluation report (Yang et al 2025), relevant results from our investigation on adequacy of hydrostatic approximation in the present context are summarized. From these experiments, it is concluded that both in terms of verification score intercomparisons and case studies for extreme weather events, no differences of significance have been identified between the runs made with either hydrostatic or non-hydrostatic dynamics. It may be argued that these results do not exclude exceptional cases in which such effects may be significant, but it is considered that in terms of climate trends and climate statistics, an occasional sensitivity to the hydrostatic approximation will unlikely have significant impact. Based on this reasoning, the hydrostatic option has been selected in the configuration of the CARRA2 system. 

Efficiency enhancement with hybrid precision and hybrid-parallelization

Significant performance gain has been achieved by implementing single precision for the forecast part of the HARMONIE Cy46h1, which brings a speed up of around 30 to 40% on the ECMWF ATOS HPC. A further gain of around 10% is achieved with the use of hybrid parallelization which combines OpenMP with MPI (Message Passing Interfacing). In the previous versions of HARMONIE-AROME, MPI is used for cross-node parallelization. For CARRA2, same-node shared memory OpenMP parallelization has been successfully implemented, while cross-node communication relies on MPI parallelization. It is found that on the ECMWF ATOS using 4 OpenMP threads per node appears to be optimal in the forecast step, which provides a speed-up of around 10%.

Reduction in forecast lead-time

Another major cost cutting (relative to CARRA1) is to limit the forecast lead time for the two long forecast steps per day for 00 and 12 Z from 30h (in CARRA1) to 18h. This is in view of the experiences seen from the user demand for CARRA1 reanalysis dataset, where the data with lead-time beyond 18h is seldomly used. For accumulated properties such as 24h accumulated precipitation, it can be obtained by combining forecasts from several base times so that the forecast with maximum lead time of 18h is sufficient to be used. This optimization in cycling configuration helps to speed up reanalysis cycling and also reduces quite substantially the data amount, hence storage needs.

Adaptations in the upper air data assimilation algorithm 


For data assimilation, the Arctic region is a particularly challenging area with a rather sparse surface observation network. As in CARRA1, the large-scale mixing option, "LSMIX", which explicitly uses large scale information from ERA5 through spectral nudging in the background term, is used in the upper air 3D-VAR assimilation. The BRAND method, as implemented in CARRA1 for derivation of background error structure functions, has also been inherited. In addition, also with the simple time interpolation to use daily varying structure functions by interpolating between structure functions for summer and winter, so as to take into account the seasonal changes with the dominant flow patterns. More details on the LSMIX, BRAND and time interpolation methods are provided in Section 2.2.2 of the CARRA1 System Documentation (https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+Full+system+documentation#CopernicusArcticRegionalReanalysis(CARRA):Fullsystemdocumentation-Adaptationandimprovementindataassimilationalgorithm). In tuning the relative weight between background and observation errors in the cost function formulation, drawing on experiences from CARRA1, a lighter weight than in the initial CARRA2 system experiments was assigned to the background error, with "scaling factor" (as defined in Lindskog et al 2001) selected to be 0.6 in 3D-VAR. This has been found to result in better dynamically balanced model states of the short-range forecasts. Such tuning appears to benefit particularly some upper air fields, whereas the observation fit at analysis time for some of the large scale parameters such as Mean Sea Level Pressure tend to suffer slight degradation. This way of relying on the balance relations in the structure functions rather than optimizing analysis fit to observations is expected to improve the reanalysis over the data sparse areas of the large pan-Arctic domain.

The Huber-norm variational quality control is activated in CARRA2 with the goal to reduce the risk of erroneous rejection of good observational data in assimilation of rapidly evolving synoptic weather situations. This was already used in the ECMWF IFS model for a long time, but not in HARMONIE-AROME. This is particularly relevant in areas with sparse observation coverage, as we have in parts of our large pan-Arctic domain. 

Variational quality control with the Huber norm approach 

Observation quality control is a critical component of a variational data assimilation system. Observational data includes measurement errors, representativeness errors, and occasionally gross errors. By identifying and mitigating errors, the quality control of observations improves the accuracy and consistency of the data assimilation, ultimately enhancing the performance of the system. Earlier versions of the variational quality control method accounted for the probability of gross errors (PGE) with a flat probability density function within the 3D-VAR data analysis, while random errors were modeled using a Gaussian distribution (Andersson and Järvinen 1999; Lindskog 2001). Some observations do contain gross errors that are accurately represented by a flat distribution; however, this distribution often poorly captures the nature of outliers. The Huber norm distribution has been found to be highly effective in describing most innovation statistics, once systematically erroneous observations are removed (Tavolato and Isaksen, 2015). As a robust method, the Huber norm allows for the safer inclusion of outlier observations in the analysis step, which enables a relaxation of the background quality control (which is based on the difference between the first guess and the observations). Multiple case studies illustrate that the Huber norm quality control (HN-QC) method greatly enhances the analysis and forecasting of extreme events. These examples underscore the robustness of the Huber norm approach, as it allows the analysis to make effective use of outlier observations, particularly when several observations show consistent and significant deviations from the model background. In addition, HN-QC is a robust method that allows the use of observations with large departures (Tavolato and Isaksen, 2015). This Huber norm distribution is characterized by a Gaussian distribution at its center and an exponential distribution in the tails. 


The observation cost function Jo for one datum is defined as 


where

 

and  

 
P(x) and ρ(x) define the Huber norm distribution, the transition point, denoted by c, marks where the Gaussian part of the distribution ends and the exponential part begins. In the JoQC Huber norm distribution, an L2​ norm applies to the center of the distribution, while an L1 norm applies to the tails. The HN-QC method accommodates observations with large deviations. Finally, the weight applied to an observation as the ratio between the applied observation cost function JoQC and JoGaussian (pure Gaussian). The Huber norm distribution provides a robust estimation that places considerable weight on outliers. To ensure the highest quality observations across the CARRA2 domain, the HN-QC has been successfully implemented in 3D-VAR assimilation system for in-situ observations, including measurements of wind, temperature, geopotential and surface pressure.

Seasonally varying structure function

Effectively characterizing the forecast error structure of a model presents a significant challenge using variational data assimilation within a high resolution, limited-area model system. In a variational data assimilation approach, one of the primary challenges is to develop background error statistics, also known as forecast error structures or structure functions. To estimate background error statistics (B-matrix), we used the BRAND (B-matrix RANDom) method, originally developed in CARRA1 (see description in Yang et al, 2019). This method introduces field perturbations by sampling uncertainties directly in the control vector space, perturbing the full range of scales. A two-step iterative procedure was applied to generate the B-matrix, representing error growth over short time scales. First, a time series of two sets of 6-hour forecast files were generated using the CARRA2 system at ensemble downscaling mode, using ERA5 EPS as initial and boundary conditions. The two periods cover a winter month (January 2022) and a summer month (June 2022), respectively. For both seasons, multiple ensemble forecasts with 10 members were produced, using the standard CARRA2 spatial resolution of 2.5 km and 65 vertical levels. With the provisional structure functions derived from these ensemble runs, the month-long EPS runs are repeated with 3D-VAR cycling using BRAND perturbation, with the aim to derive final structure functions. Figure 6 displays modelling for the lowest model level temperature by the control and ensemble members for 6-hour forecasts valid on January 20, 2022, at 06 UTC. For each of these 10 ensemble members, the model was initialized using the 3D-VAR data assimilation method and a 6-hour forecast, assimilating only conventional observations such as SYNOP, SHIP BUOY, and AIRCRAFT data. All available conventional observations were included within a +/- 3-hour window for each assimilation cycle. Background perturbation was applied to nine ensemble members, with one control member unperturbed. In the second round of CARRA2 EPS runs, a total of one month forecast files were initialized at the assimilation cycles (00, 06, 12, and 18 UTC) each day, and these files were used to calculate error statistics. Finally, two separate B-matrices, one for winter and another for summer, were calculated and stored. The B-matrices are generated using the stand-alone Forecast Error Statistics (FESTAT) method (Berre, 2000). The control variables include vorticity, divergence, specific humidity, surface pressure, and temperature. FESTAT handles various tasks related to error statistics, such as converting variables from gridpoint space to spectral space, computing the balance operator, generating horizontal variance density spectra for control variables, and determining vertical correlations for these variables (Bojarova and Gustafsson, 2019).


Figure 6: Temperature at level 65 (in Kelvin): (a.) absolute field for the control member, and (b.-j.) temperature differences at level 65 (in Kelvin) between the control (CNTL) and each ensemble member for 6-hour forecasts valid on January 20, 2022, at 0600 UTC.

Previous studies observed that the estimated background error statistics for the HARMONIE-AROME exhibit strong seasonal variability. The importance of seasonal variation in the B-matrix for CARRA2 is highlighted in Figure 7. The ratio of specific humidity in the mid-troposphere for wave numbers 25 to 60 is significantly higher for the summer B-matrix. In contrast, this ratio is much lower and closer to zero for the winter B-matrix at the mid-tropospheric level, where cloud information is more prominent. This variability is linked to differences in prevailing motion scales and dominant air masses, and thus in weather regimes across different seasons. To address this seasonal variability, CARRA2 incorporates a gradual seasonal adjustment of climatological structure functions. This adjustment is implemented through a weighting function that calculates background error statistics, which vary monthly by interpolating between structure functions for winter and summer months.  


Figure 7: The percentage of the variance of specific humidity (q) that is explained by unbalanced temperature (Tu) as a function of height and wave number for (a.) winter B-matrix and (b.) summer B-Matrix.

The observation system evolves with time, notably in availability of satellite observations. From CARRA1 experience, changes in forecast error covariances between difference decades appear to be less significant in comparison to those between seasons. We have thus continued the CARRA1 practices to use the same B-matrix description and structures for all years in the reanalysis (so there is only monthly dependence for each year).

Improved and extended use of input data

A detailed description of all input data sets for data assimilation, SST, sea ice, physiography and glacier albedo can be found in the input data description by Dahlgren et al, 2024. Some of the information in that report is repeated here, and we also provide further information on various aspects and the usage of the input data.

Physiographic data handling

The physiography input in HARMONIE-AROME Cycle 46 used in CARRA2 includes significant updates and improvements compared to previous versions, particularly with the introduction of the ECOCLIMAP Second Generation (ECO-SG) database. These updates provide a more accurate representation of the surface properties required by the HARMONIE-AROME surface scheme. The various physiographic input data were summarized in the input data inventory in Dahlgren et al, 2024. We repeat information on the physiographic data sets here, with added information on the adaptations done to them, along with motivation for the additional efforts carried out.  

Land cover characteristics
The land cover characteristics are described using the ECO-SG land cover map developed by Météo-France, which has a primary resolution of approximately 300 m. This represents a significant improvement over HARMONIE-AROME Cycle 40 used in CARRA1, which utilized the 1-km resolution ECOCLIMAPv.2.2 dataset.
While ECO-SG generally provides improved representation of surface properties with its higher resolution, additionally several issues and inaccuracies were identified in remote Arctic areas. As such, further corrections and improvements have been implemented in the following aspects.

Glacier updates
The glacier extents in ECO-SG have been updated using the Randolph Glacier Inventory (RGI) to provide a better representation of glaciers in the whole Arctic region which were overestimated in the original ECO-SG. The RGI data aims to represent glacier extents as close as possible to the year 2000 (which corresponds well to the middle of the reanalysis period) and are kept fixed throughout the whole reanalysis period.

Regional land cover updates
Specific land cover updates to the ECO-SG have been implemented for several high-latitude regions:


Orography
For topography representation, HARMONIE-AROME Cycle 46 uses an improved digital elevation model (DEM) at 150 m resolution based on the Copernicus GLO-90 dataset. This dataset was upscaled from its original 90 m resolution to make file sizes more manageable. Quite a few errors identified in this database were corrected using the ArcticDEM 90 m mosaic dataset. We included a lot of updates to GMTED2010 over Greenland and Iceland in CARRA1, but the additional updates done in CARRA2 for the whole pan-Arctic region are significant. This represents an improvement over the GMTED2010 dataset used in previous versions.

Soil properties
The soil properties rely on the SoilGrids database, which has a primary resolution of approximately 300 m. This database provides more accurate and higher-resolution soil data compared to the previously used Harmonized World Soil Database (HWSD).

Lake representation
For lake depths the Global Lake Database v.3 is used, with a primary resolution of approximately 1 km. Most lakes in the high Arctic region have unknown lake depth, so a default value of 10 m is used.

Glacier snow initialization, glacier albedo scheme

The glaciated surfaces are initialized with 9990 kg/m² snow water equivalent (a very large, somewhat arbitrary value to make the surface act as a snow reservoir) each year on 1st of September. This is a simplification, since we do not have a scheme for glacier ice integrated in the surface scheme, but instead over the glaciers we rely on the snow scheme which is part of the multi-layer surface scheme.

Glacial ice often contains impurities that strongly reduce its albedo. Algae-growth can further affect the glacier albedo. This is very difficult to model correctly, but in a reanalysis, it can be prescribed from satellite data.
As also described in the input data inventory (Dahlgren et al, 2024), we use satellite-derived glacier albedo data from three sources:


The Geological Survey of Denmark and Greenland (GEUS) provides gap-filled and denoised albedo products from these three sources (Box et al. 2017). This has been tested against in situ measurements on the Greenland ice sheet and found to have a root mean square difference of 0.08 for stations in the ablation zone, and 0.05 for stations in the ice sheet accumulation area (Box et al. 2017). 

Using daily varying broadband albedo values for the full reanalysis time series in this way gives a more realistic and consistent data set for the radiative exchange scheme than relying on fixed climatological fields for the whole or parts of the period. Albeit from different sensors and satellites, all these three sources provide similar information. Kohkanovsky et al. (2020) demonstrated good consistency between albedos derived from MODIS and from Sentinel-3 when compared to albedos derived from surface stations. The AVHRR data has 5 km resolution compared to the 500 m resolution of the two newer satellite products. 

We use broadband albedo data for all grid boxes that include glaciers, as was done in CARRA1. In CARRA2, we modify the UV-VIS spectral albedo of the model radiation scheme to fit the broadband satellite-derived albedo. The satellite-derived albedos are not used if modelled fresh snow less than 3 days old is present. This is done since the satellites cannot see the surface during snow events due to the clouds. The external satellite albedo data are used in the summer season only, March - October (September when AVHRR is used), as the sun is lower and longer below horizon in the winter season and the impact of albedo differences is then very small.

Sea ice concentration and sea surface temperature

The selection and merging process for sea ice and sea surface temperature data input was described in the input data report by Dahlgren et al (2024) and is also repeated here.

The sea ice concentration (SIC) field in CARRA2 is a merged product consisting of ESA CCI and OSI SAF Sea Ice concentration Climate Data Records (CDRs) along with sea ice chart input from FMI/SMHI from Baltic sea and landfast ice charts. Landfast ice, which is the sea ice that is immobile due to its attachment to a coast, is included from the U.S. National Ice Center (NIC) weekly or biweekly ice-chart time series. The SIC CDRs are used in a prioritized order with OSI-458 as first priority, SICCI-HR-SIC as second priority and OSI-450-a as third priority. Prioritization is based on the resolution of the input data, along with the processing applied. Figure 8 shows an overview of the input sources. 
CARRA2 Sea Surface temperature (SST) fields are mainly based on ESA SST CCI Level 4 Analysis Climate Data Record (CDR) for the period 1982-2021 and an integrated climate data record (iCDR v3) after 2021 (see grey lines in Figure 8). The Copernicus Marine Service (CMEMS) regional Baltic Sea - Sea Surface Temperature Reprocessed analysis (produced by DMI) is used in the North Sea and Baltic Sea. A more substantial view of the input data sources is available from section 1.4.1 in Dahlgren et al, 2024.



Figure 8: The top plot illustrates the temporal coverage of the SIC and SST data as they are combined and included to generate the SIC and SST fields for CARRA2. The maps provide examples of the spatial coverage of the different products that are combined for 2019-05-30 for SIC and SST, respectively. The colors in the maps correspond to the colors/products listed in the top plot.

All input data was initially re-gridded to a common 0.05-degree regular latitude-longitude grid (as used by the global SST products) by using bilinear interpolation for SIC CDRs and 2D nearest neighbor interpolation for the ice chart information. Subsequently, the input data was combined as shown in Figure 8. The bar plot illustrates how the data is temporally combined while the maps illustrate how the data is spatially combined for one day. The SIC CDRs were afterwards filtered to increase the accuracy and consistency of the final SIC used in CARRA2.  

The following 4 filters were applied in the order below and the SIC was set to zero if any of the conditions were true:

Filter 1: The grid cell was less than 25 km from the coast and the minimum sea ice concentration within a 12.5 km radius is zero.
Filter 2: The grid cell was less than 25 km from the coast and the SST value is above 3 degrees Celsius.
Filter 3: The SST value was above 8 degrees Celsius.
Filter 4: The two subsequent sea ice charts, which were closest in time, had recorded a sea ice concentration of zero for grid cells that were less than 75 km from the coast.

As an additional filter, an inter-sensor bias adjustment for OSI-450-a and SICCI-HR-SIC was applied after extrapolation using OSI-458 as reference. This was done to adjust for the ice spillover effect, which causes the extent of the coarser resolution SIC products (especially OSI-450-a) to have a smeared edge, effectively placing the ice edge further out than that of the higher resolution OSI-458 data. The adjustment uses a smoothed spline function fitted to five years of overlapping data from 2003 to 2007, with the monthly mean SIC difference between OSI-450-a/SICCI-HR-SIC and OSI-458 computed for bins with 5% increment.  

In 1986 and 1987 extensive data gaps were found for OSI-450-a. In 1986 data was lacking from the 28th of March until the 24th of June and in 1987 data was lacking from the 3rd of December to the 12th of January 1988. Apart from these large gaps several smaller data gaps were present in the datasets, in particular in the OSI-450-a CDR where data is only available every second day from 1982-1987. For the smaller gaps a temporal interpolation was applied where the temporally closest sea ice concentration products were weighted by their temporal distance. For the longer gaps in 1986 and 1987-1988 another method, described below, is used due to the lengths of these gaps, which causes the linear interpolation method to be insufficient.  

For 1986 it was found that the SIC CDR version 4 from NOAA/NSIDC has partial data coverage for the period with missing data for OSI-450-a. Therefore, SIC input from NOAA/NSIDC was used in this period, where missing areas were filled with SIC data that was closest in time. To provide a smooth transition between OSI-450-a and NOAA/NSIDC a temporally weighted linear blend of the two was used for the first and last three days of the gap. Due to the limited data availability for this period, it was, furthermore, decided to apply sea ice chart filtering for the entire area instead of only in near coastal areas (applying "filter 4" in the above list, without the constraint on less than 75 km from coast). For December 1987 no data was available for NOAA/NSIDC or OSI-450-a SIC CDRs. Therefore, the linear interpolation method was applied along with the sea ice chart filtering for the entire area, which ensures that the sea ice extent does not become unrealistically large. An overview of the data gaps is shown in Table 3. 

Table 3: Overview of larger data gaps in OSI-450-a, along with a description of how these were filled in CARRA2. The data gaps span 1986/03/29 - 1986/06/23 and 1987/12/03-1988/01/12, with all dates included. To ensure a smooth transition between the OSI-450-a and the NOAA/NSIDC data record in 1986 three days in the beginning and end of the data gap is a temporally weighted blend of the closest available NOAA/NSIDC and OSI-450-a SIC.

Date

Products used and method applied

1986/03/29- 1986/03/31

Temporally weighted blend of OSI-450-a data from 1986/03/28 and NOAA/NSIDC data from the respective day. Sea ice chart filtering is applied everywhere north of 50 degrees

1986/04/01- 1986/06/20

NOAA/NSIDC data with missing areas filled in with the temporally closest available data. Sea ice chart filtering is applied everywhere north of 50 degrees

1986/06/21- 1986/06/23

Temporally weighted blend of OSI-450-a data from 1986/06/24 and NOAA/NSIDC data from the respective day. Sea ice chart filtering is applied everywhere north of 50 degrees

1987/12/03-1988/01/12

Temporally weighted OSI-450-a data from the 1987/12/02 and the 1988/01/13. Sea ice chart filtering is applied everywhere north of 50 degrees

After filtering, the data is extrapolated into fjords, lakes and near coastal areas, which has not already been filled with data from the ice-charts. An iterative extrapolation method is used where each gridpoint with a null value is assigned the mean value of the 9 neighboring grid cells. A land mask is applied between each step to ensure that values are not extrapolated over land. 
After filtering we obtain a higher consistency between the SIC products. As shown in Figure 9 we observe similar trends in sea ice extent (SIE) between OSI-450-a, NOAA/NSIDC and CARRA2 with CARRA2 having a trend in between those of OSI-450-a and NOAA/NSIDC. Furthermore, we see a very close proximity between the actual sea ice extent values of CARRA2 and OSI-458.

Figure 9: Monthly mean sea ice extent from 1982-2023 of the CARRA2 (denoted as DMI-MSC-SIC), OSI-450-a, NOAA/NSIDC and OSI-458 sea ice products for March (continuous lines) and September (dashed lines), respectively.

Landfast sea ice

As outlined in Section 3.1.4 above, the sea ice modeling has been extended to make use of landfast sea ice masks. These binary landfast ice masks on the native CARRA2 model grid were compiled from the ice charts issued by multiple national ice services (U.S. National Ice Center, Canadian Ice Service, Arctic and Antarctic Research Institute, Norwegian Ice Service, Swedish Ice Service and the Finnish Meteorological Institute's Ice Service). An overview of the utilized ice charts is provided in Table 4. The National Ice Center ice charts form the baseline for landfast ice masks, which is refined using ice charts from other sources. Whenever available, ice charts in a vector graphics format are utilized. However, prior to 2003, the baseline data set was available only in gridded form with 25 km spatial resolution. To allow for a seamless transition between gridded and vector ice charts and improve the effective spatial resolution of the landfast ice masks between 1985 and 2003 an additional manual correction was applied. This correction included visual comparisons against 4 km AVHRR imagery using visible and infrared channel data, and corrections based on the subjective analysis of daily aggregated AVHRR imagery. For most of the pre-2003 ice charts manual corrections were performed on a bi-weekly basis, and then corrected charts were interpolated to produce a weekly dataset. In general, landfast ice masks in CARRA2 have weekly temporal resolution and source ice charts are at least bi-weekly. To further increase the consistency within the landfast ice data set, information about smaller-scale fast ice fields surrounding many islands in the Arctic Ocean and present in modern vector ice charts (but often missing from the older ones) is omitted. 

Table 4: Overview of the different sea ice charts used for producing landfast ice masks

Ice Service

Period

Region

Reference or data access URL

NIC

1985-2003

pan-Arctica

https://doi.org/10.7265/N5X34VDB

NIC

from 2003

pan-Arctic

https://doi.org/10.7265/4b7s-rn93

CIS

from 2006

Canadian Arctic¹

https://doi.org/10.7265/N51V5BW9

AARI

from 2004

Russian Arctic;

Eastern Greenland¹

Arctic and Antarctic Research Institute detailed ice charts for the Eurasian Arctic seas and  seas with seasonal ice cover in the WMO SIGRID-3 format for 2004-2024. AARI WDC Sea Ice. URL: http://wdc.aari.ru/datasets/d0004

(accessed 2024-11-04)

SMHI/FMI

from 1985

Baltic Seab

https://doi.org/10.48670/moi-00131

MET Norway

1985-2003

Svalbard²

Internal data archive

a Gridded ice charts, 25 km resolution

b Gridded ice charts, 1km resolution

¹ Applied when NIC ice charts are not available

² Used instead of NIC ice charts over the Svalbard archipelago

Greenhouse gases, volcanic aerosols and solar irradiance

A brief overview of the input data sets for greenhouse gases and aerosols is also given in the CARRA2 input data overview by Dahlgren et al, 2024. Yearly greenhouse gas data are used with linearly interpolated values for each month. We base these data on CMIP6 historical greenhouse gas data and the SSP5-8.5 scenario. The Northern Hemisphere asymmetry in the CH4 data was included by multiplying these values with 1.05. This factor is based on our own comparisons with measured data from Summit in Greenland and in Northern Europe. The historical greenhouse gas data covers the period from 1850 to 2014, while the scenario data starts from 2015.

For the period 1991-1995 we include stratospheric sulphate aerosols from the Pinatubo eruption based on Toohey et al. (2016)

We use an average solar direct irradiance at the top of the atmosphere of 1361 Wm2 that is consistent with the measurements by the SORCE TIM instrument (Coddington et al. 2016). The solar cycle is included following CMIP5 and CMIP6 recommendations, where the observed solar direct irradiance over the 13-year period 1996-2008 is repeated into the future with a 13-year cycle.

Summary and outlook

The CARRA2 reanalysis system has been developed for the purpose of performing regional reanalysis covering the entire pan-Arctic region. The CARRA2 datasets will cover a multi-decadal period of over 40 years between 1985 and 2025.  

The CARRA2 reanalysis is a continuous assimilation-forecast cycling over a 2880x2880 grid-mesh with 65 vertical layers, using the hourly C3S global reanalysis (ERA5) as lateral boundaries. It produces gridded analysis for selected atmospheric states every 3 hours, and short-range forecasts every 3h up to 18h each day at 00 and 12 UTC, and 3h for other base times. Each forecast has hourly temporal resolution.  
The system baseline of the CARRA2 reanalysis is the state of art kilometer-scale operational weather forecast model HARMONIE-AROME 46h1, which has been developed with high latitude weather and climate as one of its targeted application scenarios. In addition, necessary extensions and enhancements have been introduced for reanalysis purposes. Further, extensive efforts have been made to implement some of the advanced features into the forecast and assimilation system, taking advantage of the developments by the HARMONIE-AROME research and operational communities over the last 5 years which are characterized by the evolution of the baseline code version for the two reanalysis systems from CY40h1 (for CARRA1) and CY46h1 (for CARRA2), see more details in Gleeson et al (2024).

For a reanalysis over pan-Arctic regions which is characterized by a rather sparse network with in-situ observations, the quality of model data relies heavily on the land-surface scheme to simulate near-surface processes. In CARRA2, much of the focus has been devoted to introducing the multi-layer SURFEX physics scheme with 14-layer diffusive soil scheme, 12-layer explicit snow scheme and the multi-energy balance (MEB) for forest canopy, which is an emerging development in the HARMONIE-AROME communities. Accordingly, surface assimilation based on SEKF (Simplified External Kalman Filter) technique has been adapted for update of surface and soil properties using screen level temperature, humidity and snow observations.  

For data assimilation, the pan-Arctic region is a particularly challenging area with a rather sparse surface observation network. For 3D-VAR data upper air assimilation in CARRA2, as in CARRA1, we use a "large scale mixing" "option which explicitly includes large scale information from ERA5. The BRAND method as applied in CARRA1 for derivation of background error structure functions has also been inherited. In order to improve observation intake especially for rapidly evolving weather situations, variational quality control with Huber-norm has been activated, with a goal to improve on flow dependent aspects of the data assimilation in CARRA2, reducing risk of rejection of good observations due to phase errors. 

For multi-decadal reanalysis projects, computation cost is a major limiting factor. After careful investigation, significant efficiency gain is obtained by the adaptation of hydrostatic approximation, use of hybrid (single and double) precision, use of hybrid parallelization and optimization in configuration of assimilation-forecast cycling.  

The reanalysis service also necessitates collection, processing and quality assurance of input data covering the whole Arctic regions over the entire reanalysis period. Examples of such include databases for surface orography and physiographic features, the reprocessed remote sensing data, and non-GTS local surface observations. In particular, collection of substantial amounts of non-GTS surface observation data in the extremely data sparse Arctic region, especially the use of surface in-situ observation and satellite derived glacier albedo information over permafrost regions ensures significant added value of this reanalysis dataset, as usually such data types are not available for real time use in the operational NWP. Such efforts enhance greatly the usefulness of the surface influenced variables of the data set for climate applications. 
The initial CARRA2 service is scheduled to last 4 years from September 2022 to August 2026. The first 18 months was a preparation phase, during which the reanalysis system was built up, with input data prepared. Throughout the CARRA2 system preparation, extensive quality assurance activities were carried out, ensuring the system to deliver the expected performance from both a technical and meteorological point of view. This is documented in a companion Test and Evaluation report (Yang et al, 2025, deliverable D361a.1.1.2). This was followed by the production phase, in which assimilation forecasts are cycled for multiple assimilation streams at the ECMWF High Performance Computation Facility (HPCF), with final results post-processed to feed into the ECMWF MARS archive and subsequently to the Climate Data Store (CDS) for public access.

References

Albergel, C., Munier, S., Leroux, D. J., Dewaele, H., Fairbairn, D., Barbu, A. L., Gelati, E., Dorigo, W., Faroux, S., Meurey, C., Le Moigne, P., Decharme, B., Mahfouf, J.-F., & Calvet, J.-C. (2017). Sequential assimilation of satellite-derived vegetation and soil moisture products using SURFEX_v8.0: LDAS-Monde assessment over the Euro-Mediterranean area. Geosci. Model Dev. Discuss., 2017, 1–53. https://doi.org/10.5194/gmd-2017-121

Andersson E., and Järvinen, H. (1999). Variational quality control. Quart. J. Roy. Meteor. Soc., 125:697-722. https://doi.org/10.1002/qj.49712555416 

Barnes, S. L. (n.d.) (1973). Mesoscale objective map analysis using weighted time-series observations. NOAA technical memorandum ERL NSSL ; 62.  Retrieved February 26, 2025, from https://repository.library.noaa.gov/view/noaa/17647 .

Batrak, Y., Kourzeneva, E., and Homleid, M. (2018). Implementation of a simple thermodynamic sea ice scheme, SICE version 1.0-38h1, within the ALADIN–HIRLAM numerical weather prediction system version 38h1, (2018). Geosci. Model Dev., 11, 3347–3368, https://doi.org/10.5194/gmd-11-3347-2018.

Batrak, Y., Cheng, B., and Kallio-Myers, V. (2024). Sea ice cover in the Copernicus Arctic Regional Reanalysis. The Cryosphere, 18, 1157–1183, https://doi.org/10.5194/tc-18-1157-2024

Bengtsson, L., Andrae, U., Aspelien, T., Batrak, Y., Calvo, J., de Rooy, W., Gleeson, E., Hansen-Sass, B., Homleid, M., Hortal, M., Ivarsson, K., Lenderink, G., Niemelä, S., Nielsen, K. P., Onvlee, J., Rontu, L., Samuelsson, P., Muñoz, D. S., Subias, A., Tijm, S., Toll, V., Yang, X., & Køltzow, M. Ø. (2017). The HARMONIE–AROME Model Configuration in the ALADIN–HIRLAM NWP System. Monthly Weather Review, 145(5), 1919-1935.  https://journals.ametsoc.org/view/journals/mwre/145/5/mwr-d-16-0417.1.xml

Berre, L., (2000). Estimation of synoptic and mesoscale forecast error covariances in a limited-area model. Mon. Wea. Rev., 128, 644–667. https://doi.org/10.1175/1520-0493(2000)128<0644:EOSAMF>2.0.CO;2

Bojarova, J., and N. Gustafsson (2019). Relevance of climatological background error statistics for mesoscale data assimilation. Tellus, 71A, 1–22. https://doi.org/10.1080/16000870.2019.1615168

Boone, A., Samuelsson, P., Gollvik, S., Napoly, A., Jarlan, L., Brun, E. and Decharme, B., (2017): The interactions between soil–biosphere–atmosphere land surface model with a multi-energy balance (ISBA-MEB) option in SURFEXv8–Part 1: Model description. Geoscientific Model Development, 10(2), pp.843-872. https://doi.org/10.5194/gmd-10-843-2017

Coddington, O., Lean, J. L., Pilewskie, P., Snow, M., and Lindholm, D. (2016). A solar irradiance climate data record. Bull. Amer. Meteorol. Soc. 97 (7): 1265-1282. https://doi.org/10.1175/BAMS-D-14-00265.1

Dahlgren, Per et al (2024). CARRA2 Inventory and final description of all input data for data assimilation and for SST, sea ice, physiography and glacier albedo. Copernicus C3S2 Deliverable report D361a.2.1.2, 18/9-2024. https://docs.google.com/document/d/1AMQm13bbRv7FodNHxiNB7I4XVGh-tYYn/edit?usp=sharing&ouid=116822123028632114816&rtpof=true&sd=true

Decharme, B., Boone, A., Delire, C. and Noilhan, J., (2011). Local evaluation of the Interaction between Soil Biosphere Atmosphere soil multilayer diffusion scheme using four pedotransfer functions. Journal of Geophysical Research: Atmospheres, 116(D20). https://doi.org/10.1029/2011JD016002

Decharme, B., Brun, E., Boone, A., Delire, C., Le Moigne, P. and Morin, S. (2016). Impacts of snow and organic soils parameterization on northern Eurasian soil temperature profiles simulated by the ISBA land surface model. The Cryosphere, 10(2), pp.853-877. https://doi.org/10.5194/tc-10-853-2016

Gleeson, E. et al (2024). The Cycle 46 Configuration of the HARMONIE-AROME Forecast Model. Meteorology, 2024 3(4), 354-390. https://doi.org/10.3390/meteorology3040018

Kourzeneva, E., Martin, E., Batrak, Y. and Moigne, P. (2012):. Climate data for parameterisation of lakes in Numerical Weather Prediction models. Tellus A - Dynamic Meteorology and Oceanography, 64(1), p.17226. https://doi.org/10.3402/tellusa.v64i0.17226

Køltzow et al (2025). Evaluation of CARRA1 and CARRA2 data on CARRA1 domains. Copernicus C3S deliverable report D361a.4.1.1, in preparation.

Lindskog, Magnus et al. (2001). Three-dimensional variational data assimilation for a limited area model : Part II: Observation handling and assimilation experiments. Tellus A - Dynamic Meteorology and Oceanography, 53(4), 447–468. https://doi.org/10.3402/tellusa.v53i4.14578

Lüpkes, C., and V. M. Gryanik (2015). A stability-dependent parametrization of transfer coefficients for momentum and heat over polar sea ice to be used in climate models, J. Geophys. Res. Atmos., 120, 552–581, https://doi.org/10.1002/2014JD022418

Masson, Valéry, Le Moigne, Patrick, Martin, Eric, Faroux, Stéphanie, Alias, A, Alkama, Ramdane, Belamari, Sophie, Barbu, Alina, Boone, Aaron, Bouyssel, F., Brousseau, Pierre, Brun, E., Calvet, Jean-Christophe, Carrer, Dominique, Decharme, B., Delire, Christine, Donier, S., Essaouini, K., Gibelin, A., Voldoire, A. (2013). The SURFEXv7.2 land and ocean surface platform for coupled or offline simulation of earth surface variables and fluxes. Geosci. Model Dev., 6, 929–960. https://doi.org/10.5194/gmd-6-929-2013

Mironov, D., E. Heise, E. Kourzeneva, B. Ritter, N. Schneider, and A. Terzhevik (2010). Implementation of the lake parameterisation scheme flake into the numerical weather prediction model COSMO. Boreal Env. Res., 15:218–230.

Müller, M., Batrak, Y., Kristiansen, J., Køltzow, M. A. Ø., Noer, G., & Korosov, A. (2017). Characteristics of a Convective-Scale Weather Forecasting System for the European Arctic. Monthly Weather Review, 145(12), 4771-4787. https://journals.ametsoc.org/view/journals/mwre/145/12/mwr-d-17-0194.1.xml

Polichtchouk I et al (2023). The current forecasting equations are still valid near km-scale resolution. ECMWF blog post, 22 November 2023.
https://www.ecmwf.int/en/about/media-centre/news/2023/current-forecasting-equations-are-still-valid-near-km-scale

Støylen, E. et al (2024). Report on reanalysis production data flow and monitoring. Copernicus C3S2 deliverable report D361a.1.2.1.

Tavolato, C. and Isaksen, L. (2015). On the use of a Huber norm for observation quality control in the ECMWF 4D-Var. Q.J.R. Meteorol. Soc., 141: 1514-1527. https://doi.org/10.1002/qj.2440

Toohey M., Stevens, B., Schmidt, H., and Timmreck, C. (2016). Easy Volcanic Aerosol (EVA v1.0): An idealized forcing generator for climate simulations. Geosci. Model Dev., 9 (11): 4049–4070, https://doi.org/10.5194/gmd-9-4049-2016

Yang, X., Palmason, B., Sattler, K., Thorsteinsson, S., Amstrup, B., Dahlbom, M, Hansen Sass, B., Pagh Nielsen, K., Petersen, G. N (2018). IGB, the Upgrade to the Joint Operational HARMONIE by DMI and IMO in 2018. ALADIN-HIRLAM Newsletter, No. 11, 93-96, 2018.

Yang X. et al (2019): C3S Arctic regional reanalysis – Full system documentation. Copernicus C3S D322 report 2.1.2.2. https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+Full+system+documentation

Yang X. et al (2021). Alternative design options and recommendation for a future pan-Arctic reanalysis system,  C3S_D322_Lot2.1.5.1-202103_Pan-Arctic_v3.docx, Copernicus C3S D322 report 2.3.5.1.

Yang X. et al (2025). Test and evaluation report on fully configured CARRA2 reanalysis system. Copernicus C3S D361a project report,  C3S2_D361a.1.1.2_Test_Evaluation_v1.docx. In preparation.

 

Reference sites for code modules:

Pysurfex: https://github.com/metno/pysurfex

Titanlib: https://github.com/metno/titanlib

Gridpp: https://github.com/metno/gridpp

Structure functions: https://hirlam.github.io/HarmonieSystemDocumentation/dev/DataAssimilation/StructureFunctions/


This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt , the European Commission and the European Centre for Medium - Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view

Related articles

Related articles appear here based on the labels you select. Click to edit the macro and add or change labels.