# Copernicus Arctic Regional Reanalysis (CARRA): Data User Guide

Contributors: K. P. Nielsen (DMI), X. Yang (DMI), S. Agersten (MET Norway), P. Dahlgren (MET Norway), M. A. Ødegaard Køltzow (MET Norway), H. Schyberg (MET Norway), E. Støylen (MET Norway), J. Bojarova (SMHI)

Issued by: Kristian Pagh Nielsen (DMI)

Issued Date: 31/01/2022

Ref:  C3S_D311_Lot2.4.1.2–202102-DataUserGuide-v9

Official reference number service contract: 2017/C3S_322_Lot2_METNO/SC3

# 1. Introduction

This user guide describes the datasets released from the Arctic Regional Reanalysis service, which is part of the Copernicus Climate Change Service (C3S). The datasets will include the actual grid point reanalysis information on different levels (atmospheric vertical levels, surface including soil). This version also provides details on the uncertainty information provided. We will refer to the dataset as the CARRA (Copernicus Arctic Regional ReAnalysis) dataset.

First in this user guide you will find sections on the data availability, formats and data content followed by the information on the uncertainty of the data. For a short description of the principles and methods behind this reanalysis, numerical weather prediction and data assimilation, see the Annex at the end of this document.

# 2. Data availability

The Arctic regional reanalysis data can be downloaded from the Copernicus Climate Data Store (CDS) at the following locations:

The data are freely available; the only requirement is to register as a CDS user. There are several options to download and visualize the data. An introduction to the CDS and the CDS Toolbox is available in the Copernicus User Learning Services (registration is needed) at https://uls.climate.copernicus.eu/group/learning/browse-lessons?packageId=1148.

2. Use CDS Toolbox to write and execute an extraction script.
3. Install the python package cdsapi and download the data directly with python scripts.

Each of these options are detailed below.

Option 1 is straightforward; just choose the required data via the CDS form either:

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-carra-single-levels

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-carra-pressure-levels

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-carra-height-levels

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-carra-model-levels

then click the "Download data" tab in the browser, fill in and submit the form. Below the form you will find the corresponding Toolbox request (see option 2 below) and API request (see option 3 below). The result of these requests will be one GRIB or NetCDF file. Option 1 is for one-time or first-time users.

Option 2 is mainly aimed at further processing of the data in the CDS Toolbox. This involves writing some code in the CDS Toolbox editor. You can take this code from the Toolbox request generated in option 1 above. As mentioned this method is mainly for using data in the CDS Toolbox, but you could also use it to download data. An example retrieval script is given in section 2.1 just below.

Option 3 is the choice of the advanced user, who needs to download larger amounts of data. This option also provides access to all data including forecast data. To use this method you need to install the python package cdsapi on your computer. This is described at https://cds.climate.copernicus.eu/api-how-to

After installing the CDS API you can execute python scripts to retrieve the reanalysis data. In section 2.1 examples are given for how to retrieve the Arctic reanalysis 2 meter temperature 6 hour forecast and 24 hour precipitation analysis data. The request in section 2.1 is based on code generated by the CDS web form (see option 1 above). In practice, option 1 is an easy way to make a template script.

The scripts in section 2.1 are also available for download via the open G

## 2.1. CDS API example, C3S Arctic reanalysis

The example below shows a Python script for retrieving a subset of the C3S Arctic Reanalysis data from CDS for the month of July 2012 for the West domain into an output file named: "CARRA-West_2012-07.grib". Such script can be derived from the download form (see "Show API request" at the bottom of the form).

import cdsapi
c = cdsapi.Client()
c.retrieve(
'reanalysis-carra-single-levels',
{
'format': 'grib',
'domain': 'west_domain',
'variable': '2m_temperature',
'level_type': 'surface_or_atmosphere',
'type': 'analysis',
'year': '2012',
'time': [
'00:00', '03:00', '06:00',
'09:00', '12:00', '15:00',
'18:00', '21:00',
],
'day': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
'13', '14', '15',
'16', '17', '18',
'19', '20', '21',
'22', '23', '24',
'25', '26', '27',
'28', '29', '30',
'31',
],
'month': '07',
},
'CARRA-West_2012-07.grib')



The data is divided into the C3S Arctic West (CARRA-West) and C3S Arctic East (CARRA-East) region as shown in Figure 1. The western region covers Greenland and Iceland, while the eastern region covers the Arctic part of Scandinavia, Svalbard, the Barents Sea and the westernmost islands in the Russian Arctic. We recommend that data for Svalbard are retrieved from the eastern domain, and that data for Jan Mayen are retrieved from the western domain.

Figure 1: The CARRA-West and CARRA-East (C3S Arctic Reanalysis) domains.

Furthermore, the data is divided according to output level type. There are five output level types included in four catalogue entries:

• Single levels
• Soil levels
• Model levels
• Pressure levels
• Height levels

The variables available at these levels will be detailed in section 3. Single level data include surface data, data at diagnostic levels, such as 2-metre temperature and 10-metre wind speed, precipitation, cloud cover, and accumulated energy fluxes and the surface and at the top of the atmosphere. Soil level data are data from below the surface. They are included in the single level catalogue entry.

Model level data are data from the 65 (hybrid) model levels. Level 1 is at 10 hPa and level 65 is approx. 12 meters above the surface. They are called hybrid levels since they gradually change from the uppermost levels, which are at surfaces of constant pressure, to the lowermost levels, which follow the topography at the surface.

Pressure level (23 levels) and height level (11 levels) data are interpolated from model level data to a specific pressure or a specific height above the surface.

In the example above single level data are retrieved from the western domain. If instead pressure level data from the eastern domain is required the following command should be used in the script: c.retrieve('reanalysis-carra-pressure-levels', 'domain': 'east_domain'...)

# 3. Data formats

The reanalysis data is available in two formats: GRIB and NetCDF.

## 3.1. GRIB

The GRIB data format is a standard binary format for modelled meteorological data governed by the World Meteorological Organisation (WMO). This data format stored the data in a very compact way and the metadata is contained within the files. It can be read with the tool eccodes that is available from the European Centre for Medium-range Weather Forecasts (ECMWF): https://confluence.ecmwf.int/display/ECC/GRIB+tools. National weather services and weather companies use a wide array of graphical tools to display GRIB data. The GRIB data format is recommended for users comfortable and familiar with this format.

## 3.2. NetCDF

NetCDF (Network Common Data Form, https://www.unidata.ucar.edu/software/netcdf/docs/index.html ) is an open standard data format developed and supported by the American University Corporation for Atmospheric Research (UCAR). It is a widely used format for geoscientific data. A large array of access libraries and applications for reading and plotting NetCDF data exist: https://www.unidata.ucar.edu/software/netcdf/software.html.

# 4. Data content

## 4.1. Single level variables

Diagnostic single level output is available in three hourly intervals at 00, 03, 06, 09, 12, 15, 18 and 21 UTC.

Long forecasts are available from the forecasts initiated at 00 and 12 UTC. Long forecasts include forecast lengths of 1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21, 24 and 30 hours.

Short forecasts of 1, 2 and 3 hours are made for the forecasts initiated at 03, 06, 09, 15, 18 and 21 UTC.

For most of the variables the shortest forecast data are recommended to use. In general the data quality decreases with forecast length. On the other hand, for variables that are affected by spin-up effects - that is the model needs to run for a certain number of hours before these variables have an optimal quality - the longer forecasts can be better to use. The cloud and precipitation variables are directly affected by spin-up. For time integrated quantities such as precipitation, accumulation over 12 hours between +6 and +18 h forecasts are recommended. Likewise 24 hour accumulation can be obtained as the difference between +30 hour and +6 hour forecasts. However, if accuracy in timing of precipitation events is of very high importance in the application, an option could be to combine hourly forecasts from each of the 8 analysis times (00, 03, 06, 09, 12, 15, 18, 21 UTC) for lead times 1, 2 and 3, which then will be slightly affected by spin-up. It is not possible to make general recommendations on this issue, therefore users are advised to make their own choices based on the general guidelines described here. (On spin-up of precipitation, see also here.)

Table 1: Overview of single level variables. Some variables have not yet been uploaded to CDS, these are marked with * and unfortunately are not included amongst the released reanalysis data. Most static fields (except land-sea mask and orography) marked with ** are available only as NetCDF-files below.

 Precipitation, cloud water and humidity Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height 2m relative humidity 2r % 260242 yes yes 2m 2m specific humidity 2sh kg/kg 174096 yes yes 2m Total column integrated water vapour tciwv kg/m2 260057 yes yes vertically integrated above the surface Total column cloud liquid water tclw kg/m2 78 no yes vertically integrated above the surface Total column cloud ice water tciw kg/m2 79 no yes vertically integrated above the surface Total column graupel tcolg kg/m2 260001 yes yes vertically integrated above the surface Total precipitation tp kg/m2 228228 no yes surface Time integral of rain flux tirf kg/m2 235015 no yes surface Time integral of total solid precipitation flux titspf kg/m2 260645 no yes surface Precipitation type ptype integer code 260015 no yes surface Surface runoff sro kg/m2 174008 no yes surface Percolation (drainage) perc kg/m2 260430 no yes sub-surface

 Temperature and wind speed Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height 10m wind speed 10si m/s 207 yes yes 10m 10m wind direction 10wdir degrees 260260 yes yes 10m 10m u-component of wind (defined relative to the rotated model grid) 10u m/s 165 yes yes 10m 10m v component of wind (defined relative to the rotated model grid) 10v m/s 166 yes yes 10m 10m eastward wind gust since previous post-processing(defined relative to the rotated model grid) 10efg m/s 260646 no yes 10m 10m northward wind gust since previous post-processing(defined relative to the rotated model grid) 10nfg m/s 260647 no yes 10m 10m wind gust since previous post-processing 10fg m/s 49 no yes 10m Maximum 2m temperature since previous post-processing mx2t K 201 no yes 2m Minimum 2m temperature since previous post-processing mn2t K 202 no yes 2m 2m temperature 2t K 167 yes yes 2m Skin temperature skt K 235 yes yes Surface

 Accumulated fluxes Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Albedo al % 260509 no yes surface Evaporation eva kg/m2 260259 no yes surface Time integral of snow evaporation flux tisef kg/m2 235072 no yes surface Surface sensible heat flux sshf J/m2 146 no yes surface Surface latent heat flux slhf J/m2 147 no yes surface Time integral of surface latent heat evaporation flux tislhef J/m2 235019 no yes surface Time integral of surface latent heat sublimation flux tislhsf J/m2 235071 no yes surface dsrp J/m2 47 no yes surface Time-integrated surface direct short wave radiation flux tidirswrf J/m2 260264 no yes surface Surface net solar radiation ssr J/m2 176 no yes surface Surface solar radiation downwards ssrd J/m2 169 no yes surface Surface net solar radiation, clear sky ssrc J/m2 210 no yes surface Surface net thermal radiation str J/m2 177 no yes surface Surface thermal radiation downwards strd J/m2 175 no yes surface Surface net thermal radiation, clear sky strc J/m2 211 no yes surface Top net solar radiation tsr J/m2 178 no yes surface Top net thermal radiation ttr J/m2 179 no yes surface tisemf kg⋅m/s 235017 no yes surface Time integral of surface northward momentum flux tisnmf kg⋅m/s 235018 no yes surface Pressure Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Mean sea level pressure msl Pa 151 yes yes surface (scaled to sea level) Surface pressure sp Pa 134 yes yes surface Geometric cloud properties Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height High cloud cover hcc % 3075 yes yes above 5000m Medium cloud cover mcc % 3074 yes yes 2500m - 5000m Low cloud cover lcc % 3073 yes yes surface - 2500m Total cloud cover tcc % 228164 yes yes above ground Fog (lowest model level cloud) fog % 260648 no yes lowest model level Visibility vis m 3020 yes yes lowest model level Cloud base cdcb m 260107 yes yes - Cloud top cdct m 260108 yes yes - Snow Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Snow density rsn kg/m3 33 yes yes surface Snow depth water equivalent sd kg/m2 228141 yes yes surface Fraction of snow cover fscov 0-1 260289 yes no surface Snow albedo asn % 228032 yes no surface Surface roughness lengths Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Surface roughness sr m 173 yes no surface Surface roughness length for heat srlh m 260651 yes no surface Sea states Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Sea surface temperature (SST) sst K 34 yes no surface Sea ice area fraction ci 0-1 31 yes no surface Sea ice surface temperature sist K 260649 yes yes surface sithick m 174098 yes no surface Snow on ice total depth sitd m 260650 yes yes surface Static fields Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Height Land-sea mask lsm % 172 no no surface Sea tile fraction** NA 0-1 NA no no surface Inland water tile fraction** NA 0-1 NA no no surface Urban tile fraction** NA 0-1 NA no no surface Nature tile fraction** NA 0-1 NA no no surface Glacier fraction** NA 0-1 NA no no surface Orography orog m 228002 no no surface

The static fields marked as ** above are available as NetCDF files for the West and East domain respectively here: fractions.west.nc and fractions.east.nc

## 4.2. Soil level variables

Soil level variables are given for two model depths, where the first depth is the soil surface and the second depth is the so-called root depth. The root depth varies with the cover type climatology. Please note that the soil level variables are accommodated in the Arctic Regional Reanalysis single level variables catalogue entry.

Diagnostic output at soil levels is available in three hourly intervals at 00, 03, 06, 09, 12, 15, 18 and 21 UTC.

Long forecasts are available from the forecasts initiated at 00 and 12 UTC. Long forecasts include forecast lengths of 1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21, 24 and 30 hours.

Short forecasts of 1, 2 and 3 hours are made for the forecasts initiated at 03, 06, 09, 15, 18 and 21 UTC.

Table 2: Overview of soil level variables

 Soil level variables Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Volumetric soil ice vsi m³/m³ 260644 yes yes Volumetric soil moisture vsw m³/m³ 260199 yes yes

## 4.3. Model level variables

Model level variables are output at 65 hybrid model levels of the HARMONIE-AROME model. These follow the surface at the lowest levels and are gradually evolved into pure pressure levels at the highest levels. These are the levels at which the model computations are done. The height level and pressure level variables are interpolated from these data.

Diagnostic model level output (analysis), i.e. 0 hour forecasts, are available every 3 hours with 1 and 2 hour forecasts in between.

Table 3: Overview of model level variables

 Model level variables Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2 Specific humidity q kg/kg 133 yes yes Temperature t K 130 yes yes u-component of wind (defined relative to the rotated model grid) u m/s 131 yes yes v-component of wind (defined relative to the rotated model grid) v m/s 132 yes yes ccl % 260257 yes yes Specific cloud liquid water content clwc kg/kg 246 yes yes Specific cloud ice water content ciwc kg/kg 247 yes yes Specific cloud rain water content crwc kg/kg 75 yes yes Specific cloud snow water content cswc kg/kg 76 yes yes Graupel grle kg/kg 260028 yes yes Turbulent kinetic energy tke J/kg 260155 yes yes

## 4.4. Pressure level variables

Pressure level variables are interpolated to 23 specific pressure levels: 1000, 950, 925, 875, 850, 800, 750, 700, 600, 500, 400, 300, 200, 100, 70, 50, 30, 20 and 10 hPa. Thus, they are on isobaric surfaces.

Output of diagnostic variables at pressure levels are available in three hourly intervals at 00, 03, 06, 09, 12, 15, 18 and 21 UTC.

Long forecasts are available from the forecasts initiated at 00 and 12 UTC. Long forecasts include forecast lengths of 1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21, 24 and 30 hours.

Short forecasts of 1, 2 and 3 hours are made for the forecasts initiated at 03, 06, 09, 15, 18 and 21 UTC.

Table 4: Overview of pressure level variables

 Pressure level variables Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Relative humidity r % 157 yes yes Temperature t K 130 yes yes u-component of wind (defined relative to the rotated model grid) u m/s 131 yes yes v-component of wind (Component defined relative to the rotated model grid) v m/s 132 yes yes Geometric vertical velocity wz m/s 260238 yes yes Cloud cover ccl % 260257 yes yes Specific cloud liquid water content clwc kg/kg 246 yes yes Specific cloud ice water content ciwc kg/kg 247 yes yes Specific cloud rain water content crwc kg/kg 75 yes yes Specific cloud snow water content cswc kg/kg 76 yes yes Graupel (snow pellets) grle kg/kg 260028 yes yes Pseudo-adiabatic potential temperature papt K 3014 yes yes Geopotential z m²/s² 129 yes yes Potential vorticity pv K·m²/ (kg·s) 60 yes yes

## 4.5. Height level variables

Height level variables are interpolated to 11 specific height levels: 15, 30, 50, 75, 100, 150, 200, 250, 300, 400 and 500 metres above the surface.

Output of diagnostic height level variables is available in three hourly intervals at 00, 03, 06, 09, 12, 15, 18 and 21 UTC.

Long forecasts are available from the forecasts initiated at 00 and 12 UTC. Long forecasts include forecast lengths of 1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21, 24 and 30 hours.

Short forecasts of 1, 2 and 3 hours are made for the forecasts initiated at 03, 06, 09, 15, 18 and 21 UTC.

Table 5: Overview of height level variables

 Height level variables Name Short Name Unit Param ID Analysis: 0,3,...,21 Forecast: 1,2,3,… Relative humidity r % 157 yes yes Temperature t K 130 yes yes Wind speed ws m/s 10 yes yes Wind direction wdir deg 3031 yes yes Specific cloud liquid water content clwc kg/kg 246 yes yes Specific cloud ice water content ciwc kg/kg 247 yes yes Pressure pres Pa 54 yes yes

# 5. Details about the data fields

All data fields are model grid box (2.5 km X 2.5 km = 6.25 km2) averages. When output data are compared to local data near the coast or close to a glacier boundary, it should be remembered that the model variables represent such averages. For instance, to compute the surface forcing at specific glacial sites near a glacier boundary, it will be most representative to choose output from a model grid box that is fully on the glacier, rather than the closest model grid box if this only partially covers the glacier. Also, very local rain shower intensity can be higher than the modelled average grid box intensity.

Output variables can be either instantaneous, accumulated, or maxima/minima from a given period. This is specified for each of the variables listed below.

## 5.1. Precipitation and water fluxes

Precipitation at the surface is output as two separate types: Rain and total solid precipitation. These have the unit kg/m2, which for rain with a density of 1000 kg/m3 is the same unit as mm. For the solid precipitation, using this unit removes confusion between mm water equivalent and mm thickness. Total solid precipitation is the sum of the model variables snow and graupel. Total precipitation is the sum of all three precipitation types. The precipitation species include both convective and stratiform precipitation and are available only for the forecast time steps. They are accumulated variables meaning that they are accumulated from the beginning of the forecast. For instance, the 24h-forecast includes accumulated precipitation over 24 hours. Hourly precipitation can be retrieved by subtracting two accumulated precipitation fields with one-hour separation.

Precipitation type is an index integer value from 1 to 8:

1. Drizzle
2. Rain
3. Sleet
4. Snow
5. Freezing drizzle
6. Freezing rain
7. Graupel
8. Hail

and is also diagnosed from empirical relations (pers. comm. Sami Niemelä, FMI, 2015).

Water fluxes at the surface are output as surface runoff, percolation (drainage), water evaporation and snow sublimation. All these variables have units of kg/m2. Surface runoff occurs, when the model soil is saturated with water and more precipitation comes. Percolation is drainage of water below the deep soil level in the model. The water evaporation can be both positive and negative, where negative values signify condensation. This is also true for the snow sublimation.

## 5.2. Cloud and water vapour integrated variables

All cloud and water vapour variables are instantaneous, i.e. they are given for the time step at which they are output. Vertically integrated water vapour is given in units of kg/m2. It is vertically integrated from the surface to the top of the atmosphere. In practice it is computed from the specific water vapour on the 65 model levels (see section 4.4). Likewise, integrated cloud liquid water, integrated cloud ice water, and integrated graupel are computed from the specific cloud liquid water, cloud ice and graupel on the 65 model levels.

Total cloud cover, as given in the output, is computed from model level cloud cover with the nearly maximum-random cloud overlap assumption with a scaling coefficient of 0.8. If this had been 0.0 random cloud overlap would be assumed, which means that the cloud covers at the model levels are assumed to be independent. If the scaling coefficient had been 1.0 maximum-random cloud overlap is assumed, which means that all vertically connected cloud layers are assumed to overlap perfectly. Note that this definition of cloud cover is not consistent with maximum-random cloud cover used within the model for computing radiative fluxes! The same nearly maximum-random cloud overlap assumption is used to compute high, medium and low cloud covers. Following the WMO definitions high cloud cover is above 5 km height, while low cloud cover is below or at 2 km height. Medium cloud cover is in between. Note that height here is considered relative to the surface! Fog is the cloud cover at the lowest model level that has a thickness of approximately 20 m. The unit for all cloud cover and fog output is % in the range from 0% to 100%.

Visibility is given with the unit m and is calculated from the cloud water, cloud ice, rain, snow and graupel present at the lowest model level. For cloud and precipitation free conditions mist is calculated from the lowest model level relative humidity and the concentration of cloud condensation nuclei. Empirical relations are used (pers. comm. Esbjörn Olsson, SMHI, 2013). Cloud base height and cloud top height are output in units of m above the surface. They are defined for the highest and lowest model level with more than 4/8 cloud cover.

## 5.3. Variables at 10-metre height

All 10-metre height wind variables are instantaneous, i.e. they are given for the time step at which they are output. Diagnosed u and v wind components (u and v) are output at the WMO standard height of 10 m above the surface with the unit m/s. The u- and v-components follow the direction of the Lambertian model grid with the u-component being directed 90 degrees clockwise relative to the v-component. The diagnosed winds are computed from the winds at the lowest full model level (see section 4.3). The 10-metre u wind component (u10m) in the case of a stable or neutral boundary layer is calculated as

$$u_{10m} = u_{lml} \left[ ln \left( 1 + \frac{10}{z_{lml}} \left( e^{\kappa / \sqrt{C_{dn}}} - 1 \right) \right) - \frac{10}{z_{lml}} \left( \frac{\kappa}{\sqrt{C_{dn}}} - \frac{\kappa}{\sqrt{C_{d}}} \right) \right] \frac{\sqrt{C_{d}}}{\kappa}, (1)$$

where ulml is the wind speed at the lowest model level, 10 is the diagnostic height in metres, zlml is the height of the lowest full model level in metres, κ is the von Karman constant, Cdn is the momentum drag coefficient in neutral conditions, and Cd is the momentum drag coefficient. For the case of an unstable boundary layer it is calculated as

$$u_{10m} = u_{lml} \left[ ln \left( 1 + \frac{10}{z_{lml}} \left( e^{\kappa / \sqrt{C_{dn}}} - 1 \right) \right) - ln \left( 1 + \frac{10}{z_{lml}} \left( e^{\kappa / \sqrt{C_{dn}} - \kappa / \sqrt{C_{d}}} - 1 \right) \right) \right] \frac{\sqrt{C_{d}}}{\kappa}, (2)$$

Equations 1 and 2 are used likewise for the 10-metre v wind component. The wind speed U can be calculated from the components as

$$U = \sqrt{u^2 + v^2}. (3)$$

The wind direction D clockwise from North can be calculated as

$$D = tan_{2}^{-1} \frac{u/U}{v/V} + 180^{\circ} + \alpha, (4)$$

where 𝛼 is the local rotation of the model grid relative to North, and tan2-1 is the very specific arcus tangens function atan2, which is included in most programming languages. With atan2 both the sign in the numerator and the denominator are independently important for the resulting angle. Take care to check if the atan2 result is in radians, in which case it should be converted to degrees with the factor 180°/π. Also take care to check if the resulting direction is between 0° and 360°. Note that the wind direction is the direction from which the wind comes! The grid rotation angle 𝛼 can be computed with this script: https://github.com/metno/NWPdocs/wiki/From-x-y-wind-to-wind-direction.

10-metre u and v wind gust components are also output. These are computed from the diagnosed 10-metre winds and the turbulent kinetic energy (pers. comm. Gwenaëlle Hello, Meteo France, 2007).

## 5.4. Variables at 2-metre height

2-metre temperature (T2m) is diagnosed from the so-called skin temperature at 0 metre height (T0m) Since the 2-metre temperature is often referred to as the surface temperature we here, in order to avoid misunderstandings, explicitly use 2-metre and 0-metre subscripts. and the temperature at the lowest model level (Tlml). For stable and neutral boundary layers it is calculated as

$$T_{2m} = T_{0m} + \left[ ln \left( 1 + \frac{2}{z_{0h}} - \frac{2}{z_{lml}} \right) - \frac{2}{z_{lml}} \left( ln\frac{z_{lml}}{z_{0h}} - \frac{\kappa \sqrt{C_{d}}}{C_{h}} \right) \right] \frac{C_{h}}{\kappa \sqrt{C_{d}}} \left( T_{lml} - T_{om} \right), (5)$$

where z0h is the heat roughness length in metres, 2 is the diagnostic height in metres, zlml is the height of the lowest model level in metres, κ is the von Karman constant, Cd is the momentum drag coefficient, and Ch is the heat drag coefficient. For the case of an unstable boundary layer it is calculated as

$$T_{2m} = T_{0m} + \left[ ln \left( 1 + \frac{2}{z_{0h}} - \frac{2}{z_{lml}} \right) - ln \left( 1+ \frac{2}{z_{lml}} \left( exp \left( ln\frac{z_{lml}}{z_{0h}} - \frac{\kappa \sqrt{C_{d}}}{C_{h}} \right) - 1 \right) \right) \right] \frac{C_{h}}{\kappa \sqrt{C_{d}}} \left( T_{lml} - T_{om} \right). (6)$$

Note that the 2-metre temperature is commonly referred to as the surface temperature even though it is not at the actual surface! The 2-metre height temperature is instantaneous at the hourly time steps and has the unit K. In addition to these, 6-hour maximum and minimum temperatures are also output variables. These values are computed from temperature values at each model time step of 75 s.

The HARMONIE-AROME model operates with 4 different surface tiles for which the surface physical variables are all computed independently. These are: sea, inland water, urban and nature, i.e. all land areas that are not urban or inland waters. Inland waters are lakes and rivers. Each grid box contains specific fractions of these 4 surface tiles. The tile-specific 2-metre temperatures are particularly important in coastal regions, where the sea and land temperatures can differ with many degrees. They are: 2-metre temperature for the sea tile, 2-metre temperature for the inland water tile, 2-metre temperature for the urban tile, and 2-metre temperature for the nature tile.

2-metre specific humidity in units of kg/kg is diagnosed in the same way as the 2-metre temperature, only from the specific humidity at 0 metre height and at the lowest model level rather than the temperatures at these levels. From this and the saturation specific humidity the 2-metre relative humidity in % is computed.

## 5.5. Variables at 0-metre height

The surface pressure is given in units of Pascal (Pa). From this the mean sea level pressure [Pa] is computed by reducing the surface pressure to the mean sea level. To avoid confusion with the 2-metre “surface” temperature, the 0-metre temperature is referred to as the skin temperature. It is given in units of K. Both the pressure and temperature variables are instantaneous, i.e. they are given for the time step at which they are output. The surface is the lowest model half-level. Thus, the variables at this level are explicitly calculated on each model iteration.

All energy fluxes at the surface are output as accumulated variables from the initial time of the forecast to the forecast hour in question with the unit J/m2. They are considered positive downward to the surface. Energy fluxes are not output variables at the analysis times. Average hourly energy fluxes in W/m2 can be computed by subtracting two successive hourly accumulated variables and dividing by 3600 s. The solar radiation variables at the surface are accumulated downward, direct, direct normal and net solar radiation. Direct normal solar radiation is considered on a plane perpendicular to the direction to the sun, while the other solar variables are considered on a horizontal surface. The albedo in units of % is also given. Multiplying this with the accumulated downward solar radiation gives the accumulated upward solar radiation. The net solar radiation is the difference between the downward solar radiation and the upward solar radiation. Note that the albedo should only be used from forecast data, as the analysis time albedo are incorrect – and not used in the model. The variable accumulated net clear sky solar radiation is the net solar radiation of a cloud free atmosphere. Dividing this with one minus the albedo gives the accumulated downward clear sky solar radiation. The thermal radiance variables at the surface are accumulated downward, net, and net clear sky thermal radiation. The thermal radiation variables are all considered on a horizontal surface. The net thermal radiation is the difference between the downward thermal radiation and the upward thermal radiation. The upward thermal radiation can be calculated by subtracting the net thermal radiation from the downward thermal radiation. The accumulated surface sensible heat flux is the conductive energy from the atmosphere to the surface. If this is going from the surface to the atmosphere it has negative values. The accumulated latent heat flux is the sum of all latent energy fluxes that are due to the phase transitions of water. Here condensation causes a positive latent heat flux to the surface, and evaporation causes a negative heat flux from the surface. The latent heat due to evaporation and sublimation are given as individual output variables.

The u and v components of the accumulated surface momentum flux are given as output variables in units of kg m/s. The momentum roughness length and the heat roughness length as used in the model are given as output in units of m. Note that the roughness lengths should only be used from forecast data, as the analysis time values are incorrect – and not used in the model.

## 5.6. Variables at the top of the atmosphere

At the top of the atmosphere (TOA) the accumulated solar net TOA radiation and the accumulated thermal net TOA radiation are output variables in units of J/m2. These are both considered on a horizontal surface and are both positive in the downward direction. Since the downward solar TOA radiation is always larger than the upward solar TOA radiation, the solar net TOA radiation is always positive. Since there is virtually no downward thermal TOA radiation, the thermal net TOA radiation is always negative. The TOA is the highest model half-level. Thus, the variables at this level are explicitly calculated.

## 5.7. Snow variables

The snow variables are given as instantaneous values from the most recent model time step relative to the output time. The snow density output unit is kg/m3, the snow water equivalent (SWE) output is in units of kg/m2, and the snow fraction output has fractional units in the range 0-1. The snow depth can be derived from these variables.

## 5.8. Sea and sea-ice variables

For the sea the sea surface temperature is output in units of K. For areas partially or completely covered with sea-ice, the following variables are output: Sea-ice area fraction [-], upper layer sea ice temperature [K], sea-ice thickness [m], and sea-ice snow thickness [m]. Here the sea-ice fraction and sea surface temperatures are in fact interpolated input data and are only updated once every day. During the course of a forecast they are kept constant. All other sea and sea-ice variables are given as instantaneous values from the most recent model time step relative to the output time.

## 5.9. Soil level variables

The soil level variables are computed at the skin depth immediately at the surface and at the "root depth", that is the average depth from which the vegetation retrieves its water. This level varies depending on the surface type and is also defined for surface types without vegetation - for instance rock. The volumetric soil moisture content is the volume concentration of liquid water at root depth. The volumetric soil frozen water content is the volume concentration of ice at root depth. Please note that the soil level variables are listed in the "single level" catalogue entry. The root depth variables are used to account for effect of the evapotranspiration on the temperature and humidity at and immediately above the surface. They are a means to an end rather than a detailed model of what the humidity is at specific depths in the soil. For anyone who wishes to perform detailed modelling of processes below the surface we recommend to use surface fluxes from our reanalysis dataset as forcing for more advanced sub-surface models.

## 5.10. Model level variables

The model level variables are computed at the full model levels, and are given as instantaneous values from the most recent model time step relative to the output time. There are 65 vertical model levels in HARMONIE-AROME.  These full model levels are hybrid-sigma coordinates that are counted from the model top toward the surface. They go from being pure pressure levels, i.e. levels with constant pressure starting at 10 hPa, 30 hPa, etc. to being relative to the surface topography in height. Level 64 is at approximately 30 m height and level 65 is at approximately 12 m height above the surface. The following thermodynamic variables are the output variables at model levels: Temperature [K], u-component of wind [m/s], v-component of wind [m/s], turbulent kinetic energy [J/kg]. Here turbulent kinetic energy is the mean kinetic energy per unit mass from eddies in turbulent flow. Note that the HARMONIE-AROME weather forecasting model with 2.5 x 2.5 km2 resolution does not explicitly resolve this turbulent energy. The u- and v- wind components follow the direction of the Lambertian model grid with the u-component being directed 90 degrees clockwise relative to the v-component. From these model level wind components, the model level wind speed and wind direction relative north can be calculated with Equations 3 and 4. The grid rotation angle 𝛼 can be computed with this script: https://github.com/metno/NWPdocs/wiki/From-x-y-wind-to-wind-direction.

The following moisture, cloud and precipitation variables are output at model levels: Specific humidity, specific cloud liquid water, specific cloud ice water content, specific cloud rain water content, specific cloud snow water content, and graupel. All of these variables are given in units of kg/kg. Additionally, the cloud cover in % is output on model levels. The model level cloud covers can be used to compute the total cloud cover based on other cloud cover overlap assumptions than the one described in section 5.2.

## 5.11. Pressure level variables

Pressure level variables are output at the pressure levels: 1000, 950, 925, 900, 875, 850, 825, 800, 750, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, and 10 hPa, and are given as instantaneous values from the most recent model time step relative to the output time. The model level variables described in section 4.3 are vertically interpolated to fixed pressure levels. The following thermodynamic variables are output at pressure levels: Temperature [K], u-component of wind [m/s], v-component of wind [m/s], geometric vertical velocity [m/s], geopotential [m2/s2], pseudo-adiabatic potential temperature [K], and potential vorticity [(K m2)/(kg s)]. Here the geopotential is the work required to lift an air parcel of unit mass from mean sea level to the given pressure level. The pseudo-adiabatic potential temperature is the temperature that an air parcel would have if it were first expanded through a pseudo-adiabatic process to 0 hPa pressure and thereafter compressed to a pressure of 1000 hPa through a dry-adiabatic process. The potential vorticity is a measure of the capacity for air to rotate in the atmosphere. The u- and v- wind components follow the direction of the Lambertian model grid with the u-component being directed 90 degrees clockwise relative to the v-component.  From these pressure level wind components, the pressure level wind speed and wind direction relative north can be calculated from Equations 3 and 4. The grid rotation angle 𝛼 can be computed with this script: https://github.com/metno/NWPdocs/wiki/From-x-y-wind-to-wind-direction.

The following moisture, cloud and precipitation variables are output at pressure levels: Relative humidity [%], cloud cover [%], specific cloud liquid water content [kg/kg], specific cloud ice water content [kg/kg], specific cloud rain water content [kg/kg], specific cloud snow water content [kg/kg], and graupel [kg/kg]. These pressure level variables are interpolated from the full model level variables described in section 4.3.

## 5.12. Height level variables

Height level variables are output at the following fixed heights above the surface: 15 m, 30 m, 50 m, 75 m, 100 m, 150 m, 200 m, 250 m, 300 m, 400 m and 500 m. They are interpolated to these heights from the full model level variables described in section 4.3. The following thermodynamic variables are output at height levels: Temperature [K], pressure [Pa], wind speed [m/s], and wind direction [degrees from North]. Note here that the wind direction is defined as the direction from which the wind comes!

The following moisture, cloud and precipitation variables are output at height levels: Relative humidity [%], specific cloud liquid water [kg/kg], and specific cloud ice water content [kg/kg]. These height level variables are interpolated from the full model level variables described in section 4.3.

## 5.13. Static fields

Static fields are output variables that do not change depending on the model initial time or the forecast length (in other words they are time-independent). These include the land-sea mask, that is the fraction of land in a given model grid box of 2.5 x 2.5 km2 in units of %, and the orography in units of m. For each model grid box in HARMONIE-AROME 4 tile fractions are defined in units of fraction. These are: The fraction of sea, the fraction of inland water (lakes and rivers), the fraction of urban areas, and the fraction of nature, i.e. land areas that are not inland water or urban. The fraction of glaciers is also output. This is assumed to be a constant field with glacier extents representative of the middle of the full reanalysis period (1997-2021). Glacier extent in remote Arctic locations is not available as accurately mapped yearly datasets. The official maps are outdated due to major calving events in the recent decades. HARMONIE-AROME has not yet been designed to deal with changing land-sea masks or other surface classifications. Thus, these are static fields.

# 6. What are the uncertainties of the data fields?

A user of the reanalysis fields may be interested in knowing the accuracy or uncertainty of the provided data. Uncertainty is an expression of the expected or average deviations from what is considered the "true" value, and several approaches to estimate uncertainties are possible.

Like other reanalysis datasets, this Arctic Regional Reanalysis dataset is not aimed for and is not able to reconstruct the fully true state of the atmosphere. The reanalysis dataset provides a temporally and spatially consistent estimate of the selection of important meteorological parameters constructed from available Earth system observations and the weather forecasting model. As any estimate, the Arctic Regional Reanalysis data are also associated with uncertainty. Quantitative characteristics of reanalysis uncertainty are dependent on many different factors, such as density and quality of observations and characteristics of the assimilation and forecasting system used to determine the observed values to the meteorological parameters on gridded domain.

It should be noted that some reanalysis systems aim at describing how reanalysis uncertainties depend on the weather situation, and for instance in ERA5 this has been done by using a so-called ensemble data assimilation system. (For a description of ensemble data assimilation, see for instance the report on the system developed by ECMWF, https://www.ecmwf.int/en/elibrary/10125-ensemble-data-assimilations-ecmwf .) Unlike ERA5, the CARRA dataset has not been produced with such a system, so we will provide overall "static", not "weather dependent", uncertainties.

To shed light on the uncertainties of CARRA we will present uncertainty information obtained by two different and complementary approaches. First, in sections 6.1-6.4 we present uncertainty estimates at atmospheric levels estimated with a method based on constructing and evolving a time limited ensemble designed to mimic the actual uncertainties, with a subsequent scaling to make the magnitude consistent with actual deviations from observations. An overview of the method and how to use the derived uncertainty numbers is provided in section 6.1, then the actual uncertainty estimates are tabulated at pressure levels for the Eastern domain in section 6.2 and for the Western domain in section 6.3. Please note that the uncertainty estimate values are also available at https://confluence.ecmwf.int/x/p3ltD#CopernicusArcticRegionalReanalysis(CARRA):knownissuesanduncertaintyinformation-Uncertaintyinformation (the data files can be downloaded from there). This method provides uncertainties for a set of main reanalysis variables at pressure levels, see table 6.

Table 6: Variables and levels for which field uncertainty estimates are provided. (Due to technical issues, there is a difference in parameter availability between the East and West domain.)

 Name Levels Pressure Surface as well as mean sea level U-component of wind Pressure levels (50 - 1000 hPa) V-component of windy Pressure levels (50 - 1000 hPa) Temperature Pressure levels (50 - 1000 hPa) Geopotential (East domain only) Pressure levels (50 - 1000 hPa) Relative humidity (East domain)/Specific humidity (West domain) Pressure levels (50 - 1000 hPa)

In section 6.5 we present uncertainties measured as statistics of actual deviations from observations (also known as verification statistics). Note that observations are a reference not identical to the actual truth, as they also will have uncertainties as well as representativeness issues. This verification statistics is provided for a set of near-surface quantities which are covered by the meteorological observation network, including 2m temperature, 10m winds and precipitation.

## 6.1. Field based atmospheric uncertainty estimation

The Earth system observations are unevenly distributed in space and time and the largest part of the Arctic Regional Reanalysis (CARRA) domains has sparse data coverage. Additional information in the form of numerical simulations is needed in order to construct the gridded dataset. Reanalysis employs a data assimilation system for this purpose, the same device that is used to initialize a numerical weather prediction forecasting system with the latest observations. The analysis gridded dataset is obtained as an optimally weighted average of observations and short-range weather forecast (numerical simulations) where weights are dependent on the distance to observations, the forecast error and observation error statistics and the statistical relationships valid between meteorological quantities (model state variables), such as horizontal wind, temperature, specific humidity, surface pressure, geopotential.

Due to affordability reasons, the Arctic Regional Reanalysis employs a robust data assimilation procedure that uses only climatological estimates of the short-range forecast errors statistics. This means that an estimate of the forecast error statistics, such as for example forecast error variance of a certain meteorological quantity, has its meaning on a long time perspective and represents a temporally and spatially averaged value. It is even assumed that the climatological forecast error statistics do not change with geographical location and are not dependent on direction. In other words, the forecast error variance is assumed to be a constant value for the whole domain that changes only with height. The actual forecast error variance of a meteorological quantity taken at the particular time at a specific geographical location may be smaller or larger than its climatological value. The difference between the actual forecast error and its climatological estimate can be large, in particular close to the ground, where orographic conditions may significantly affect the quality of the forecasts. This is a price to pay to obtain a reliable estimate of the forecast error statistics under affordable costs.

We have based the uncertainty estimation we provide below on the same dataset as used to derive the climatological short-range forecast error standard deviation for the data assimilation scheme. This uncertainty is caused by errors in the initial and lateral boundary conditions, and the method provides a climatological upper bound analysis error standard deviation. Under the set of assumptions the assimilation system is defined by, the analysis error variance will be smaller than the observation error variance and also smaller than the first guess forecast error variance used to compute the analysis.

We provide the quantitative measures of uncertainty for meteorological parameters which are directly involved in the analysis procedure, such as horizontal winds, temperature, specific humidity, surface pressure and geopotential. It is not available on all reanalysis output quantities. The measure of uncertainty does not vary horizontally, but it changes vertically as a function of standard pressure levels.

The uncertainty of the CARRA-East dataset varies with the season of the year. We provide summer and winter error standard deviations for meteorological quantities for the CARRA-East domain. We have not noticed such seasonal dependency for the error standard deviations for the CARRA-West domain. The reason is different geographical locations of these domains in relation to the jet stream and associated with its different climatology. For the CARRA-West domain we therefore do not provide seasonal uncertainties.

It should be stressed that the provided uncertainty estimates of CARRA-East and CARRA-West datasets are not necessarily fully consistent and should not be strictly compared between themselves in their magnitudes. CARRA-West and CARRA-East datasets cover different geographical domains and have different observation coverage, so will yield different overall statistics, even if there is an overlapping area. In the overlapping area that covers the Svalbard archipelago, the CARRA-East data and uncertainties should be used. In addition, the climatological upper bound analysis error standard deviations for CARRA-West domain are computed using less data due to technical reasons. This might lead to a somewhat underestimated upper bound analysis error standard deviation.

The method we have used for estimating the uncertainties is described here in details: https://datastore.copernicus-climate.eu/documents/reanalysis-carra/CARRAUncertainty%20estimationFinal.pdf

The methodology does not account for systematic errors or for uncertainties associated with the forecasting model used for numerical simulations, such as for example parameterization of unresolved processes. In addition to the climatological upper bound analysis error standard deviation that we have discussed above, we provide a refined climatological analysis error standard deviation. The refined climatological analysis error standard deviation is the climatological upper bound analysis error standard deviation multiplied by a rescaling factor. The rescaling factor is obtained by confronting the climatological upper bound forecast error variance against observation error variance in observation space. The multiplication by rescaling factor allows compensating for the limitations of the methodology mentioned above to some extent. The refined climatological analysis error standard deviation is a more reliable estimate of the uncertainty of the CARRA datasets than the climatological upper bound analysis error standard deviation in the areas with dense observation coverage. At the same time, the short-range forecast error variance is always lower in the areas covered by observations than in the areas of sparse observation coverage. Thus, the refined climatological analysis error standard deviation can underestimate the actual uncertainty of the Arctic reanalysis data in the areas with poor observation coverage. We propose to use the refined analysis error standard deviation in data rich areas and to use the upper bound standard deviation in data sparse areas as a measure of uncertainty.

## 6.2. Uncertainties for the CARRA-East model domain

In this section we provide the actual estimates of the uncertainty in the form of the "Upper Bound" standard deviation and the "Refined" standard deviation for analysis errors over the CARRA-East domain. We provide both summer and winter statistics. The uncertainty estimation is given both in form of tables for surface pressure and mean sea level pressure (Table 7), u-component of wind (Table 8), v-component of wind (Table 9), temperature (Table 10), geopotential (Table 11) and relative humidity (Table 12) and in form of figures for u-wind component (Figure 2), v-wind component (Figure 3), temperature (Figure 4) geopotential (Figure 5) and relative humidity (Figure 6). All tables contain the name of the meteorological variable, short abbreviation, units and vertical levels. Uncertainty of the surface pressure field is provided at 0 m above ground and at 0 m above mean sea level. The uncertainty of the u- and v- wind components, temperature, geopotential and relative humidity is provided at standard pressure levels: 50hPa, 100hPa, 150 hPa, 200hPa, 250 hPa, 300hPa, 400hPa, 500hPa, 600hPa, 700hPa, 800hPa, 850hPa, 900hPa, 925hPa, 950hPa and 1000hPa. The figures show uncertainty estimates as a function of standard pressure levels. The Upper Bound standard deviation (STDV) of the analysis errors is shown as a solid curve and the Refined Standard Deviation of the analysis error is shown as a dashed curve. Summer statistics are presented in the plots to the left and winter statistics are presented in the plots to the right.

Table 7: Climatological analysis error standard deviation for surface pressure and mean sea level pressure for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV Surface pressure sp Pa 0m above ground 37.20 26.78 41.14 38.67 Mean sea level pressure msl Pa 0m above sea level 38.18 27.49 42.15 39.62

Figure 2: Climatological analysis error standard deviation for u-component of wind: summer statistics (left), winter statistics (right) as function of standard vertical pressure levels for the CARRA-East domain.

Figure 3: Climatological analysis error standard deviation for v-component of wind: summer statistics (left), winter statistics (right) as function of standard vertical pressure levels for the CARRA-East domain.

Figure 4: Climatological analysis error standard deviation for temperature: summer statistics (left), winter statistics (right) as function of standard vertical pressure levels for the CARRA-East domain.

Figure 5: Climatological analysis error standard deviation for geopotential: summer statistics (left), winter statistics (right) as function of standard vertical pressure levels for the CARRA-East domain.

Figure 6: Climatological analysis error standard deviation for relative humidity: summer statistics (left), winter statistics (right) as function of standard vertical pressure levels for the CARRA-East domain.

Table 8: Climatological analysis error standard deviation for u wind component for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV u-component of wind u m/s 50hPa 0.545 0.223 0.556 0.228 u-component of wind u m/s 100hPa 0.372 0.153 0.397 0.169 u-component of wind u m/s 150hPa 0.584 0.239 0.619 0.254 u-component of wind u m/s 200hPa 0.865 0.355 0.923 0.378 u-component of wind u m/s 250hPa 1.077 0.442 1.089 0.447 u-component of wind u m/s 300hPa 1.290 0.530 1.240 0.508 u-component of wind u m/s 400hPa 1.282 0.526 1.292 0.530 u-component of wind u m/s 500hPa 1.284 0.526 1.364 0.559 u-component of wind u m/s 600hPa 1.322 0.648 1.392 0.738 u-component of wind u m/s 700hPa 1.346 0.740 1.404 0.927 u-component of wind u m/s 800hPa 1.344 0.833 1.324 1.006 u-component of wind u m/s 850hPa 1.331 0.865 1.274 1.045 u-component of wind u m/s 900hPa 1.311 0.905 1.275 1.122 u-component of wind u m/s 925hPa 1.315 0.921 1.295 1.178 u-component of wind u m/s 950hPa 1.330 0.958 1.305 1.227 u-component of wind u m/s 1000hPa 1.302 0.937 1.172 1.102

Table 9: Climatological analysis error standard deviation for v wind component, summer period for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV v-component of wind v m/s 50hPa 0.541 0.222 0.541 0.222 v-component of wind v m/s 100hPa 0.376 0.154 0.384 0.157 v-component of wind v m/s 150hPa 0.597 0.245 0.599 0.246 v-component of wind v m/s 200hPa 0.884 0.362 0.898 0.368 v-component of wind v m/s 250hPa 1.105 0.453 1.061 0.435 v-component of wind v m/s 300hPa 1.337 0.548 1.209 0.496 v-component of wind v m/s 400hPa 1.313 0.538 1.270 0.521 v-component of wind v m/s 500hPa 1.328 0.544 1.351 0.554 v-component of wind v m/s 600hPa 1.363 0.668 1.391 0.737 v-component of wind v m/s 700hPa 1.383 0.761 1.402 0.925 v-component of wind v m/s 800hPa 1.382 0.857 1.332 1.012 v-component of wind v m/s 850hPa 1.363 0.886 1.282 1.051 v-component of wind v m/s 900hPa 1.342 0.926 1.277 1.124 v-component of wind v m/s 925hPa 1.344 0.941 1.293 1.177 v-component of wind v m/s 950hPa 1.361 0.980 1.299 1.221 v-component of wind v m/s 1000hPa 1.318 0.949 1.167 1.097

Table 10: Climatological analysis error standard deviation temperature for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV temperature t K 50hPa 0.141 0.058 0.142 0.058 temperature t K 100hPa 0.122 0.050 0.122 0.050 temperature t K 150hPa 0.230 0.094 0.224 0.092 temperature t K 200hPa 0.466 0.191 0.441 0.181 temperature t K 250hPa 0.490 0.201 0.438 0.180 temperature t K 300hPa 0.392 0.161 0.406 0.167 temperature t K 400hPa 0.314 0.129 0.341 0.140 temperature t K 500hPa 0.346 0.142 0.359 0.147 temperature t K 600hPa 0.399 0.196 0.399 0.211 temperature t K 700hPa 0.465 0.256 0.465 0.307 temperature t K 800hPa 0.537 0.333 0.485 0.369 temperature t K 850hPa 0.586 0.381 0.508 0.417 temperature t K 900hPa 0.614 0.424 0.520 0.458 temperature t K 925hPa 0.604 0.423 0.531 0.483 temperature t K 950hPa 0.612 0.441 0.546 0.513 temperature t K 1000hPa 0.582 0.419 0.576 0.504

Table 11: Climatological analysis error standard deviation geopotential for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV geopotential z m²/s² 50hPa 42.17 17.29 41.54 17.03 geopotential z m²/s² 100hPa 41.29 16.93 40.39 16.56 geopotential z m²/s² 150hPa 39.06 16.01 38.22 15.67 geopotential z m²/s² 200hPa 30.34 12.44 30.41 12.47 geopotential z m²/s² 250hPa 29.65 12.16 29.87 12.25 geopotential z m²/s² 300hPa 31.28 12.82 30.92 12.68 geopotential z m²/s² 400hPa 29.68 12.17 29.56 12.12 geopotential z m²/s² 500hPa 27.60 11.32 27.56 11.30 geopotential z m²/s² 600hPa 26.57 13.02 26.47 14.03 geopotential z m²/s² 700hPa 25.99 14.29 25.93 17.11 geopotential z m²/s² 800hPa 25.40 15.75 25.48 19.36 geopotential z m²/s² 850hPa 25.13 16.33 25.68 21.06 geopotential z m²/s² 900hPa 25.40 17.53 26.80 23.58 geopotential z m²/s² 925hPa 25.89 18.12 27.77 25.27 geopotential z m²/s² 950hPa 26.74 19.25 29.02 27.28 geopotential z m²/s² 1000hPa 29.47 21.22 32.43 30.48

Table 12: Climatological analysis error standard deviation relative humidity for the CARRA-East domain.

 Name Short name Unit Level Summer statistics Winter statistics Upper Bound STDV Refined STDV Upper Bound STDV Refined STDV relative humidity r % 50hPa 0.024 0.010 1.256 0.515 relative humidity r % 100hPa 0.010 0.004 0.194 0.080 relative humidity r % 150hPa 0.040 0.016 0.425 0.174 relative humidity r % 200hPa 0.534 0.219 1.913 0.784 relative humidity r % 250hPa 3.223 1.321 4.900 2.009 relative humidity r % 300hPa 6.153 2.523 8.861 3.633 relative humidity r % 400hPa 8.236 3.377 12.634 5.180 relative humidity r % 500hPa 7.181 2.944 12.562 5.150 relative humidity r % 600hPa 6.819 3.341 11.824 6.267 relative humidity r % 700hPa 6.740 3.707 11.522 7.605 relative humidity r % 800hPa 6.544 4.057 9.638 7.325 relative humidity r % 850hPa 6.118 3.977 8.194 6.719 relative humidity r % 900hPa 5.215 3.598 6.820 6.002 relative humidity r % 925hPa 4.674 3.272 6.303 5.736 relative humidity r % 950hPa 4.285 3.085 5.767 5.421 relative humidity r % 1000hPa 3.477 2.503 4.277 4.020

## 6.3. Uncertainties for the CARRA-West model domain

In this section we provide the actual estimates of the uncertainty in the form of the Upper Bound Standard Deviation and the Refined Standard Deviation for analysis errors over CARRA-West domain. For this domain, we provide all-year-around averaged statistics. The uncertainty estimation is given both in form of tables for surface pressure and mean sea level pressure (Table 13), u-wind component (Table 14), v-wind component (Table 15), temperature (Table 16) and specific humidity (Table 17), and in form of figures for the u- and v- wind components (Figure 7, left and right panel, respectively), temperature (Figure 8 , left panel) and specific humidity (Figure 8, right panel). All tables contain the name of the meteorological variable, short abbreviation, units and vertical levels. Uncertainty of the surface pressure field is provided at 0 m above ground and at 0 m above mean sea level. The uncertainty of the u- and v- components of wind, temperature and specific humidity is provided at standard pressure levels : 50 hPa, 100 hPa, 150 hPa, 200 hPa, 250 hPa, 300 hPa, 400 hPa, 500 hPa, 600hPa, 700 hPa, 800 hPa, 850 hPa, 900 hPa, 925 hPa, 950 hPa and 1000 hPa. The figures show uncertainty estimates as a function of standard pressure levels. Note that the uncertainty estimates for the CARRA-West domain are not given for exactly the same variables as for the CARRA-East domain due to technical processing challenges. The Upper Bound Standard Deviation of the analysis errors is shown as a solid curve and the Refined standard deviation (STDV) of the analysis error is shown as a dashed curve.

Table 13: Climatological analysis error standard deviation for surface pressure and mean sea level pressure for the CARRA-West domain.

 Name Short name Unit Level Upper Bound STDV Refined STDV Surface pressure sp Pa 0m above ground 26.11 7.04 Mean sea level pressure msl Pa 0m above sea level 39.44 10.64

Figure 7: Climatological analysis error standard deviation for u-component of wind (left plot) and for v-component of wind (right plot) as function of standard vertical pressure levels for the CARRA-West domain.

Table 14: Climatological analysis error standard deviation for u-component of wind for the CARRA-West domain.

 Name Short name Unit Level Upper Bound STDV Refined STDV u-component of wind u m/s 50hPa 0.540 0.146 u-component of wind u m/s 100hPa 0.370 0.100 u-component of wind u m/s 150hPa 0.447 0.121 u-component of wind u m/s 200hPa 0.611 0.165 u-component of wind u m/s 250hPa 0.732 0.198 u-component of wind u m/s 300hPa 0.871 0.235 u-component of wind u m/s 400hPa 0.869 0.235 u-component of wind u m/s 500hPa 0.862 0.233 u-component of wind u m/s 600hPa 0.887 0.240 u-component of wind u m/s 700hPa 0.893 0.241 u-component of wind u m/s 800hPa 0.856 0.231 u-component of wind u m/s 850hPa 0.831 0.224 u-component of wind u m/s 900hPa 0.810 0.219 u-component of wind u m/s 925hPa 0.803 0.217 u-component of wind u m/s 950hPa 0.799 0.216 u-component of wind u m/s 1000hPa 0.773 0.209

Table 15: Climatological analysis error standard deviation for v-component of wind for the CARRA-West domain.

 Name Short name Unit Level Upper Bound STDV Refined STDV v-component of wind v m/s 50hPa 0.535 0.144 v-component of wind v m/s 100hPa 0.360 0.097 v-component of wind v m/s 150hPa 0.449 0.135 v-component of wind v m/s 200hPa 0.603 0.162 v-component of wind v m/s 250hPa 0.714 0.193 v-component of wind v m/s 300hPa 0.851 0.230 v-component of wind v m/s 400hPa 0.853 0.230 v-component of wind v m/s 500hPa 0.852 0.230 v-component of wind v m/s 600hPa 0.877 0.237 v-component of wind v m/s 700hPa 0.884 0.239 v-component of wind v m/s 800hPa 0.852 0.230 v-component of wind v m/s 850hPa 0.826 0.223 v-component of wind v m/s 900hPa 0.801 0.216 v-component of wind v m/s 925hPa 0.794 0.214 v-component of wind v m/s 950hPa 0.792 0.214 v-component of wind v m/s 1000hPa 0.763 0.206

Figure 8: Climatological analysis error standard deviation for temperature (left plot) and specific humidity (right plot) as a function of standard pressure levels for the CARRA-West domain.

Table 16: Climatological analysis error standard deviation for temperature for the CARRA-West domain.

 Name Short name Unit Level Upper Bound STDV Refined STDV temperature t K 50hPa 0.130 0.035 temperature t K 100hPa 0.112 0.030 temperature t K 150hPa 0.169 0.045 temperature t K 200hPa 0.276 0.075 temperature t K 250hPa 0.285 0.077 temperature t K 300hPa 0.263 0.071 temperature t K 400hPa 0.227 0.061 temperature t K 500hPa 0.235 0.063 temperature t K 600hPa 0.259 0.070 temperature t K 700hPa 0.350 0.095 temperature t K 800hPa 0.473 0.128 temperature t K 850hPa 0.500 0.135 temperature t K 900hPa 0.508 0.137 temperature t K 925hPa 0.509 0.137 temperature t K 950hPa 0.520 0.140 temperature t K 1000hPa 0.542 0.146

Table 17: Climatological analysis error standard deviation for specific humidity for the CARRA-West domain.

 Name Short name Unit Level Upper Bound STDV Refined STDV specific humidity r g/kg 50hPa 0. 0. specific humidity r g/kg 100hPa 0. 0. specific humidity r g/kg 150hPa 0. 0. specific humidity r g/kg 200hPa 0.001 0.0002 specific humidity r g/kg 250hPa 0.002 0.0004 specific humidity r g/kg 300hPa 0.006 0.001 specific humidity r g/kg 400hPa 0.020 0.005 specific humidity r g/kg 500hPa 0.048 0.013 specific humidity r g/kg 600hPa 0.082 0.022 specific humidity r g/kg 700hPa 0.113 0.031 specific humidity r g/kg 800hPa 0.127 0.034 specific humidity r g/kg 850hPa 0.126 0.034 specific humidity r g/kg 900hPa 0.119 0.032 specific humidity r g/kg 925hPa 0.115 0.031 specific humidity r g/kg 950hPa 0.112 0.030 specific humidity r g/kg 1000hPa 0.099 0.027

## 6.4. Questions & Answers on field uncertainty estimates

For information and "questions and answers" (Q&A) on the uncertainty estimates for the ERA5 host reanalysis used on the lateral boundaries of this Arctic Regional Reanalysis, see this link:
ERA5: uncertainty estimation

The field uncertainty estimates provided here apply a different approach. The below Q&A is similar to the Q&A for the global reanalysis, adapted to apply for the uncertainty estimates provided here (for the Arctic reanalysis).

(1) What does the uncertainty mean? Is it the real error of the reanalysis?
The real error of the reanalysis at any given time and place is an unknown quantity. (If we knew the exact error, we could correct for it and make an exact reanalysis.) The uncertainty estimates provided here are estimates of the error averaged over many cases. It is expressed as the standard deviation of errors. It provides a typical value of the error, but in a statistical sense. This means that in practical cases the error may be smaller or larger than this statistical uncertainty estimate.
The method used for estimating the uncertainties is based on an ensemble of many realizations of the reanalysis done over limited time periods only. For the uncertainty estimation, it is assumed that the spread of this ensemble has similar properties as the spread of the real errors of the reanalysis. It neglects the variation of the error with the weather situation of each particular day, and also neglects the horizontal variation of the error within the reanalysis domain.
We provide both what we call a "Refined" and an "Upper Bound" error standard deviation estimate. The "Refined" climatological analysis error standard deviation is a more reliable estimate of the uncertainty of the CARRA data sets than the climatological "Upper Bound" analysis error standard deviation in the areas with dense observation coverage. At the same time, the short-range forecast error variance is always lower in the areas covered by observations than in the areas of sparse observation coverage. Thus, the "Refined" climatological analysis error standard deviation can underestimate the actual uncertainty of the Arctic Reanalysis data set in the areas with poor observation coverage. We propose to use the "Refined" analysis error standard deviation in data rich areas and to use the "Upper Bound" standard deviation in data sparse areas as a measure of uncertainty.

(2) Can I take the numbers for uncertainty at face values?
No, do not take the uncertainty values at face value, though they are valuable to provide approximate magnitudes of the uncertainties and representative values for different seasons (see question 5 below), as well as some of the differences between the two reanalysis domains. The difference between the actual forecast error and the "climatological" uncertainty estimates can be large, in particular closer to the ground, where orographic conditions may significantly impact the quality of the forecasts.

(3) How is the uncertainty estimate obtained? Which sources of uncertainty does it account for and which does it omit?
The uncertainty estimation for CARRA is obtained from ensembles used in the derivation of the error statistics for the data assimilation schemes. See https://datastore.copernicus-climate.eu/documents/reanalysis-carra/CARRAUncertainty%20estimationFinal.pdf for details on how this is computed. It addresses some of the uncertainties of the model and data assimilation system, but not everything. It is based on using the weather forecast model to evolve initial uncertainties, represented as scaled random perturbations, forward in time and also using perturbations in the lateral boundary conditions. With this method, some of the systematic errors in the weather forecast model will not be accounted for.

(4) Does the uncertainty also account for systematic errors in CARRA, or only for random errors?
The uncertainty estimates account for random errors and not for systematic ones. Only the random errors are accounted for in the perturbations of the model used in the estimation method. Therefore, one limitation of the uncertainty estimation is that systematic errors are not addressed.

(5) Where can I find monthly-mean values for uncertainty?
For the CARRA-East domain, there are typical winter (January) and summer (July) uncertainty estimates provided. There are no monthly values for other months provided for the uncertainties. Users can expect that values for the intermediate months will lie between these values. There is no general recipe for deriving monthly estimates for each month, but common sense should prevail and of course, the specific user need should be applied. One solution is to do a temporal interpolation between the winter and summer uncertainty values, which at least would provide a better approximation for the actual month, although the method has its limitations.
For the larger CARRA-West domain we have not noticed such seasonal dependency for the error standard deviation. The reason is different geographical locations of these domains in relation to the jet-stream and associated with its different climatology. Therefore it is proposed to use this single value for the entire year. Still, it must be remembered that the uncertainties represent climatological overall values and will vary depending on the weather situation.

(6) Where is CARRA more accurate and where less, and how does the uncertainty evolve over time?
The profile figures provided above provide uncertainty indications for winter and summer values (for the CARRA-East domain) as well as indicating the differences between the two reanalysis domains. The estimates we provided are considered as overall estimates representing an average over the 24 years reanalysis period. It will not capture any evolution in time of the uncertainties throughout the period. We do expect a slight decrease of the uncertainties in time throughout the reanalysis period, so that recent periods have slightly smaller uncertainties than older periods.

(7) Is there a forecast evolution of the uncertainty indicators made available (standard deviations of errors)?
The uncertainty estimates provided represent the uncertainty of the analysis time. We also output longer forecast ranges from the reanalysis, where the uncertainties are higher, but no uncertainty estimates are attached to them.

(8) I want to use data from an area where the East and West domain overlap. Which domain will provide the most accurate data?
The provided East and West uncertainty values give some indication on the difference in accuracy between the East and West domain, but can not be taken at face value in the overlapping area, since they represent the overall horizontal averages over the full domains. For the Svalbard area, we recommend to use the East domain. Svalbard is very close to the corner of the West domain, and we consider it better positioned in relation to the East domain boundaries.

(9) What are the uncertainties of the reanalysis output parameters not provided estimates for here?
The method and data we have had available for uncertainty estimation is only for a limited set of the output parameters available from the regional reanalysis. For some of the other variables verification statistics from comparison of reanalysis with observations, provided below, provides information on the uncertainties. This includes commonly used parameters such as 2-metre temperatures, 10-metre winds and precipitation.
For the remaining output parameters, we do not have uncertainty estimates from this methodology. Here uncertainties need to be allowed for in a combination with common sense when using the data.

(10) How do the uncertainties in CARRA compare to the uncertainties in ERA5? Can we use ERA5 uncertainties for variables, where such estimate is not provided for CARRA?
Uncertainty information from ERA5 has some relevance for CARRA. ERA5 data is used on the lateral boundaries of CARRA, and an algorithm "mixes in" some of the large spatial scales from ERA5 in CARRA, so information and inaccuracies from ERA5 will be inherited by CARRA. Still, CARRA adds value and can improve accuracy and correct errors by employing higher resolution, more local observations and describing several surface aspects in more detail. ERA5 provides extended uncertainty information relative to CARRA by (1) providing it for more physical quantities than CARRA, and (2) also provide uncertainty estimates which vary with time, space and weather situation rather than overall average uncertainties for each vertical level. Overall, verification statistics indicates (see section 6.5) that at least for the near-surface variables of the reanalysis, CARRA will be more accurate than ERA5.

For advanced users needing more detailed uncertainty information, uncertainties from ERA5 for the actual variable, time and position, might provide an additional indication (upper bound as concerns the near-surface variables) of the actual uncertainty of CARRA.

## 6.5. Uncertainties relative to observations: Verification statistics

Any observation or analysis of a meteorological variable is uncertain. For an observation, the uncertainty may be assessed by comparing measurements performed with similar instruments. In average, these instruments should observe the same values, but they will not get exactly the same results. Often the distribution of such random and independent deviations relative to the average measurement is a Gaussian distribution. This is defined by having a mean value 𝛍 and a standard deviation 𝛔 (STDV) as shown in Figure 9. The numbers on the distribution shown here are arbitrary and are just meant as an illustration.

Figure 9: An illustration of a Gaussian distribution. Here 𝛍 = 0 is the mean value of the distribution, 𝛔 = 2 is the standard deviation, and the distribution is normalized to have an amplitude of 1.0.

When the reanalysis data are compared to measurements, it should be remembered that both the reanalysis data and the measurements have uncertainties, e.g. 𝛍reanalysis and 𝛔reanalysis and 𝛍measurement and 𝛔measurement. The difference between these two distributions is what is estimated in this section. If they have Gaussian distributions the difference between them also has a Gaussian distribution with a mean value of 𝛍reanalysis - 𝛍measurement, which is called the bias, and a Standard Deviation Error SDE that is

$$\sqrt{\sigma_{reanalysis}^2 + \sigma_{measurements}^2}$$

Ideally, the bias should be 0.0 and the SDE should be equal to the STDV of the measurements 𝛔measurement. The Root Mean Square Error (RMSE) is related to the bias and the SDE by being the square root of the square sum of these two terms. Notice that if a large dataset that cannot become negative are considered, for instance precipitation, the distributions of these will be log-normal rather than Gaussian.

To provide additional information on the quality of near-surface parameters in CARRA, a comparison with point observations on ground has been performed. This provides statistical verification information which supplements the uncertainty estimates described above, but focuses exclusively on observation locations.
In Figure 10 the regional domains of CARRA are shown together with available observation sites in the CARRA system (not all sites observe all parameters or during the whole CARRA period). The density of the observations is important for two reasons. A dense observation network provides more robust and representative verification results, e.g. for northern Scandinavia, the Norwegian coast and parts of Iceland. On the other hand, less robust verification results are obtained for Greenland and Svalbard. It should also be noted that reliable precipitation observations from Greenland are lacking and are few over Svalbard, making these results even less representative for the entire region.

Figure 10: The two CARRA domains (black frames) with observation sites (red dots) available for the entire or parts of the CARRA period. The observation sites may observe different sets of parameters.

Secondly, the surface assimilation process is an important factor for the quality of near-surface parameters (e.g. 2-metre temperature and humidity), i.e. regions with a high-density surface observation network might show higher quality than similar regions with less dense network. The corrections in the surface analysis (analysis increment) for temperature are reduced to 2/3 of its value 70 km away from the observation site. A rough estimate shows that less than 20% of Greenland, 50% of Svalbard and 65-70% of Iceland and Scandinavia are closer to an observation site than 70 km. Similarly, the analysis increment is reduced to 1/3, 120 km away from the observations. This means that the influence from observations is small at about 65% of Greenland and 20% of Svalbard, Iceland and Scandinavia.

Some of the verified parameters are used in the CARRA production assimilation process for upper-air (pressure) and surface (2-metre air temperature and relative humidity), while others are not used (10-metre wind speed over land and precipitation). The presented verification does therefore not always apply independent observations and can be looked at as an upper-bound of the quality of CARRA in observation dense regions, while in regions far away from observations a lower quality of the CARRA surface fields can be expected. The verification is nor exhaustive as the characterization of all aspects of analysis/forecast errors is very complex. However, the verification provides an overview of the known quality of the reanalysis at the time of the production. Users of the CARRA dataset are encouraged to perform their own verification, tailored towards their own use, and if possible with respect to independent observations.

In the interpretation of the verification results the difference between the gridded reanalysis and observations can be explained by a combination of 1) weaknesses in the reanalysis, 2) observation representativeness issues and 3) observation errors. Weaknesses in the reanalysis is what we want to reveal in the verification process. Observation representativeness issues are related to the different scales of point observations and gridded data (e.g. how observations may vary within a grid cell). For parameters rapidly varying in space or time the representativeness component may be substantial. The importance of the representativeness errors depends on the application of the data, e.g. it is important when you seek the most accurate information of the weather at a given time and location, while not necessarily when investigating temporal trends. The observation representativeness error is therefore also larger for coarser resolution dataset (e.g. global reanalysis) than for high-resolution dataset like CARRA. To reduce the impact of observation errors a quality control of the observations is done before the verification is performed. In general, it is believed that the difference between reanalysis/forecasts and point observations are larger in polar regions than at lower latitudes with contributions from all three components mentioned above.

The evolution of the estimated Root Mean Square Error (RMSE) in Mean Sea Level Pressure (MSLP) for CARRA and ERA5 from 1997 to 2018 is compared in Figure 11a. CARRA has clearly better (lower) RMSE, but the difference is most pronounced in the first part of the period. Substantial annual cycles with larger errors in winter and smaller errors in summer are noticed for both reanalysis domains. In addition to inter-annual variations, a tendency towards lower RMSE in general in the second part of the period is seen. Regional differences in the estimated bias and the estimated Standard Deviation of the Error (SDE) as indicators of systematic and non-systematic errors in CARRA, respectively, are shown in Figure 11b. The smallest (largest) SDE is found at the Norwegian (Greenland) coast. The biases are on average relatively modest in all regions, but a large site-to-site variability is seen in many regions with the exception of the Norwegian coast and Svalbard. The contributions from observation representativeness and observation errors are not quantified, but are believed to be relatively modest for MSLP. However, the reduction of pressure to sea level at higher elevations in cold environments is a well-known issue and introduces uncertainty in the verification results.

The evolution of the RMSE in 2-metre air temperature (T2m) for CARRA and ERA5 is shown in Figure 12a. CARRA has a clearly better (lower) RMSE than ERA5 (approximately reduced to the half). Again a substantial annual cycle (larger errors in winter), inter-annual variability and site-to-site variability (thin lines) are seen. Regional differences in T2m bias and SDE for CARRA is shown in Figure 12b. While Svalbard (Norwegian coast) has the lowest SDE in summer (winter), the largest SDE is found at the Greenland coast (Scandinavian inland) in summer (winter). The biases are in general slightly positive in winter and more neutral in summer. However, there is a large inter-site variability in the biases in each region. It is believed that the observation representativeness error is substantial for T2m for both CARRA and ERA5, but even more pronounced for the latter due to the coarser resolution.
The evolution of the RMSE in the 10-metre wind speed (WS10) from CARRA and ERA5 is shown in Figure 13a. Again, CARRA adds value to ERA5 in terms of better (lower) RMSE. Also for WS10 a substantial annual cycle (larger errors in winter), some inter-annual and inter-site variability is found in RMSE. Furthermore, the regional differences in SDE are large with less errors in the Scandinavian inland (less windy region) and the largest errors at the Greenland coast and Svalbard (Figure 13b). On average, a positive wind speed bias is seen in winter in many regions and a more neutral bias in summer. However, there is a large inter-site variability in the wind speed bias within each region. Also for WS10, it is expected that the representativeness error gives a substantial contribution to the total difference between CARRA/ERA5 and the observations, but again more pronounced for ERA5 due to the coarser resolution.
The evolution of the RMSE in the 2-metre relative humidity (RH2m) for CARRA and ERA5 is compared in Figure 14a. A very clear reduction in RMSE for CARRA compared to ERA5 is seen (reduced to approximately 50%). Some annual, inter-annual and inter-site variability is also seen. Also for RH2m there are regional differences (Figure 14b) with the smallest SDE found for Scandinavian inland in winter and Svalbard in summer, while the highest SDE is found for the Greenland and Iceland coast. In particular, humidity is a challenging parameter to measure accurately in cold environments, and the interpretation of the verification results for the relative humidity should therefore be done with care.

Figure 11: a) Time series of the estimated root mean square error (RMSE) for the mean sea level pressure (MSLP) from CARRA (red) and ERA5 (blue). Thick lines indicate average over all observation sites, thin lines show individual observation sites. Only observation sites with measurement done in more than 95% of the total period are included. b) How MSLP monthly systematic (bias) and non-systematic (SDE) errors in the CARRA analysis vary between observation sites and during the CARRA period in different regions and seasons. The boxes span the 25-75%-tile of the monthly errors, while the lines span the 5-95%-tile of the monthly errors for CARRA. All available observation sites are included.

Figure 12: a) Time series of the estimated root mean square error (RMSE) for the 2-metre temperature from CARRA (red) and ERA5 (blue). Thick lines indicate average over all observation sites, thin lines show individual observation sites. Only observation sites with measurement done in more than 95% of the total period are included. b) How 2-metre temperature monthly systematic (bias) and non-systematic (SDE) errors in the CARRA analysis vary between observation sites and during the CARRA period in different regions and seasons. The boxes span the 25-75%-tile of the monthly errors, while the lines span the 5-95%-tile of the monthly errors. All available observation sites are included.

Figure 13: a) Time series of the estimated root mean square error (RMSE) for the 10-metre wind speed from CARRA (red) and ERA5 (blue). Thick lines indicate average over all observation sites, thin lines show individual observation sites. Only observation sites with measurement done in more than 95% of the total period are included. b) How 10-metre wind speed monthly systematic (bias) and non-systematic (SDE) errors in the CARRA analysis vary between observation sites and during the CARRA period in different regions and seasons. The boxes span the 25-75%-tile of the monthly errors, while the lines span the 5-95%-tile of the monthly errors. All available observation sites are included.

Figure 14: a) Time series of the estimated root mean square error (RMSE) for the 2-metre relative humidity from CARRA (red) and ERA5 (blue). Thick lines indicate average over all observation sites, thin lines show individual observation sites. Only observation sites with measurement done in more than 95% of the total period are included. b) How 2-metre relative humidity monthly systematic (bias) and non-systematic (SDE) errors in the CARRA analysis vary between observation sites and during the CARRA period in different regions and seasons. The boxes span the 25-75%-tile of the monthly errors, while the lines span the 5-95%-tile of the monthly errors. All available observation sites are included.

Verification of precipitation forecasts is a difficult task in general, due to large representativeness errors associated with small scale variability in precipitation and observation errors. For the latter, the wind induced under-catch of solid precipitation in measurements (i.e. observations measure only a fraction of the true total precipitation) is of particular importance, while the problem is smaller for liquid precipitation. A thorough investigation of the CARRA precipitation (Arctic areas with higher relative occurrence of solid precipitation) therefore requires substantial efforts and more considerations than for other parameters and are therefore not yet conducted. However, some preliminary insight in the performance of daily CARRA precipitation can be achieved from Figure 15.

Figure 15: The estimated relative bias ((CARRA – observed) / observed) of daily precipitation versus the estimated SDE normalized with observed precipitation for different regions for liquid (red) and solid (blue) precipitation. The units are kg/m² which is the same as mm water equivalent.

For liquid precipitation a moderate, mostly positive, bias is found for all regions (CARRA precipitation within 20% of observed values). However, the standard deviation of the error varies between regions with the largest values at Svalbard and smallest errors in Iceland. For solid precipitation, large positive biases are found for Iceland and Svalbard, windy regions where we expect the wind-induced under-catch of solid precipitation in the observations to be substantial. This means that the observations underestimate the true precipitation and these biases are therefore artificially high, and are in reality lower, but the true biases are difficult to quantify. A modest positive bias over the Scandinavian inland areas will also be reduced (or changed to an underestimation), but to a smaller degree since this is a less wind exposed region. All regions exhibit a larger SDE for solid than liquid precipitation and observation errors can be a part of the explanation for this. The only exception is the Scandinavian inland areas, for which the SDE is higher for liquid than solid precipitation. This is a region with substantial convective activity in summer which is less predictable, and where other verification methodologies are needed than point verification. It should be noted that at the Norwegian coast, the CARRA precipitation is smaller than in the observations for both liquid and solid precipitation, which after considering the wind-induced under-catch of solid precipitation indicates a substantial underestimation of the winter precipitation amounts along the Norwegian coast.

The quantile-quantile plots in Figures 16 and 17 show that the climatological distribution of daily precipitation in each region is reasonably well captured by CARRA compared to the observations. The similarities are better for liquid (Figure 16) than solid precipitation (Figure 17). However, CARRA underestimates the 5% highest precipitation values for Svalbard and the Norwegian coast. For solid precipitation, there is a larger difference between the observed and the CARRA precipitation climatology, but at least parts (e.g. Iceland with more precipitation in CARRA than in the observations) of these differences can be attributed to observation errors. However, as for liquid precipitation, the highest precipitation amounts at Svalbard and the Norwegian coast seems to be underestimated in CARRA.

Figure 16: Quantile-quantile plot of CARRA daily liquid precipitation versus observations for different regions. The quantile of the observed precipitation is plotted against the same quantiles from CARRA as dots and connected with a line in the same color. The line is solid below the 99.5% quantile, while it is dashed above the 99.5 quantile. The precipitation phase is estimated depending on the observed temperature (mean of temperature at the start and end of the accumulation period) is above (liquid) or below (solid) +1℃.

Figure 17: Quantile-quantile plot of CARRA daily solid precipitation versus observations for different regions. The quantile of the observed precipitation is plotted against the same quantiles from CARRA as dots and connected with a line in the same color. The line is solid below the 99.5% quantile, while it is dashed above the 99.5 quantile. The precipitation phase is roughly estimated depending on the observed temperature (mean of temperature at the start and end of the accumulation period) is above (liquid) or below (solid) +1℃.

For further details see the test and verification report for CARRA: https://datastore.copernicus-climate.eu/documents/reanalysis-carra/CARRATestVerificationFinal.pdf and the full system documentation: https://datastore.copernicus-climate.eu/documents/reanalysis-carra/CARRAFullSystemDocumentationFinal.pdf

# 7. Known Issues

Since the regional reanalysis is run nested into the ERA5 global reanalysis, it is affected by the known issues of ERA5. In addition to those issues, we have found that ERA5 uses incorrect glacier masks for most of the glaciers in the regional Arctic reanalysis domain, and the glaciers in ERA5 always have an analysis albedo of 0.85. This is wrong, since for instance exposed glacier ice albedos during summer are unaccounted for. These areas affect the general circulation and thermodynamic state in ERA5 and can affect the quality of the Arctic regional reanalysis.
Additionally, the Arctic reanalysis has the following known issues:

# 8. Annex: Brief outline of the Arctic reanalysis system

The details about the Numerical Weather Prediction model system used can be found in the full system documentation: https://datastore.copernicus-climate.eu/documents/reanalysis-carra/CARRAFullSystemDocumentationFinal.pdf

To supplement this we provide in the following a short description of the service as well as a description of basic principles of data assimilation in numerical weather prediction aimed at a non-specialist audience.

## 8.1. The service

The C3S_322_Lot2 contract of the Copernicus Climate Change Service produces and delivers a regional reanalysis (RRA) for the Arctic including long-term datasets of Essential Climate Variables (ECVs) for the 24 year period from 1997 to 2021. The model domains are shown in Figure 1. The modelling system has a resolution of 2.5x2.5 km2 and 65 vertical levels in the atmosphere. The produced datasets are freely available and can be used by anyone who wishes detailed long-term atmospheric data, for instance to study typical ranges of meteorological variables, as a reference for climate model runs, or to investigate highly resolved changes in the ECVs during the reanalysis period. An extended model version for the entire Arctic (pan-Arctic) area is run for a period of 1 year as a proof-of-concept.
The reanalysis model is the weather forecasting model HARMONIE-AROME cy40h1.1.1, which has been enhanced with more Arctic input data, and more extensive surface and atmospheric data assimilation. Data assimilation is explained further in section 7.2. Additionally, the model formulation has been improved with a specific focus on processes essential in the Arctic.

## 8.2. Principles of reanalysis systems

Atmospheric reanalysis is a method to reconstruct the atmospheric states by using historical observations (in situ, surface and satellite remote sensing) together with a weather forecasting model. It mainly provides a physically and dynamically coherent description of the state of the atmosphere. Surface variables are included mainly to the extent that they affect the atmosphere. This synthesis is accomplished by assimilating the observational data into a meteorological model and thereby forcing the model to reproduce the observations as closely as possible and make a gridded dataset that covers also locations from which no observations are available. The advantage of a reanalysis is that it provides a spatially and temporally complete, and consistent record of the atmospheric state. This cannot be achieved with an observational dataset alone, since interpolation methods cannot adequately represent the non-linear variability of the atmosphere.
The main advantages of reanalysis dataset are:

• They provide regularly gridded data, even in places where there are no or few observations;
• They provide a coherent, complete set of variables describing the atmospheric state;
• They provide a reconstruction of the record of past weather since it is constrained by observations.

A reanalysis system cannot perfectly recreate all variables, which are very variable in space and time. In particular, precipitation is a challenge. For specific applications, e.g. in hydrology, it is therefore quite common to correct local precipitation data for biases. Other variables, like surface temperature, are generally less variable in space and time and easier to reconstruct by the reanalysis system. Even with 2.5 km horizontal resolution, the results in complex terrain, such as mountainous regions or coastal areas that have considerable sub-grid variability have limitations compared to results over terrains that are more homogeneous. To accurately represent extreme local wind phenomena such as Greenlandic piteraqs sub-kilometer grid resolution is required.
Weather forecasting is based on an analysis of the current state of the atmosphere and the surface of land and sea. The forecasts are made with mathematical and physical computer models starting from the analysis. Diagnostic model output is a 0 hour forecast from after the model has run for one time step. The temperatures, winds, pressure, moisture, cloud contents and other variables are mapped at regular points in space and time as illustrated in Figure 18.

Figure 18: Schematic representation of the HARMONIE-AROME grid for model variables (surface pressure, temperature, u and v wind components and geopotential (Z), energy and specific moisture content (q)). Energy is considered per area on horizontal surfaces and is obtained from the time-integrated fluxes that are computed on each model time step. In addition to these variables HARMONIE-AROME includes the variables cloud liquid water, cloud ice, rain, snow, graupel and turbulent kinetic energy at the same vertices for which q is computed.

Reanalysis uses a weather forecasting model to create a 'first guess' of the atmospheric state at a certain time. The first guess is then corrected based on observations. This corrective step, referred to as 'data assimilation' (see Figure 19), requires statistical knowledge of the forecast error and the observation error. The procedure also uses physical and statistical relationships of the atmosphere when interpreting the observational data. The result of the data assimilation is called the analysis. The data analysis is run in 3-hourly cycles and hourly forecast data is computed. Thus, the analyses will contain a complete set of values describing the evolution of the atmosphere and the surface over time, also for locations where there are no observations.

This complete estimate of the atmospheric state over time can be of great value to users, for example in assessing the impacts of past weather and climate related events, for statistics of the climate in a location or an area or for running other fine scale models or validating climate models.

An important difference between reanalyses and archived weather analyses from operational forecasting systems is that a reanalysis is produced with a single and improved version of a data assimilation system – including the forecast model used – and is therefore not affected by changes in method. There is also more time to collect observations since the reanalysis does not have the constraint of issuing timely forecasts, and this will improve the quality of the reanalysis. Additionally, re-processed observations can be also used. Reanalysis systems differ in the set of observations that are assimilated, the model that is used, and possibly the way the error statistics are estimated and corrections are applied.

Figure 19: Schematic figure showing the simulation of the atmospheric state (black line) in the reanalysis, which starts from the analysis (green dots) and resulting in the background (blue dots). Note that the data assimilation is run in 3-hourly cycles, and that the background usually does not coincide with the true observed state of the atmosphere.

The Arctic reanalysis system applies the so-called 3D variational data assimilation (3D-VAR) reanalysis method. The 3D-VAR method is depicted schematically in Figure 19. At fixed points in time the model state is adjusted based on the observed state, taking into account the statistics of model and observation errors. The Arctic reanalysis system is run with eight cycles per day performing analyses at 00 UTC, 03 UTC, 06 UTC, 09 UTC, 12 UTC, 15 UTC, 18 UTC and 21 UTC. The forecast lengths vary between 3 and 30 hours.

• Page:
• Page:
• Page:
• Page:
• Page: