...
Reanalysis is a method for reconstructing past atmospheric states by integrating historical observations with a weather forecasting model. The Copernicus Arctic Regional Reanalysis (CARRA) is a high-resolution climate data product that assimilates an extensive time series of observations with the HARMONIE model and its 3D-Var data assimilation system to provide the most accurate estimate of the atmospheric state. An important requirement for the CARRA reanalysis is the computation of ensemble-based uncertainty estimates for essential climate variables. Numerical models inherently contain various uncertainties and are often run in ensemble mode to enhance forecast accuracy and evaluate uncertainty. During CARRA1, the CARRA team developed an approach that utilizes a limited number of high-resolution ensembles generated over several short time intervals, in conjunction with the derivation of background error statistics (Bojarova et al., 2020). This methodology offers a relatively straightforward estimation of uncertainty for key prognostic variables, employing a scaling method that compares ensemble spread with observation error variances at observation locations. The uncertainty estimates provided are static; however, they do vary with height, season, and between the CARRA-West and CARRA-East domains. The tables specifying the fields for which uncertainties are provided, as well as the data themselves, can be obtained from the Copernicus Arctic Regional Reanalysis (CARRA): Data User Guide (https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+Data+User+Guide, section "What are the uncertainties of the data fields?") and the Copernicus Arctic Regional Reanalysis (CARRA): known issues and uncertainty information documentation page (https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+known+issues+and+uncertainty+information, section "Uncertainty information").
For CARRA2, we face a challenge: the requirement for a high-resolution reanalysis dataset that is extended in both domain and, potentially, time range, while simultaneously providing uncertainty information when an ensemble system is computationally not feasible. Our proposed approach builds on the experience gained during CARRA1 and draws inspiration from the generation of time-varying uncertainty information described in Olesen et al. (2013), where deterministic regional-scale information is supplemented with uncertainty estimates derived from global ensembles for projecting regional climate change. In this work, we again utilize the ensemble dataset generated in connection with the derivation of background error statistics, which is driven by the ERA5-EDA ensemble components. This dataset is employed to produce a coincident ensemble in the limited-area model by introducing perturbations using the "BRAND" field perturbation approach (see section 2.2.2 of Yang et al., 2021, and the CARRA1 system documentation for details on the BRAND approach). The objective is to establish an empirical relationship, in the form of a nonlinear regression or a machine learning model, that predicts the high-resolution regional spread of Essential Climate Variables (ECVs) (a scalar for each ECV) using both the high-resolution deterministic CARRA2 and the low-resolution ERA5 reanalysis EDA components as predictors (the full fields, in an implicit multivariate approach). By taking the high-resolution CARRA2 spread as a proxy for uncertainty, we will be able to predict the CARRA2 uncertainty using the information available during the reanalysis production, even when a corresponding high-resolution ensemble is not available.
The statistical model will be trained on this collected dataset to predict the high-resolution spread in the space of ECVs (as a proxy for uncertainty), which will be used to estimate the reanalysis uncertainty for the entire reanalysis period, including periods outside the time slices used for background error statistics derivation. This method is expected to capture variations in uncertainty with height, the presence of orography, observation network density, and other factors. The uncertainty is computed in model space: we first calculate the ensemble mean and the ensemble uncertainty in terms of standard deviation (SD) from both CARRA2 and ERA5-EDA. This approach also has the potential to provide weather-situation-dependent uncertainties and will offer a more detailed description of the actual variations in uncertainty across space and time than was achievable with the method used in CARRA1.
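To make the first processing step concrete, the sketch below computes the ensemble mean and spread (SD) fields from a stack of ensemble members. It is a minimal example rather than the operational CARRA2 code; the member count and grid dimensions are illustrative.

```python
import numpy as np

def ensemble_mean_and_spread(members: np.ndarray):
    """Compute the ensemble mean and spread (standard deviation).

    members: array of shape (n_members, ny, nx) holding one field
    (e.g. 2 m temperature) for each ensemble member on a common grid.
    """
    ens_mean = members.mean(axis=0)
    # ddof=1: sample standard deviation over the ensemble dimension
    ens_spread = members.std(axis=0, ddof=1)
    return ens_mean, ens_spread

# Illustrative example: 10 EDA-like members on a coarse 114 x 130 grid
rng = np.random.default_rng(0)
members = 270.0 + rng.normal(scale=1.5, size=(10, 114, 130))
mean, spread = ensemble_mean_and_spread(members)
print(mean.shape, spread.shape)  # (114, 130) (114, 130)
```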
...
To train the machine learning model, one of our main data sources is the ECMWF ERA5 reanalysis (Hersbach et al., 2020). The ERA5 reanalysis datasets are generated by continuously integrating observations using 4D-Var data assimilation with the Integrated Forecasting System (IFS) model cycle CY41R2, at 31 km horizontal resolution and with 137 hybrid sigma/pressure model levels in the vertical. The ERA5 dataset includes a ten-member ensemble (EDA) with lower spatial and temporal resolution (approximately 60 km horizontal and 3-hourly) than the main ERA5 product (around 30 km horizontal and hourly). This lower-resolution analysis dataset is used for estimating uncertainty in ERA5. More details on the ERA5-EDA component can be found in the ERA5: data documentation (https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation), which offers a comprehensive overview of the various products and lists all available geophysical parameters. The ERA5-EDA can be downloaded directly from MARS or through the Copernicus Climate Data Store (CDS).
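As an illustration of CDS access, the sketch below uses the cdsapi client to request the 10-member EDA 2 m temperature for a single day. The dataset and request keys follow the CDS ERA5 single-levels catalogue, but the exact names and the chosen date, times, and variable here are assumptions to be checked against the current CDS documentation.

```python
import cdsapi

# Hypothetical request for ERA5-EDA (ensemble members) 2 m temperature;
# verify dataset and key names against the current CDS catalogue.
c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "ensemble_members",  # EDA members, not "reanalysis"
        "variable": "2m_temperature",
        "year": "2022",
        "month": "01",
        "day": "20",
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "format": "grib",
    },
    "era5_eda_t2m_20220120.grib",
)
```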
...
Figure 4: The ensemble spread of 2-meter temperature (in Kelvin) for the 10 ensemble members of the ERA5-EDA reanalysis data (left panel) and the CARRA2 ensemble members (right panel), valid on 20 January 2022 for all four analysis cycles.
Table 1: Near-surface input parameters for uncertainty estimation in model space with the machine learning method, for both the ERA5 and CARRA2 ensembles.
Variable | Level | CARRA2 grid points | ERA5 grid points |
2 m temperature (K) | Near surface | Y = 2869; X = 2869 | Y = 114; X = 130 |
10 m zonal wind, u (m/s) | Near surface | Y = 2869; X = 2869 | Y = 114; X = 130 |
10 m meridional wind, v (m/s) | Near surface | Y = 2869; X = 2869 | Y = 114; X = 130 |
Surface pressure (Pa) | Surface | Y = 2869; X = 2869 | Y = 114; X = 130 |
Table 1 lists the near-surface input parameters for uncertainty estimation in model space with the ML approach, for both the ERA5 and CARRA2 ensembles. Precipitation is excluded as a variable in the diffusion-based ML method because the approach requires gridded spread fields computed across ensemble members: when precipitation is zero or missing for some ensemble members, the spread becomes excessively high and unrealistic, so the ML model would likely perform poorly over most of the CARRA2 domain.
...
Bojarova, J., et al. (2020). Uncertainty estimation method. C3S deliverable report C3S_D322_Lot2.1.1.2-202002. https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+Uncertainty+estimation+method
Hersbach, H., Bell, B., Berrisford, P., et al. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146, 1999–2049. https://doi.org/10.1002/qj.3803
...
Yang, X., et al. (2020). C3S Arctic regional reanalysis – Full system documentation. C3S deliverable report C3S_D311_Lot2.1.2.2-201910. https://confluence.ecmwf.int/display/CKB/Copernicus+Arctic+Regional+Reanalysis+%28CARRA%29%3A+Full+system+documentation
Appendix 1: Supplementary Figures
...
During the DDPM-ML training process, periodic checkpoints (model states) are saved to monitor model performance and track the gradual improvement in output quality. This strategy helps to identify the optimal training duration while ensuring efficient use of computational resources. At each training step, the weighted mean squared error (MSE) is calculated between the model's predicted noise and the true Gaussian noise added at that step (Figure S4). The MSE curve shows a sharp decrease within the first 1,000 steps, after which the error stabilizes and gradually converges. This behavior indicates that the model effectively learns to minimize noise prediction errors as training progresses. However, to produce high-quality outputs (samples), the model must be trained for up to 20,000 steps. In practice, the most accurate and stable results were typically achieved after around 10,000 steps. This extended training requirement is primarily due to the use of high-resolution, large-domain CARRA2 data, which demand longer training for proper convergence and accurate reconstruction. To preserve model progress, checkpoints are saved at 10,000, 12,000, 14,000 steps, and beyond. Each checkpoint file (e.g., model010000.pt) stores the trained model weights and biases, along with diffusion process hyperparameters used in both the forward and reverse processes. These files ensure reproducibility and allow further fine-tuning or analysis if required.
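The training objective described above can be summarized in a few lines of PyTorch. The sketch below is a minimal, generic DDPM training step assuming the standard noise-prediction parameterization and a precomputed cumulative noise schedule (alphas_cumprod); it illustrates the MSE between predicted and true Gaussian noise and the checkpointing pattern, and is not the CARRA2 training code itself.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, x0, alphas_cumprod):
    """One generic DDPM training step (noise-prediction parameterization).

    model          : network mapping (noisy field, timestep) -> predicted noise
    x0             : clean high-resolution fields, shape (B, C, H, W)
    alphas_cumprod : 1-D tensor of cumulative products of the noise schedule
    """
    b = x0.shape[0]
    # Sample a random diffusion timestep for each field in the batch
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)  # true Gaussian noise
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward process
    predicted_noise = model(x_t, t)  # reverse-process estimate
    return F.mse_loss(predicted_noise, noise)  # noise-prediction MSE

def save_checkpoint(model, step, diffusion_hyperparams):
    """Save weights plus diffusion hyperparameters, e.g. model010000.pt."""
    torch.save({"model": model.state_dict(),
                "step": step,
                "diffusion_hyperparams": diffusion_hyperparams},
               f"model{step:06d}.pt")
```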
...
Table 1: List of Python Scripts and Job Files
Category / Folder | File Name | Description / Purpose |
Main Scripts | Train_Main.py | Main script for training the diffusion model. |
Evaluation Scripts | evaluate.py | Script for evaluating model performance across datasets (for UQ and each variable). |
 | evaluate_FIELD.py | Field-specific evaluation script, likely used for variable-based assessment (t2m, sp, u10, v10, …). |
Job Submission Scripts | Run_Training.job | Job submission script for launching training on ATOS. |
 | Run_evaluation.job | Job submission script for running evaluation tasks. |
Source Folder: src_diffusion/ | diffusion_dist.py | Handles distributed training setup for parallel computation. |
 | diffusion_fp16.py | Manages mixed-precision (FP16) computation for efficiency. |
 | diffusion_gaussian.py | Implements Gaussian diffusion processes and noise modeling. |
 | diffusion_train.py | Core training logic for the diffusion model. |
 | image_datasets.py | Dataset loader and pre-processing utilities for image inputs. |
 | logger.py | Logging utility for training and evaluation progress. |
 | losses.py | Defines and computes loss functions used during training. |
 | nn.py | Neural network components and layer definitions. |
 | resample.py | Implements resampling strategies in the diffusion process. |
 | respace.py | Defines timestep spacing and schedule adjustment functions. |
 | unet.py | Contains the U-Net model architecture used for diffusion and super-resolution tasks. |
The Python script Train_Main.py will be executed via the batch job Run_Training.job to perform the training process. The model configuration parameters are detailed in Table 2. To monitor error statistics, it is necessary to extract values from the log file, which is located in the same directory as the output files.
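A minimal sketch for pulling error statistics out of the training logs is shown below. It assumes the progress.csv file described in Table 3 and column names such as "step" and "loss"; these depend on the logger configuration and should be checked against the actual file header.

```python
import csv

# Minimal sketch: extract a loss curve from progress.csv (see Table 3).
# Column names ("step", "loss") are assumptions; check the file header.
steps, losses = [], []
with open("progress.csv", newline="") as f:
    for row in csv.DictReader(f):
        steps.append(int(float(row["step"])))
        losses.append(float(row["loss"]))

print(f"last step: {steps[-1]}, last loss: {losses[-1]:.4f}")
```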
Table 2: Summary of the model configuration parameters used for diffusion-based training and generation (evaluation).
Parameter | Description | Value / Setting |
--diffusion_steps | Number of diffusion/denoising iterations each image undergoes during training. | 4000 |
--image_size | Maximum image dimension used during training. | 256 |
--noise_schedule | Type of noise schedule; defines how noise levels change during diffusion. Can be modified during tuning. | linear |
--lr | Learning rate used for model optimization. | 1e-4 |
--batch_size | Number of images processed in each training batch. | 8 |
--microbatch | Subdivision of the batch for memory efficiency; typically set according to available GPU memory. | 4 |
--class_cond | Enables supervised learning by conditioning on class labels. | True |
--steps | Total number of training iterations. | 20,000 |
— | Model checkpoint saving frequency. | Every 2,000 steps |
— | Empirical performance note: model accuracy tends to improve notably after this point (varies with dataset size). | ~10,000 steps |
It is important to note that the ERA5-EDA dataset has a much coarser spatial resolution, with grid dimensions of 130 by 114, whereas the CARRA2 dataset has a significantly finer resolution of 2880 x 2880 grid points. To reconcile these differences and generate outputs at the CARRA2 resolution, a dedicated sampling step was integrated within the training loop of the super-resolution model, a U-Net conditioned on the low-resolution ERA5-EDA input maps. The model contains the essential components for both training and deployment of diffusion-based super-resolution frameworks conditioned on low-resolution inputs: sampling techniques tailored to diffusion training, a U-Net-inspired architecture with residual and attention blocks, cross-attention conditioning, and optional mixed-precision training for computational efficiency.
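A common way to implement this conditioning is sketched below, under the assumption that the coarse field is upsampled and concatenated with the noisy high-resolution input along the channel dimension (the recipe used in standard super-resolution diffusion models; the actual CARRA2 implementation may differ in detail).

```python
import torch
import torch.nn.functional as F

def build_sr_model_input(x_t: torch.Tensor, low_res: torch.Tensor) -> torch.Tensor:
    """Condition the denoising U-Net on the coarse ERA5-EDA field.

    x_t     : noisy high-resolution field, shape (B, C, H, W), e.g. H = W = 2880
    low_res : coarse ERA5-EDA field, shape (B, C, h, w), e.g. h = 114, w = 130

    Upsample the coarse field to the target grid and concatenate it with
    the noisy input along the channel axis, so the U-Net sees the
    low-resolution conditioning information at every denoising step.
    """
    upsampled = F.interpolate(low_res, size=x_t.shape[-2:],
                              mode="bilinear", align_corners=False)
    return torch.cat([x_t, upsampled], dim=1)  # (B, 2C, H, W)
```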
...
Table 3: The key files generated during model training, including checkpoint files, log outputs, and progress tracking data.
File Name | File Type | Description |
log.txt | Log File | Contains training logs, including losses, metrics, and system information. |
progress.csv | Progress File | Records performance metrics over training steps for plotting or analysis. |
model000000.pt | Model Checkpoint | Initial model state before training begins. |
model002000.pt | Model Checkpoint | Saved after 2,000 training steps. |
model004000.pt | Model Checkpoint | Saved after 4,000 training steps. |
model006000.pt | Model Checkpoint | Saved after 6,000 training steps. |
model008000.pt | Model Checkpoint | Saved after 8,000 training steps. |
model010000.pt | Model Checkpoint | Saved after 10,000 training steps. |
model012000.pt | Model Checkpoint | Saved after 12,000 training steps. |
model014000.pt | Model Checkpoint | Saved after 14,000 training steps. |
model016000.pt | Model Checkpoint | Saved after 16,000 training steps. |
model018000.pt | Model Checkpoint | Saved after 18,000 training steps. |
model020000.pt | Model Checkpoint | Final trained model after 20,000 steps. |
d. Diffusion Sampling and Evaluation Overview
...
Table 4: UQ Evaluation Output Files
Category | Files |
Evaluation Type | Uncertainty Quantification |
Logs | rank_0.log, rank_1.log, rank_2.log, rank_3.log |
Model Outputs (SD Evaluations) | UQ_ckpt_model012000.pt.png |
Main Output Image | UQ.png |
Table 5: Field Evaluation Output Files
Category | Files |
Evaluation Type | Field Evaluation |
Logs | rank_0.log, rank_1.log, rank_2.log, rank_3.log |
Model Outputs (SD Evaluations) | FIELD_ckpt_model012000.pt.png |
Main Output Image | TARGET_CARRA2.png |
Figure 2: a) Uncertainty estimation associated with the higher-resolution uncertainty quantification, as depicted in the file UQ.png. b) The corresponding 2-meter temperature (K) field for a single UTC time. More information is available in the deliverable report C3S2_D361a.1.4.1_UncertaintyEstimation_v1.
In summary, the final result of the higher-resolution uncertainty quantification is presented in the file UQ.png, while the file TARGET_CARRA2.png serves as a reference for comparison or evaluation against the actual field data.
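For a quantitative comparison between the predicted and reference fields, a sketch along the following lines could be used; the file names and array layout here are hypothetical, assuming the spread fields have been saved as NumPy arrays rather than only rendered as PNG images.

```python
import numpy as np

# Illustrative comparison of a predicted spread field against the
# CARRA2 ensemble-derived reference; file names are hypothetical.
predicted = np.load("predicted_spread.npy")   # ML-estimated spread field
reference = np.load("carra2_spread.npy")      # ensemble-derived spread field

rmse = np.sqrt(np.mean((predicted - reference) ** 2))
bias = np.mean(predicted - reference)
corr = np.corrcoef(predicted.ravel(), reference.ravel())[0, 1]
print(f"RMSE = {rmse:.3f} K, bias = {bias:.3f} K, corr = {corr:.3f}")
```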
...