- Created by Eduardo Penabad, last modified by Christopher Goddard on Feb 15, 2024

This page describes the processing performed in the production of the data and graphical products of the C3S seasonal forecast service, and provides additional definitions where necessary. This complements the details in the corresponding catalogue data entries and chart captions.

### Data products

All data is received from the contributing centres on a common 1x1 degree grid prescribed by C3S; the regridding from the original model resolution is done by each contributing centre. This does not apply to the data from JMA, which is provided on a 1.25x1.25 degree grid. The original grids of the modelling system components are listed here: Description of the C3S seasonal multi-system.

In general, the common hindcast period, 1993 - 2016, is used as the reference period for C3S data and graphical products, regardless of the hindcast period available for each individual component system (unless stated otherwise).

**Sub-daily/daily data on single or pressure levels**

- Seasonal forecast daily and subdaily data on single levels
- Seasonal forecast subdaily data on pressure levels

This is the data at the original time resolution received from the contributing centres: once a day or once every 6 hours for single levels, depending on the variable, and once every 12 hours for pressure levels. The data at this resolution is used to derive the monthly statistics.

__Monthly Statistics__

- Seasonal forecast monthly statistics on single levels
- Seasonal forecast monthly statistics on pressure levels

Monthly statistics are made available in addition to the data at the original sub-daily/daily time resolution. These include:

- Ensemble: ensemble mean, hindcast climate mean
- Individual members: monthly mean, monthly minimum, monthly maximum, monthly standard deviation
*(note that this does not apply to all variables in all cases, e.g. there is no minimum for precipitation, and no minimum, maximum or standard deviation for pressure level variables)*

These monthly aggregations are calculated from the original sub-daily/daily data provided, for each calendar month and lead time in the hindcasts and real-time forecasts. In the case of wind, the speed is first calculated from the instantaneous u and v components of the wind at the original time resolution, and then aggregated into monthly statistics. The common hindcast period is used for estimating the hindcast climate mean. The ensemble mean for individual systems is an equal-weights average of all 'qualifying ensemble members' used for the graphical products (not all members are included for lagged-start ensembles).
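The order of operations for wind described above (speed first, aggregation second) can be illustrated with a minimal NumPy sketch. The function name, the array layout (time as the first axis) and the use of the population standard deviation are assumptions made for illustration, not details taken from the C3S processing chain.

```python
import numpy as np

def monthly_wind_speed_stats(u, v):
    """Monthly statistics of wind speed for one calendar month.

    u, v: instantaneous wind components with time as the first axis,
    e.g. shape (time, lat, lon). The layout is an assumption.
    """
    # Speed is computed at the original time resolution, before aggregating.
    speed = np.hypot(u, v)
    return {
        "mean": speed.mean(axis=0),
        "min": speed.min(axis=0),
        "max": speed.max(axis=0),
        "std": speed.std(axis=0),  # population std (assumption)
    }

# Two time steps at one grid point: same strength, opposite direction.
u = np.array([[[3.0]], [[-3.0]]])
v = np.array([[[4.0]], [[-4.0]]])
stats = monthly_wind_speed_stats(u, v)
```

In this example the monthly mean speed is 5 m/s, whereas taking the speed of the monthly mean components would give 0, which is why the order matters.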

- *qualifying ensemble members (= C3S ensemble of real-time forecasts)*: the subset of ensemble members used in the C3S graphical products; in some cases (lagged-start systems) this is a subset of the total forecast ensemble in the catalogue. The same ensemble members are used for all variables for which anomalies are provided, even though not all variables are part of C3S graphical products. The sub-selection is chosen as a compromise between ensemble size (which should be large) and the start/lead times used in the ensemble (which should not span too long a period). Details on the qualifying members for each origin are available in the documentation.

__Anomalies (of monthly means)__

Monthly anomalies are calculated for real-time forecasts, using the model hindcast mean as a reference. They are calculated separately for each system and each ensemble member, for each nominal start time and lead time, by subtracting the respective hindcast mean from the corresponding real-time forecast. The period used as reference for the estimation of the model climatology is the common hindcast period.

The ensemble mean anomaly for individual systems is an equal-weights average of all 'qualifying' ensemble members (explained above).
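As a rough illustration of the anomaly calculation described in this section, the sketch below subtracts a hindcast mean from each forecast member and then averages the qualifying members with equal weights. The function names, array shapes and the boolean `qualifying` mask are hypothetical, and averaging the hindcast over all years and members in one step is an assumption.

```python
import numpy as np

def member_anomalies(forecast, hindcast):
    """Anomaly of each forecast member for one start time and lead time.

    forecast: (member, lat, lon) monthly means of the real-time forecast.
    hindcast: (year, member, lat, lon) monthly means for the same start
              month and lead time over the common hindcast period.
    """
    hindcast_mean = hindcast.mean(axis=(0, 1))  # model climate
    return forecast - hindcast_mean

def ensemble_mean_anomaly(anomalies, qualifying):
    """Equal-weights mean over the qualifying members (boolean mask)."""
    return anomalies[qualifying].mean(axis=0)

# One grid point, two hindcast years with two members, three forecast members.
hindcast = np.array([[10.0, 12.0], [14.0, 16.0]]).reshape(2, 2, 1, 1)
forecast = np.array([14.0, 15.0, 20.0]).reshape(3, 1, 1)
qualifying = np.array([True, True, False])  # hypothetical lagged-start subset

anoms = member_anomalies(forecast, hindcast)
ens_mean = ensemble_mean_anomaly(anoms, qualifying)
```

Here the hindcast mean is 13, the member anomalies are 1, 2 and 7, and the ensemble mean anomaly over the two qualifying members is 1.5.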

__Ocean variables__

This data is provided by the contributing centres aggregated to a monthly time resolution.

### Graphical products

Link: https://climate.copernicus.eu/charts/packages/c3s_seasonal/

Graphical products use the monthly anomaly data described above. Products for three-month aggregations (the mean of the valid months selected) are also available. As explained above, for lagged-start systems only the qualifying subset of ensemble members is used in the graphical products.

The forecast diagnostics available for single or multi-system combinations are described below. These are available for 10m wind speed, MSLP, T2M, T850, geopotential height at 500hPa and precipitation, unless another variable is stated in the description. Where significance testing is applied, it follows the approach in section 3.2 of the SEAS5 user guide.

__Multi-model combination - ensemble mean anomaly__

The mean of the multi-system ensemble is a weighted average of the ensemble means of the component systems. For each component, ensemble mean anomalies are computed with respect to the corresponding model climate, as described under 'Data products'. The weights are applied so that all components make 'equal' contributions to the variance of the multi-system over the common hindcast period: the weight for each component of the combination is equal to the square root of the average variance of all systems in the combination divided by the square root of the variance of the respective component. The weights used for this variance standardization are calculated from the common hindcast period.

Each 'origin' (= contribution from a provider) is considered a component of the multi-system in this context. This means that the whole contribution from ECCC (a combination of two components) counts as much as each of the other origins (single components); the ECCC combination itself follows the same method as the multi-system.
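The variance-standardization weights described above can be sketched as follows. This is only an illustration under stated assumptions: the variance is taken per grid point over the hindcast years of each component's ensemble mean anomaly, and the multi-system mean is taken as the plain average of the weighted component anomalies. The function names and array layout are hypothetical.

```python
import numpy as np

def variance_weights(hindcast_anoms):
    """Weights sqrt(average variance) / sqrt(component variance).

    hindcast_anoms: list with one (year, lat, lon) array per component,
    holding ensemble mean anomalies over the common hindcast period.
    """
    variances = np.stack([a.var(axis=0) for a in hindcast_anoms])
    return np.sqrt(variances.mean(axis=0)) / np.sqrt(variances)

def multi_system_mean(forecast_anoms, weights):
    """Average of the variance-standardized component anomalies (assumption)."""
    return (weights * np.stack(forecast_anoms)).mean(axis=0)

# Two components at one grid point; component B has 9x the variance of A.
hind_a = np.array([-1.0, 1.0, -1.0, 1.0]).reshape(4, 1, 1)
hind_b = 3.0 * hind_a
weights = variance_weights([hind_a, hind_b])

fc_a = np.array([[0.5]])
fc_b = np.array([[1.5]])
mm_mean = multi_system_mean([fc_a, fc_b], weights)
```

After weighting, both components have the same hindcast variance (the average of the two original variances), so neither dominates the combination.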

The single system ensemble mean anomaly charts do not have the weights used for the variance standardization applied, and show the data as described under 'Data products'.

For providers where the data is at a resolution different from the 1x1 degree grid, it is interpolated to the common 1x1 degree grid before the multi-system mean is computed.

**Single and multi-model - probabilistic**

The forecast summary options available for the plots, in addition to the single or multi-model ensemble mean anomaly, are:

- Tercile summary
- Prob. for lower tercile category
- Prob. for middle tercile category
- Prob. for upper tercile category
- Prob. for lowest 20%
- Prob. for highest 20%
- Prob. exceeding median

The terciles, lowest and highest 20%, and median values are calculated for each individual system from the hindcast climatology for a given start and lead time (all the qualifying ensemble members over the common hindcast period). Individual system probabilities are then estimated by comparing the forecast probability density function (PDF), given by the forecast 'qualifying ensemble members', with these categories, i.e. by counting the number of members above (or below) the category boundary at each grid cell. For the 3-month charts the same calculation is performed using the mean over the specified valid months (for the given lead time) in the forecasts and hindcasts, with the categories and probabilities re-calculated from these 3-month means. Multi-system probabilities are then calculated as the unweighted mean of the probabilities from the contributing systems, although multi-component contributions (ECCC) are weighted so that the overall contribution counts as much as the others.
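The member-counting approach described above can be sketched for the tercile categories as follows. The function name and array layout are hypothetical, and the quantile estimator (NumPy's default linear interpolation) is an assumption; the source does not state which estimator is used.

```python
import numpy as np

def tercile_probabilities(forecast, hindcast):
    """Probabilities of the lower, middle and upper tercile categories.

    forecast: (member, ...) qualifying forecast members.
    hindcast: (sample, ...) qualifying hindcast members of all years in
              the common hindcast period, pooled along the first axis.
    """
    lower, upper = np.quantile(hindcast, [1 / 3, 2 / 3], axis=0)
    p_lower = (forecast < lower).mean(axis=0)  # fraction of members below
    p_upper = (forecast > upper).mean(axis=0)  # fraction of members above
    return p_lower, 1.0 - p_lower - p_upper, p_upper

# One grid point: nine pooled hindcast values, five forecast members.
hindcast = np.arange(1.0, 10.0)
forecast = np.array([2.0, 3.0, 5.0, 7.0, 8.0])
p_low, p_mid, p_up = tercile_probabilities(forecast, hindcast)

# Multi-system probability: unweighted mean over systems (two identical
# systems are used here purely to show the operation).
p_up_multi = np.mean([p_up, p_up], axis=0)
```

With these numbers the tercile boundaries fall at about 3.67 and 6.33, so two of the five members are in each outer category and one is in the middle.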

Different resolutions in the multi-system combinations are dealt with as described above.

**SST indices - single and multi-model**

Time-series charts show multi-system combination, or single system, SST-index forecasts. The indices are area-averaged monthly-mean sea-surface temperature anomalies computed over specified regions of the tropical Pacific and the Indian Ocean: NINO 1+2 (0-10S, 80-90W), NINO 3 (5N-5S, 90W-150W), NINO 3.4 (5N-5S, 120W-170W) and NINO 4 (5N-5S, 160E-150W), and the Indian Ocean Dipole regions IndOcW (10N-10S, 50E-70E) and IndOcE (0-10S, 90E-110E). For the IOD, anomalies with respect to the 1993-2016 climate are shown. For NINO indices, anomalies with respect to the 1981-2010 climate are shown. The anomalies are computed relative to ERA5 climatology, after the start- and lead-time dependent bias in the absolute values has been removed based on the common hindcast period.
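An area average over one of the index regions listed above might look like the sketch below. The cosine-of-latitude weighting, the grid-cell centres at half degrees and the 0-360 degrees east longitude convention are all assumptions made for illustration; `box_average` is a hypothetical helper.

```python
import numpy as np

def box_average(field, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of a (lat, lon) field over a lat/lon box.

    lons are in degrees east (0-360). Weights are cos(latitude),
    which is an assumption about how the area average is formed.
    """
    lat_sel = (lats >= lat_min) & (lats <= lat_max)
    lon_sel = (lons >= lon_min) & (lons <= lon_max)
    box = field[np.ix_(lat_sel, lon_sel)]
    weights = np.cos(np.deg2rad(lats[lat_sel]))
    return np.average(box.mean(axis=1), weights=weights)

# 1x1 degree grid with cell centres at half degrees (assumption).
lats = np.arange(-89.5, 90.0, 1.0)
lons = np.arange(0.5, 360.0, 1.0)

# Synthetic anomaly: +1 K between 170W and 120W (190E-240E), 0 elsewhere.
anom = np.zeros((lats.size, lons.size))
anom[:, (lons >= 190) & (lons <= 240)] = 1.0

nino34 = box_average(anom, lats, lons, -5, 5, 190, 240)  # 5N-5S, 120W-170W
nino4 = box_average(anom, lats, lons, -5, 5, 160, 210)   # 5N-5S, 160E-150W
```

The synthetic anomaly fills the whole NINO 3.4 box but only the eastern 20 degrees of the 50-degree-wide NINO 4 box, so the two indices come out as 1.0 and 0.4.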

The plot types available are:

- Ensemble members
- Probabilities
- Percentiles

A form of variance adjustment, similar to the weighting in the multi-model anomalies, is applied. In this case, for each system, a (capped) multiplication factor is applied to the index time series, which matches the model and observed variance of anomalies. This variance adjustment is based on the common hindcast period (1993-2016, with the exception of NCEP, where 1999-2016 is used for NINO regions). The single-system ensemble member charts show the adjusted anomaly time series for each member, and the multi-system chart simply overlays the members from each system.

Percentile categories are computed for each system over the hindcast period (1993-2016, with the exception of NCEP, where 1999-2016 is used for NINO regions). Probabilities for categories bounded by percentiles 25 (P25) and 75 (P75) of the climatology are calculated independently for each system. These are averaged to produce the multi-system combination.

For the percentiles plot (the third plot type listed above), the percentiles shown are computed from the forecast distribution of anomalies for each individual system, which have been re-scaled as described above. A multi-system combination is not computed.
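The capped variance adjustment applied to the SST index time series could be sketched as below. The source does not give the value of the cap, nor whether the factor is estimated from individual members or from the ensemble mean, so the `cap` argument, the ratio of standard deviations and the function name are illustrative assumptions.

```python
import numpy as np

def variance_adjustment_factor(model_anoms, obs_anoms, cap):
    """Multiplication factor matching model and observed anomaly variance.

    model_anoms, obs_anoms: index anomalies over the hindcast period.
    cap: upper limit on the factor (hypothetical value chosen by the caller).
    """
    ratio = np.std(obs_anoms) / np.std(model_anoms)
    return float(min(ratio, cap))

# Model anomalies with half the observed standard deviation.
model = np.array([-0.5, 0.5, -0.5, 0.5])
obs = np.array([-1.0, 1.0, -1.0, 1.0])

uncapped = variance_adjustment_factor(model, obs, cap=3.0)
capped = variance_adjustment_factor(model, obs, cap=1.5)

# The factor is then applied to the forecast index anomalies.
adjusted_forecast = capped * np.array([0.2, 0.4, 0.6])
```

In this example the uncapped factor would be 2.0; with the hypothetical cap of 1.5 the adjustment is limited to 1.5.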

__Single-system U10hPa (stratospheric wind)__

The index is defined as the zonal average of the U (eastward) component of wind at 10 hPa, at 60N, and is computed from the 12 hourly data. Individual system plots are available for each system where U10hPa data is provided. These plots are not available for contributions which comprise multi-systems (ECCC).

The plot types available are:

- Ensemble members
- Probabilities

The ensemble members plot shows the time-evolution of individual qualifying ensemble members and the ensemble mean for the forecast (blue) and the percentiles of the hindcast distribution (shades of orange) from the common hindcast period. The black line represents the observed climate mean derived from ERA5 over the same period. No explicit bias correction is applied.

The probabilities shown are those of the instantaneous index value being below zero, computed by counting the number of qualifying ensemble members in this category at each valid time. The hindcast probabilities shown in orange are computed in the same manner, using the common hindcast period and qualifying hindcast members for each valid time.
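A minimal sketch of the index and of the probability of it being below zero is given here. The array layout (member, time, lat, lon), the function names and the choice of the grid latitude nearest to 60N are assumptions for illustration.

```python
import numpy as np

def u10_index(u10, lats, lat=60.0):
    """Zonal mean of the 10 hPa eastward wind at the latitude nearest `lat`.

    u10: (member, time, lat, lon). Returns an array of shape (member, time).
    """
    j = int(np.argmin(np.abs(lats - lat)))
    return u10[:, :, j, :].mean(axis=-1)

def prob_below_zero(index):
    """Fraction of members with a negative index at each valid time."""
    return (index < 0).mean(axis=0)

# Four members, two valid times, three latitudes, eight longitudes.
lats = np.array([59.0, 60.0, 61.0])
u10 = np.full((4, 2, 3, 8), 10.0)
u10[0, 1] = -5.0  # member 0 turns easterly at the second valid time
u10[1, 1] = -2.0  # member 1 as well

index = u10_index(u10, lats)
prob = prob_below_zero(index)
```

All members are westerly at the first valid time and two of four are easterly at the second, giving probabilities of 0 and 0.5.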

**Single-system sea ice concentration**

Individual system sea ice concentration plots are available for each system where sea ice data is provided. These plots are not available for contributions which comprise multi-systems (ECCC).

No ice edge or forecast measure is defined when the grid cell land fraction is above a threshold (not considered sea), set by the contributing centres. In some cases, different thresholds have been used to mask the data. An additional mask is applied when plotting to remove inland bodies of water.

The plot types available are:

- Ensemble members
- Ensemble mean anomaly
- Probability

The common threshold of 15% concentration is used to define the ice edge. The ensemble members plot shows the sea ice edge for each forecast ensemble member in grey. The magenta/orange/black line is the edge defined from the field of the median of the forecast/hindcast/reanalysis at each grid point, for each start and lead month. The common hindcast period is used for the computation of the hindcast and ERA5 reanalysis median fields. The ensemble mean anomaly plot shows the forecast ensemble mean sea ice concentration as an anomaly with respect to the hindcast climatology. The edges plotted are as described above.

The probability plot shows the probability of the sea ice concentration being above 15% in the forecast, computed by counting the number of ensemble members with greater than 15% concentration. The edges plotted are as described above.
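The member counting for the probability plot, and the median field from which an ice edge is drawn, can be illustrated as follows. The sketch assumes concentration is stored as a fraction between 0 and 1 and ignores the land and inland-water masking; the function names are hypothetical.

```python
import numpy as np

ICE_THRESHOLD = 0.15  # 15% concentration, expressed as a fraction (assumption)

def ice_probability(sic):
    """Fraction of members with concentration above the threshold.

    sic: (member, lat, lon) sea ice concentration as a fraction.
    """
    return (sic > ICE_THRESHOLD).mean(axis=0)

def median_ice_mask(sic):
    """True where the grid-point median over members exceeds the threshold.

    The ice edge would be the boundary of this mask.
    """
    return np.median(sic, axis=0) > ICE_THRESHOLD

# Four members, one latitude row, three grid cells.
sic = np.array([
    [[0.9, 0.30, 0.0]],
    [[0.8, 0.10, 0.0]],
    [[0.7, 0.40, 0.0]],
    [[0.6, 0.05, 0.0]],
])
prob = ice_probability(sic)
mask = median_ice_mask(sic)
```

The first cell is ice-covered in every member, the second in half of them, and the third in none.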

**Verification plots**

Link: https://confluence.ecmwf.int/display/CKB/C3S+seasonal+forecasts+verification+plots

Anomalies are computed for each year in the common hindcast period, for each seasonal forecast system and ERA5 data for verification. Tercile categories, and corresponding probabilities, are also computed, as described above for the graphical products. One-month and three-month aggregations are produced for each start month and lead-time(s) combination.

The verification products available are:

- Temporal correlation
- Ranked Probability Score (RPS)
- Relative Operating Characteristic (ROC)

The correlations are Spearman rank correlation coefficients computed using the ensemble mean anomalies and ERA5 data, with p-values computed to indicate where significance is below 95%. For the SST indices, area-averages in the regions are computed first, and these are used for the correlation calculation.
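For reference, a Spearman rank correlation is the Pearson correlation of the ranks of the two series. The sketch below uses NumPy only and assumes there are no tied values; `scipy.stats.spearmanr` handles ties and also returns a p-value. The function name and sample values are illustrative.

```python
import numpy as np

def spearman_rank_correlation(x, y):
    """Spearman correlation of two 1-D series (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Ensemble mean anomalies and reanalysis anomalies for five hindcast years
# at one grid point (made-up numbers).
forecast = np.array([0.1, 0.4, -0.2, 0.9, 0.3])
reanalysis = np.array([0.0, 0.5, -1.0, 3.0, 0.2])

rho = spearman_rank_correlation(forecast, reanalysis)
rho_reversed = spearman_rank_correlation(forecast, -reanalysis)
```

Because the two series have the same ordering of years, the rank correlation is 1 even though the values are not linearly related; reversing one series gives -1.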

Area under the ROC curve and RPS values are computed at each grid cell, for tercile categories, as described in the WMO forecast guidance.
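As an illustration of the RPS for tercile categories, the sketch below uses a common textbook form: the sum of squared differences between the cumulative forecast and cumulative observed probabilities for a single forecast. Whether the WMO guidance applies an additional normalisation is not stated here, so this form is an assumption, and the function name is hypothetical.

```python
import numpy as np

def ranked_probability_score(probs, observed_category):
    """RPS of one categorical forecast (unnormalised form, an assumption).

    probs: forecast probabilities for the ordered categories
           (lower, middle, upper tercile).
    observed_category: index of the category that was observed.
    """
    probs = np.asarray(probs, dtype=float)
    obs = np.zeros_like(probs)
    obs[observed_category] = 1.0
    return float(np.sum((np.cumsum(probs) - np.cumsum(obs)) ** 2))

# Forecast favouring the upper tercile, which was then observed.
rps_forecast = ranked_probability_score([0.2, 0.3, 0.5], observed_category=2)

# Climatological forecast for the same outcome, for comparison.
rps_climatology = ranked_probability_score([1 / 3, 1 / 3, 1 / 3], observed_category=2)
```

Lower values are better: the forecast scores 0.29 against about 0.56 for climatology in this example.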