As part of the EFAS v5.3 upgrade, the hydrological post-processing calibration procedure was performed in October 2024 using the EFAS v5 hydrological reanalysis (water balance) and observations up to 15 October 2024. A total of 1948 stations were calibrated at 6-hourly timesteps and a further 228 at daily timesteps. As part of the calibration process, an evaluation is performed for each station to assess the expected skill of the post-processing method. This page summarises the results of this evaluation for all stations.

Summary

Based on the results below, the Hydrological Post-Processing improves the behaviour of the EFAS simulations compared with the raw model outputs. However, care should be taken for extreme values that exceed those seen in the calibration period, as the post-processed forecasts may tend towards climatology (especially at longer lead-times). The maximum observed value within the calibration timeseries is available in the 'EFAS Post-processing: Station calibration information' in the Post-processing tab in the reporting points (see EFAS Reporting Points).


New stations

In this calibration 355 new stations were added to the post-processed forecasts (274 stations at 6-hourly and 81 stations at daily). Stations may be added to the post-processing if they are new stations provided by EFAS hydrological data providers or if their record of observed river discharge has exceeded the minimum requirement of 2 years. For EFAS v5.3 new stations have been added in 14 countries as detailed in Table 1.

Table 1: Number of new post-processed stations per country.

Country | Number of new stations
Belgium | 1
Bosnia and Herzegovina | 1
Croatia | 3
Czechia | 2
Germany | 26
Greece | 1
Spain | 49
Italy | 4
Latvia | 1
Moldova | 2
Poland | 1
Slovenia | 6
Ukraine | 2
United Kingdom | 256

Calibration timeseries

The hydrological post-processing calibration requires at least 2 years of river discharge observations at 6-hourly or daily timesteps. Figure 1 shows the length of the record used to calibrate the hydrological post-processing models for each station. The exact dates used for the calibration of a station are shown in the 'EFAS Post-processing: Station calibration information' in the Post-processing tab in the reporting points (see EFAS Reporting Points). Note that longer calibration timeseries result in more robust post-processed forecasts.

Figure 1: Length of the calibration record for each station. A minimum of 2 years is required for a station to be calibrated.
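
The 2-year requirement can be checked directly from a station's observation record. The sketch below is illustrative only: it assumes the observations are held in a pandas Series indexed by timestamp, and the function name and the simple handling of gaps are assumptions rather than the operational rule.

```python
# Illustrative sketch of the 2-year minimum-record check; not the operational code.
import pandas as pd

def meets_minimum_record(obs: pd.Series, min_years: float = 2.0) -> bool:
    """True if the observed record spans at least `min_years` years
    (gaps within the record are ignored in this simple sketch)."""
    obs = obs.dropna()
    if obs.empty:
        return False
    span_days = (obs.index.max() - obs.index.min()).days
    return span_days >= min_years * 365.25

# Example: a 6-hourly record from January 2021 to June 2023 passes the check.
idx = pd.date_range("2021-01-01", "2023-06-30", freq="6h")
print(meets_minimum_record(pd.Series(1.0, index=idx)))  # True
```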

Calibration Methodology

The station models calibrated in the hydrological post-processing (HPP) calibration process contain the information necessary for the MCP component of the post-processing methodology. The MCP component is responsible for correcting errors due to the initial conditions and the hydrological model. A full description of the post-processing methodology can be found on the CEMS-Wiki here, but to keep this page self-contained the following two key points should be noted:

1) The MCP creates a naïve "first guess" forecast for the next 15 days by conditioning the climatological distribution of the observed and simulated discharge on the recent observed and simulated values. This naïve forecast therefore only considers the autocorrelation of the timeseries and the recent state of the river, and does not include any information on the upcoming meteorological situation. The naïve forecast of the observations is referred to below as the Intermediate forecast.

2) The EFAS ensemble forecasts (after being spread-corrected) are assimilated into these naïve forecasts to incorporate the meteorological information. This is done using a Kalman filter, so the resulting distribution depends on the distribution of the ensemble forecast, the distribution of the naïve forecast for the simulation (water balance), and the cross-covariance matrix between the water balance and the observations. The resulting forecast is referred to below as the Full forecast.
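
To make the assimilation step more concrete, the following is a minimal sketch of a scalar Kalman-filter-style update for a single lead-time, assuming the variables have already been transformed to a Gaussian space. The function name, its arguments, and the exact form of the update are illustrative assumptions and not the EFAS implementation.

```python
# Illustrative scalar Kalman-filter-style update; not the operational EFAS code.

def assimilate_ensemble(mu_obs, mu_sim, var_obs, var_sim, cov_obs_sim,
                        ens_mean, ens_var):
    """Condition the naive (Intermediate) forecast of the observation on an
    ensemble forecast of the simulation (water balance).

    mu_obs, var_obs   : naive forecast mean/variance for the observation
    mu_sim, var_sim   : naive forecast mean/variance for the simulation
    cov_obs_sim       : cross-covariance between observation and simulation
    ens_mean, ens_var : (spread-corrected) ensemble forecast mean and variance
    """
    gain = cov_obs_sim / (var_sim + ens_var)         # Kalman gain
    full_mean = mu_obs + gain * (ens_mean - mu_sim)  # Full forecast mean
    full_var = var_obs - gain * cov_obs_sim          # Full forecast variance
    return full_mean, full_var

# A near-deterministic "perfect" ensemble forecast (ens_var -> 0) pulls the Full
# forecast strongly towards the simulation-conditioned estimate, as in the
# evaluation below where the water balance stands in for the ensemble forecast.
mean, var = assimilate_ensemble(mu_obs=2.0, mu_sim=2.2, var_obs=1.0,
                                var_sim=0.9, cov_obs_sim=0.7,
                                ens_mean=3.0, ens_var=1e-6)
```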

Given these two points of the methodology, we evaluate both the Intermediate forecast and the Full forecast. The Intermediate forecast can be thought of as a lower bound on the potential skill of the post-processed forecasts. Whilst there are situations where the skill of a specific forecast may be lower (e.g., the ensemble forecast introduces a fictitious flood event or misses an event, or erroneous observations are included in the process), the post-processed forecasts will in general be at least as skilful as the Intermediate forecast. The Full forecast is calculated here using the water balance as the ensemble forecast. Within the post-processing method the ensemble forecast is treated as a forecast of the simulation rather than of the observations (i.e., it is forecasting the water balance); the perfect ensemble forecast would therefore be the water balance itself (deterministic and accurate). As we are assimilating a "perfect forecast", the Full forecast can be considered an upper bound on the skill, although there may be occasions when the operational post-processed forecast performs better due to the interaction between the uncertainty of the ensemble forecast and the Intermediate forecast.

Calibration Results

In the following results, stations that performed extremely poorly were removed; these are being analysed separately.

Assessment of the CRPS skill score

The CRPSS is shown in Fig. 2 for the Intermediate forecast (blue) and the Full forecast (orange) for six-hourly stations (upper panel) and daily stations (lower panel). The CRPS evaluates the full forecast distribution; the benchmark forecast is the water balance and the "truth" values are the observations. Over 75% of stations show an improvement compared to the water balance even when no meteorological information is included (Intermediate forecast), mainly due to bias correction. When the meteorological information is assimilated (Full forecast), all stations show an improvement at a lead-time of 15 days, with 50% of stations showing a reduction in error (in terms of the CRPS) of at least 50%.

Figure 2: Continuous Ranked Probability Skill Score (CRPSS) for stations calibrated at six-hourly and daily timesteps.
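
As a concrete illustration of how the skill score in Fig. 2 is constructed, the sketch below computes an empirical CRPS for an ensemble forecast and the corresponding CRPSS against a deterministic water-balance benchmark (whose CRPS reduces to the mean absolute error). The array shapes and function names are illustrative assumptions, not the operational code.

```python
# Illustrative CRPS/CRPSS computation against a deterministic benchmark.
import numpy as np

def crps_ensemble(ens, obs):
    """Empirical CRPS for one forecast: ens is (n_members,), obs is a scalar."""
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def crpss(post_ens, water_balance, obs):
    """CRPSS of post-processed forecasts relative to the water-balance benchmark.

    post_ens      : (n_forecasts, n_members) post-processed ensemble values
    water_balance : (n_forecasts,) deterministic benchmark (its CRPS is the MAE)
    obs           : (n_forecasts,) observed discharge ("truth")
    """
    crps_post = np.mean([crps_ensemble(e, o) for e, o in zip(post_ens, obs)])
    crps_bench = np.mean(np.abs(water_balance - obs))  # CRPS of a point forecast
    return 1.0 - crps_post / crps_bench                # 1 = perfect, 0 = no skill
```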

The maximum lead-time (in days) for which the CRPSS is above 0.5 for each station is shown in Fig. 3 for the Intermediate forecast (upper panel) and the Full forecast (lower panel). The skill of the operational post-processed forecasts is expected to lie between these two values. It can be seen that when the meteorological forcings are included (Full forecast), the majority of stations are improved across the whole forecast period.


Figure 3: Maximum lead-time (in days) for which the CRPSS is above 0.5 for the Intermediate forecast (upper panel) and the Full forecast (lower panel).
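
For reference, the quantity mapped in Figs. 3 and 5 can be derived from a per-lead-time CRPSS series as in the sketch below; the 0.5 threshold comes from the text, while the function name and the handling of non-monotonic skill are assumptions.

```python
# Illustrative derivation of the "maximum lead-time with CRPSS above 0.5".
import numpy as np

def max_skilful_leadtime(crpss_by_leadtime, threshold=0.5):
    """Return the last lead-time (1-based, e.g. in days) at which the CRPSS is
    still above the threshold, or 0 if it never is."""
    above = np.asarray(crpss_by_leadtime) > threshold
    return int(np.max(np.nonzero(above)[0]) + 1) if above.any() else 0

# Example: a CRPSS that decays linearly from 0.95 to 0.2 over 15 days
print(max_skilful_leadtime(np.linspace(0.95, 0.2, 15)))  # -> 9
```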

Assessment of the CRPS skill score at high flow values

To investigate how the post-processed forecasts perform for larger discharge values, we assess the CRPSS of the Full forecast over the largest 10% of observations in the calibration timeseries for each station. Figure 4 shows the CRPSS for the largest 10% of observations alongside the CRPSS for all observations for 6-hourly stations (upper panel) and daily stations (lower panel). As can be seen, the CRPSS is lower for the 10% of highest discharge values than when normal flows are also considered. For high flows, over 50% of stations are still improved at a lead-time of 15 days, but the decrease in skill with lead-time is much more severe. Figure 5 shows the maximum lead-time (in days) for which the CRPSS is above 0.5 for each station for the top 10% of observations, for the Full forecast.

Figure 4: Continuous Ranked Probability Skill Score (CRPSS) for the top 10% of observations for stations calibrated at six-hourly and daily timesteps.

Figure 5: Maximum lead-time (in days) for which the CRPSS is above 0.5 for the top 10% of observations, for the Full forecast.
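
The high-flow assessment in Figs. 4 and 5 amounts to restricting the scored forecast/observation pairs to those whose observed value falls in the largest 10% of the calibration record, as sketched below. The 90th-percentile threshold follows from the text; the helper name is an assumption, and `crpss` refers to the earlier CRPSS sketch.

```python
# Illustrative subsetting of the verification pairs for the high-flow CRPSS.
import numpy as np

def high_flow_mask(obs, fraction=0.10):
    """Boolean mask selecting the largest `fraction` of observed values."""
    threshold = np.quantile(obs, 1.0 - fraction)
    return obs >= threshold

# Example with synthetic discharge values:
obs = np.random.gamma(2.0, 50.0, size=2000)
mask = high_flow_mask(obs)
# crpss_high = crpss(post_ens[mask], water_balance[mask], obs[mask])  # see earlier sketch
```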


Assessment of the KGE skill score

The modified KGESS is shown in Fig. 6 for the forecast median of the Intermediate forecast (blue) and the Full forecast (orange) for six-hourly stations (upper panel) and daily stations (lower panel). Again, the benchmark is the water balance and the "truth" values are the observations. Improvement over the water balance is shown by 50% of stations at a lead-time of 10 days for the Intermediate forecast; however, over 25% of stations show a degradation in the forecast median for lead-times greater than 3 days. This is unsurprising as no meteorological information is included and the information provided by the recent observations is limited at longer lead-times. When the meteorological information is assimilated (Full forecast), almost 75% of stations show an improvement in terms of the modified KGE at a lead-time of 15 days. The operational post-processed forecasts are likely to fall somewhere between the skill of the Intermediate forecast and the Full forecast, depending on the accuracy and confidence of the ensemble forecast. There are some stations where the degradation of the modified KGE is severe.

Figure 6: KGESS (benchmark is the water balance) for the forecast median of the Intermediate forecast (blue boxes) and the Full forecast (orange boxes).
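
For context, the sketch below shows one common formulation of the modified KGE, including its correlation, bias-ratio, and variability-ratio components (the quantities shown in Figs. 7 to 9), and a benchmark-based skill score. The exact formulation used operationally may differ, and the function names are illustrative.

```python
# Illustrative modified KGE (Kling et al., 2012) and benchmark-based skill score.
import numpy as np

def kge_components(sim, obs):
    """Correlation r, bias ratio beta, and variability (CV) ratio gamma."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = np.mean(sim) / np.mean(obs)                                   # bias ratio
    gamma = (np.std(sim) / np.mean(sim)) / (np.std(obs) / np.mean(obs))  # CV ratio
    return r, beta, gamma

def kge_modified(sim, obs):
    r, beta, gamma = kge_components(sim, obs)
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def kgess(forecast_median, water_balance, obs):
    """Skill of the forecast median relative to the water-balance benchmark."""
    kge_fc = kge_modified(forecast_median, obs)
    kge_bench = kge_modified(water_balance, obs)
    return (kge_fc - kge_bench) / (1.0 - kge_bench)
```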

Assessment of the KGE components

The following figures show the correlation, bias ratio, and variability ratio for the Intermediate forecast, Full forecast, and the simulation (water balance) to identify where the loss in KGE is occurring. All three components have a perfect score of 1. 

Figure 7 shows the correlation for the forecast median of the Intermediate forecast (blue) and the Full forecast (orange), and for the simulation (green), for six-hourly stations (upper panel) and daily stations (lower panel). When meteorological forcings are not considered (Intermediate forecast; blue), the correlation of the forecast median decreases from close to 1 to just below 0.5 for about 50% of stations. This is because the Intermediate forecast tends towards climatology at longer lead-times. For some stations the forecast median shows a negative correlation with the observations at longer lead-times, whilst for other stations it shows a near-perfect correlation even at lead-times of 15 days. When meteorological forcings are considered (Full forecast; orange), the forecast median has a correlation above 0.5 for almost all stations at a lead-time of 15 days. These values are in comparison with the water balance, which has a correlation of #.

Figure 7: Correlation between the observations and the Intermediate forecast (blue), the Full forecast (orange), and the simulation (green).

Figure 8 shows the bias ratio for the forecast median of the Intermediate forecast (blue) and the Full forecast (orange), and for the simulation (green), for six-hourly stations (upper panel) and daily stations (lower panel). When meteorological forcings are not considered (Intermediate forecast; blue), the bias ratio of the forecast median decreases from close to 1 to just above 0.8 for about 50% of stations. This is again because the Intermediate forecast tends towards climatology at longer lead-times. When meteorological forcings are considered (Full forecast; orange), the forecast median has a bias ratio above 0.9 for almost all stations at a lead-time of 15 days.

Figure 8: Bias ratio between the observations and the Intermediate forecast (blue), the Full forecast (orange), and the simulation (green).

Figure 9 shows the variability ratio for the forecast median of the Intermediate forecast (blue) and the Full forecast (orange), and for the simulation (green), for six-hourly stations (upper panel) and daily stations (lower panel). When meteorological forcings are not considered (Intermediate forecast; blue), the variability ratio of the forecast median decreases from close to 1 to just above 0.8 for about 50% of stations. When meteorological forcings are considered (Full forecast; orange), the forecast median has a variability ratio above 0.8 for almost all stations at a lead-time of 15 days. However, these results show that the hydrologically post-processed forecasts underestimate the variability of the river discharge.

Figure 9: Variability ratio between the observations and the Intermediate forecast (blue), the Full forecast (orange), and the simulation (green).

Considerations and limits of this evaluation

This evaluation is intended to provide an indication of the expected skill of the EFAS post-processed forecasts, but it is not a full evaluation of the post-processed forecasts. It should be noted that this evaluation was conducted over the calibration timeseries and, as such, for some stations the evaluation period is very short. It should also be noted that all observations in the evaluation lie within the calibration range, so there are no "unseen" events, even in the analysis of the top 10% of observations. Finally, the water balance is used in place of the ensemble forecast. Whilst this acts as a perfect forecast in our evaluation, it does not account for the uncertainty in the ensemble forecast, which may improve the CRPS relative to the deterministic simulation.
