Notification Criteria Abstract - Jesus Casado Rodriguez

The EFAS v5.2 upgrade features a revision of the flood notification criteria based on an analysis done on the EFASv4 simulations. The main result of the analysis is a new methodology to combine the EFAS forecasts forced by the 4 different Numerical Weather Prediction (NWP) models into a grand ensemble.

New Notification Criteria - Conrad & Stefania Grimaldi

Detailed description of the new notification criteria.

The new formal flood notification criteria are:

Catchment part of Conditions of Access (CoA), i.e., within the EFAS partner region.
A catchment area of at least 1000 km².
A total probability of exceeding the EFAS 5-year return period of at least 50%.
The onset of the event will occur between 2 and 7 days from the forecast time. E.g., for the forecast 2024-08-13 00UTC, the event must start between 2024-08-15 06UTC and 2024-08-20 00UTC. The onset of the event is the first time step when the probability criterion in point three is fulfilled.

An informal flood notification will be issued when any of the criteria above is not met, but the forecaster deems that the authorities should be informed. To issue an informal notification, the exceedance probability must be at least 40% and the catchment area at least 500 km².
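As an illustration, the criteria above can be sketched as a small decision function. This is only a sketch, not the operational EFAS implementation: the `ReportingPoint` class, its field names and the returned labels are hypothetical, and the lead-time window is assumed to be "later than 48 h and up to 168 h", which is how the 2024-08-13 example reads with 6-hourly time steps.

```python
from dataclasses import dataclass


@dataclass
class ReportingPoint:
    """Hypothetical container for the quantities used by the criteria."""
    in_partner_region: bool   # catchment is part of the Conditions of Access
    area_km2: float           # upstream catchment area
    total_probability: float  # probability (0-1) of exceeding the 5-year return period
    onset_lead_hours: float   # lead time of the event onset, in hours


def notification_type(point: ReportingPoint) -> str:
    """Return 'formal', 'informal candidate' or 'none' for one reporting point."""
    # Assumption: onset strictly after 48 h and no later than 7 days (168 h).
    in_window = 48 < point.onset_lead_hours <= 168
    formal = (
        point.in_partner_region
        and point.area_km2 >= 1000
        and point.total_probability >= 0.50
        and in_window
    )
    if formal:
        return "formal"
    # Informal notifications still need the forecaster's judgement; this only
    # checks the two minimum requirements stated in the text.
    if point.total_probability >= 0.40 and point.area_km2 >= 500:
        return "informal candidate"
    return "none"
```

A point labelled "informal candidate" only meets the minimum requirements; whether an informal notification is actually issued remains the forecaster's decision.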

EFAS v5.2 - Updates to visualisations - Juan Pereira Colonese & Corentin Carton de Wiart

As a consequence of the new combination method and the new notification criteria, the following EFAS products have been changed:

The layers previously called Flood Probability < 48 h and Flood Probability > 48 h have been renamed as 5-year exceedence < 48 h and 5-year exceedence > 48 h, respectively. These layers show the total probability of exceeding the 5-year return period in the current forecast. The layer corresponding to lead times larger than 48 h is used to issue formal flood notifications.
The Flood probability persistence layer has been renamed Flood probability. The new layer shows the total probability of the current forecast of exceeding three return period thresholds (2, 5 and 20 years) computed based on all available forecasts at each time step, combined by forecast skill-based weights.
The Reporting Points layer is now based on the computed total probability, and the label shown at each point indicates only one exceedance probability value. The arrow indicates the tendency (increasing, constant, decreasing) of the total probability between the current and the previous forecast for the point. Some minor changes affect the pop-up window:
- A new grand ensemble hydrograph that combines the forecasts of all 4 NWPs.
- The total probability shown in the forecast overview table follows the new definition of total probability.
- A new group of forecast persistency tables, which show the total probability computed for each return period threshold for the last 6 issued forecasts.

EFASv4 skill assessment

Introduction

The definition of the new notification criteria is based on an analysis of EFASv4 discharge simulations in the period from October 2020 to June 2023 at the 1979 EFAS fixed reporting points with a minimum catchment area of 500 km². For this period, we considered EFASv4 reanalysis (water balance) as the ground truth, and we used the EFASv4 forecasts to search for the optimal notification criteria. In particular, we looked for answers to the following questions:

What's the best approach to blend the EFAS forecasts forced by the 4 NWPs into a single grand ensemble?
In the grand ensemble context:
- What is the optimal exceedance probability threshold?
- Is the persistence criterion still meaningful?
- Can we issue notifications at smaller catchments?

Methods

The analysis consisted of computing which events would have been notified for different combinations of the notification criteria, and evaluating the skill of each of those combinations by comparison against the flood events in the reanalysis (exceedances over the EFAS 5-year return period).

All possible combinations of the following criteria were tested:

Method to combine the NWPs into a grand ensemble:
- The procedure used in EFASv4 that evaluates independently deterministic and probabilistic NWPs.
- The simple model average. In this approach every NWP gets the same weight, no matter its nature (probabilistic or deterministic).
- A model average weighted by the number of members of the model. In this approach probabilistic NWPs prevail over the deterministic counterparts.
- A model average weighted by the probabilistic skill (Brier score) of each model at each lead time. In this approach models with higher skill predominate in the grand ensemble.
Exceedance probability thresholds ranging from 5% to 95% at 2.5% steps.
Persistence values: no persistence (1/1), 2 forecasts out of the last 4 (2/4), 2/2, 3/4 and 3/3. The persistence criterion was used in EFASv4 to avoid notifications caused by erratic NWP behaviour, when one forecast might be very different from the previous one.
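The size of the search space described above can be sketched as follows. The method labels and the `is_persistent` helper are hypothetical names introduced for illustration, and a persistence of k/n is assumed to mean "at least k of the last n forecasts exceed the probability threshold".

```python
from itertools import product

# Hypothetical labels for the four NWP combination methods listed above.
methods = ["efas_v4", "simple_average", "member_weighted", "brier_weighted"]

# Probability thresholds from 5% to 95% in 2.5% steps (37 values).
thresholds = [5 + 2.5 * i for i in range(37)]

# Persistence criteria as (k, n): k forecasts out of the last n.
persistences = [(1, 1), (2, 4), (2, 2), (3, 4), (3, 3)]

# Every combination of method, threshold and persistence.
combinations = list(product(methods, thresholds, persistences))


def is_persistent(exceeded, k, n):
    """Check a k/n persistence criterion.

    exceeded: list of booleans, oldest forecast first, stating whether each
    forecast met the probability threshold.
    Assumption: the criterion holds when at least k of the last n forecasts
    met the threshold, and it cannot hold with fewer than n forecasts.
    """
    recent = exceeded[-n:]
    return len(recent) == n and sum(recent) >= k
```

Under these assumptions the grid contains 4 × 37 × 5 = 740 combinations, each of which is scored against the reanalysis events.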

As a measure of notification skill we used the f-score, a metric widely used in machine learning for unbalanced classification tasks like the one at hand. The f-score is a combination of recall and precision, where recall is the fraction of actual events that are correctly identified, and precision is the fraction of notifications that are correct.

\text{recall} = \frac{\text{hits}}{\text{hits} + \text{misses}}

\text{precision} = \frac{\text{hits}}{\text{hits} + \text{false alarms}}

f_{\beta} = \left( 1 + \beta^2 \right) \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}

The beta coefficient in the f-score makes it possible to assign higher importance to either precision (beta smaller than 1) or recall (beta larger than 1). In our analysis, we tried to minimize the number of false alarms by using a beta value of 0.8, which gives slightly more weight to precision.
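The three formulas above translate directly into code. The following is a minimal sketch; the function name `f_beta` and the choice to return 0 when there are no hits are assumptions made for this example.

```python
def f_beta(hits, misses, false_alarms, beta=0.8):
    """Compute the f-score from contingency counts.

    beta < 1 favours precision, beta > 1 favours recall.
    Assumption: the score is defined as 0 when there are no hits,
    which also avoids a division by zero.
    """
    if hits == 0:
        return 0.0
    recall = hits / (hits + misses)
    precision = hits / (hits + false_alarms)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

When precision and recall are equal, the f-score equals that common value for any beta. With beta = 0.8, a set of criteria with precision 0.75 and recall 0.6 scores higher than one with precision 0.6 and recall 0.75, which reflects the preference for fewer false alarms.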

Results

Figure 1 summarizes in one image the results of the skill assessment. It exhibits for each combination of NWPs the evolution of the f-score depending on the probability threshold and the persistence criterion. As a reference, the black cross indicates the skill of the EFASv4 flood notification criteria.

Figure 1. Evolution of the notification skill with probability threshold and persistence for the different combinations of NWP.

Each plot represents a different combination of NWP. As a benchmark, the black cross represents the skill of the EFASv4 notification criteria.

This figure answers the first three questions posed at the beginning of this post:

There are two NWP combination methods that stand out as the best performing: member weighted and Brier weighted. Member weighted allocates weights to each NWP based on its number of members, whereas Brier weighted allocates them based on skill. It turns out that ECMWF-ENS is both the model with the largest number of members and the one with the best skill, which is why the performance of these two approaches is very similar. However, the Brier weighted approach is scientifically more sound, as it assigns weights based on quality instead of quantity. For this reason, the Brier weighted approach was selected as the method to generate the grand ensemble.
In all NWP combinations, the highest performance is yielded when persistence is not included as a notification criterion. The intuition is that the grand ensemble hides the erratic behaviour of deterministic NWPs, which was the reason for including this criterion in previous versions. Therefore, persistence is not a flood notification criterion in this new release.
The optimal probability threshold is 50%. The 30% probability threshold proved to be a correct choice when using a persistence of 3 forecasts. However, removing persistence affects the optimal probability threshold, which needs to increase to reduce the number of false positives.

One of the original questions remains to be answered: whether the minimum catchment area limit can be reduced. Figure 2 shows that the notification skill increases with catchment area (as expected), but it does not worsen dramatically when moving from the current 2000 km² limit to 1000 km². In fact, the skill of the new notification criteria at 1000 km² is better than that of the previous criteria at 2000 km². We expect the skill of EFASv5 at smaller catchments to be better than that of EFASv4, given the increase in spatial resolution, but this remains to be assessed once a long enough archive of EFASv5 forecasts is available.

Figure 2. Evolution of skill with catchment area limit.

The black line represents the notification criteria in EFASv4, while the coloured lines depict the skill of the optimized criteria for the 4 combination methods.

The vertical, dotted line indicates the 2000 km² catchment area limit in EFASv4.

A last change has been introduced in the new notification criteria: the maximum lead time of 7 days. We analysed the evolution of skill with increasing lead time (Figure 3) and saw that skill degrades with lead time, as expected. Given the poor skill at lead times close to 10 days, we decided to set an upper limit on lead time of 7 days, the forecast horizon of DWD-ICON.

Figure 3. Evolution of skill with lead time.

The black line represents the notification criteria in EFASv4, while the coloured lines depict the skill of the different probability thresholds and the Brier weighted method.

Total probability

Figure 1 shows that the skill-weighted (or Brier weighted) average is among the best-performing methods to combine the NWPs into a grand ensemble and estimate the total exceedance probability, and it is the method selected for EFAS v5.2. What exactly does skill-weighted mean?

We estimated the probabilistic skill of NWPs using the Brier score (BS), a squared-error metric that compares the observed and predicted probabilities of exceeding a particular magnitude (in our case the EFAS 5-year return period):

\text{BS} = \frac{1}{T}\sum_{t=1}^{T} \left( P_{obs,t} - P_{pred,t} \right)^2

where T is the number of time steps, P_{obs,t} is the observed probability of exceedance, and P_{pred,t} is the predicted probability of exceedance at a specific time step t.

Brier scores were computed for every NWP and lead time using the historical archive of EFAS v4 forecasts and reanalysis. The resulting Brier scores were converted into weights by inverse distance weighting:

w_{nwp,lt} = \frac{BS_{nwp,lt}^{-p}}{\sum_{i=1}^{4} BS_{i,lt}^{-p}}

where w_{nwp,lt} is the weight assigned to a specific NWP and lead time, BS_{nwp,lt} is the Brier score of that NWP at that lead time, and p is the exponent of the inverse distance weighting.
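The two steps above, computing a Brier score and turning the scores of the 4 NWPs into weights, can be sketched as follows. The function names are hypothetical, the exponent `p` is left as a parameter with a placeholder default of 1 because its value is not given here, and the same exponent is assumed in the numerator and the denominator so that the weights add up to 1. The scores in the usage lines are invented for illustration and are not EFAS results.

```python
def brier_score(p_obs, p_pred):
    """Mean squared difference between observed and predicted probabilities."""
    if len(p_obs) != len(p_pred) or len(p_obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(p_obs, p_pred)) / len(p_obs)


def brier_weights(scores, p=1.0):
    """Convert Brier scores at one lead time into weights.

    scores: dict mapping NWP name to its Brier score (must be > 0).
    p: exponent of the inverse weighting (placeholder default, an assumption).
    A lower Brier score (better skill) yields a larger weight.
    """
    inverse = {nwp: bs ** -p for nwp, bs in scores.items()}
    total = sum(inverse.values())
    return {nwp: value / total for nwp, value in inverse.items()}


# Invented Brier scores for one lead time, for illustration only.
example_scores = {"DWD": 0.020, "HRES": 0.018, "COS": 0.015, "ENS": 0.010}
example_weights = brier_weights(example_scores)
```

A larger exponent concentrates the weight on the most skilful model, while an exponent of 0 reduces to the simple model average.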

The figure below shows the distribution of weights among NWP models based on the Brier score, which is used to compute the total probability in the new notification criteria. ECMWF-ENS proved to be the most skilful model, which is why the total exceedance probability relies mostly on this model, particularly as the lead time increases.

Figure 4. Distribution of weights over NWP models and lead time for the Brier weighted combination.

DWD stands for DWD-ICON, HRES for ECMWF-HRES, COS for COSMO-LEPS and ENS for ECMWF-ENS.
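To make the role of the weights concrete, the sketch below combines per-model exceedance probabilities into a total probability for a single lead time. It assumes that the total probability is the weighted average of the exceedance probability of each NWP, that a deterministic model contributes either 0 or 1, and that an ensemble contributes the fraction of members above the threshold. The function names and all numbers are hypothetical and do not come from EFAS.

```python
def member_fraction(discharges, threshold):
    """Fraction of ensemble members whose discharge exceeds the threshold.

    For a deterministic model, pass a single-element list: the result is 0 or 1.
    """
    return sum(q > threshold for q in discharges) / len(discharges)


def total_probability(probabilities, weights):
    """Weighted average of per-NWP exceedance probabilities at one lead time.

    probabilities, weights: dicts keyed by NWP name.
    Weights are renormalised in case they do not add up to exactly 1.
    """
    if set(probabilities) != set(weights):
        raise ValueError("probabilities and weights must cover the same NWPs")
    norm = sum(weights.values())
    return sum(weights[n] * probabilities[n] for n in weights) / norm


# Invented values for illustration only.
probs = {"DWD": 1.0, "HRES": 0.0, "COS": 0.5, "ENS": 0.6}
wts = {"DWD": 0.1, "HRES": 0.1, "COS": 0.2, "ENS": 0.6}
total = total_probability(probs, wts)
```

With these invented numbers the total probability is 0.1 + 0.0 + 0.1 + 0.36 = 0.56, which would exceed the 50% threshold of the formal criteria even though one deterministic model predicts no exceedance.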