This page documents the performance of the ERIC forecast methodology obtained with EFAS v4.1.

Evaluation Time Period: 1st October 2020 - 31st January 2021

Introduction

Since the release of EFAS 4.0 in October 2020, a very large number of flash flood notifications have been issued. Besides the heavy workload this places on EFAS duty forecasters, it is also likely that many of these notifications are false alarms. The aim of this evaluation is therefore to determine how the number of flash flood notifications can be reduced and what impact this has on forecast skill.

The current criteria for issuing flash flood notifications are:

  • Upstream area <= 2000km2
  • Lead time of the event start <= 48 hours
  • Exceedance probability of the 5 year return period threshold >= 10%
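The three criteria can be read as a single check applied to each ERIC reporting point. The following is a minimal sketch under that reading. The `ReportingPoint` fields and the `triggers_notification` function are hypothetical names chosen for illustration, not the EFAS data model. The defaults reproduce the EFAS 4.0 criteria.

```python
from dataclasses import dataclass


@dataclass
class ReportingPoint:
    # Hypothetical fields describing one ERIC reporting point
    upstream_area_km2: float  # upstream catchment area
    lead_time_h: float        # lead time of the forecast event start
    prob_rp5_exceed: float    # exceedance probability of the 5 year RP threshold (%)


def triggers_notification(point, max_area_km2=2000.0, max_lead_h=48.0, min_prob=10.0):
    """Return True if a reporting point meets all three notification criteria.

    The default arguments correspond to the EFAS 4.0 criteria.
    """
    return (point.upstream_area_km2 <= max_area_km2
            and point.lead_time_h <= max_lead_h
            and point.prob_rp5_exceed >= min_prob)


p = ReportingPoint(upstream_area_km2=850.0, lead_time_h=30.0, prob_rp5_exceed=20.0)
print(triggers_notification(p))               # True with the 10% criterion
print(triggers_notification(p, min_prob=30))  # False with a 30% criterion
```

Because the thresholds are parameters, the same check can be reused for the alternative criteria tested in the sensitivity analysis.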

Sensitivity Analysis

The first step was to investigate how sensitive the number of flash flood notifications is to each of the criteria listed above. The upstream area and exceedance probability thresholds were varied, and the number of notifications generated by each setting was counted. The lead time threshold was left unchanged, because a meaningful reduction in notifications would likely only be achieved by shortening the lead time to values that may no longer be useful for EFAS partners. The analysis was applied to all ERIC forecasts issued at 00 and 12 UTC during the evaluation period. Note that this method over-estimates the total number of flash flood notifications. In practice, if a notification is already in place for an administrative region from the previous forecast, no new notification is issued. The method used here does not account for this and instead generates a new notification.
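The counting step can be sketched as below. This assumes each forecast is given as a list of (region ID, upstream area, RP5 exceedance probability) tuples for reporting points already within the 48 hour lead time. The function name and data layout are hypothetical. Setting `dedupe_active=True` is a simplified stand-in for operational practice: it skips regions flagged in the immediately preceding forecast. Comparing the two modes illustrates why the sensitivity analysis over-estimates the number of notifications.

```python
def count_notifications(forecasts, max_area_km2, min_prob, dedupe_active=False):
    """Count flash flood notifications over a sequence of forecasts (in issue order).

    forecasts: list of forecasts, each a list of
        (region_id, upstream_area_km2, rp5_exceedance_prob_percent) tuples.
    dedupe_active: if False, every flagged region counts in every forecast
        (as in the sensitivity analysis); if True, regions flagged in the
        previous forecast are not counted again (simplified operational rule).
    """
    total = 0
    previous = set()
    for points in forecasts:
        # Regions with at least one reporting point meeting both thresholds
        flagged = {region for region, area, prob in points
                   if area <= max_area_km2 and prob >= min_prob}
        new = flagged - previous if dedupe_active else flagged
        total += len(new)
        previous = flagged
    return total


# Hypothetical example: three consecutive forecasts
forecasts = [
    [("R1", 800, 40), ("R2", 1500, 15)],
    [("R1", 800, 35), ("R3", 300, 12)],
    [("R1", 800, 20)],
]
print(count_notifications(forecasts, 2000, 10))                      # 5
print(count_notifications(forecasts, 2000, 10, dedupe_active=True))  # 3
print(count_notifications(forecasts, 2000, 30))                      # 2
```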

Max upstream area (km2), with RP5 exceedance probability = 10% | Number of notifications
500 | 7207
1000 | 7548
2000 (EFAS 4.0 threshold) | 7899

RP5 exceedance probability (%), with max upstream area = 2000km2 | Number of notifications
10 (EFAS 4.0 threshold) | 7899
20 | 5066
30 | 3565
40 | 2650
50 | 2012

Table 1. Number of flash flood notifications estimated from the sensitivity analysis.

The results show that reducing the maximum upstream area threshold produced only a small reduction in the number of flash flood notifications (from 7899 at 2000km2 to 7207 at 500km2). In contrast, increasing the 5 year return period exceedance probability threshold caused a large reduction (from 7899 at 10% to 2012 at 50%) (Table 1). Therefore, if the aim is to reduce the number of flash flood notifications, this is best achieved by increasing the 5 year return period exceedance probability threshold.
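The contrast between the two criteria is clearer as percentage reductions relative to the EFAS 4.0 baseline of 7899 notifications. The sketch below uses only the counts from Table 1. The `pct_reduction` helper is a name introduced here for illustration.

```python
# Notification counts from Table 1 (EFAS 4.0 baseline: 2000km2, 10%)
BASELINE = 7899
by_max_area = {500: 7207, 1000: 7548, 2000: 7899}              # RP5 prob fixed at 10%
by_prob = {10: 7899, 20: 5066, 30: 3565, 40: 2650, 50: 2012}   # max area fixed at 2000km2


def pct_reduction(count, reference=BASELINE):
    """Percentage reduction in notifications relative to the reference count."""
    return 100.0 * (reference - count) / reference


for area, n in by_max_area.items():
    print(f"max upstream area {area}km2: {pct_reduction(n):.1f}% fewer notifications")
for prob, n in by_prob.items():
    print(f"RP5 exceedance probability {prob}%: {pct_reduction(n):.1f}% fewer notifications")
```

With these counts, lowering the area threshold to 500km2 removes about 8.8% of notifications. Raising the exceedance probability threshold to 30% removes about 54.9%.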

Impact upon Forecast Skill

Before acting on the findings of the sensitivity analysis, it is necessary to evaluate their impact on forecast skill. Fewer flash flood notifications could result in more misses, which would reduce the skill. The evaluation procedure for ERIC conducted for EFAS 4.0 was therefore repeated, as described below.

Observation Datasets:

FloodList (www.floodlist.com)

  • A total of 105 observations where the flood type was recorded as 'flash flood' were extracted from the flood event database which is populated with information from FloodList.com (see figure below)
    • This flood type was chosen in order to exclude riverine floods

European Severe Weather Database (www.eswd.eu)

  • Reports of 'heavy rain' were extracted from the online database
  • They were filtered to only include reports with quality level flags of QC1 or QC2
  • Further filtering retained only observations that mentioned flooding in the EVENT_DESC column (i.e. event description). This included following media links to view the original reports.
    • It should be noted that riverine floods are not recorded in this dataset, only flash flooding
  • After filtering, a total of 78 observations were retained (see figure below)
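The automated part of the ESWD filtering can be sketched as follows. Only `EVENT_DESC` comes from the description above. The `TYPE_EVENT` and `QC_LEVEL` field names are hypothetical placeholders for the export's actual columns. The keyword test is an assumption standing in for "mentioned flooding". The manual step of following media links is not reproduced.

```python
def filter_eswd(reports):
    """Keep 'heavy rain' reports flagged QC1/QC2 whose description mentions flooding.

    reports: iterable of dicts, one per ESWD report. TYPE_EVENT and QC_LEVEL
    are hypothetical field names; EVENT_DESC is the event description column.
    """
    kept = []
    for r in reports:
        if r["TYPE_EVENT"] != "heavy rain":
            continue
        if r["QC_LEVEL"] not in {"QC1", "QC2"}:
            continue
        # Substring match also catches "flooded", "flooding", "floods"
        if "flood" not in r["EVENT_DESC"].lower():
            continue
        kept.append(r)
    return kept


# Hypothetical example reports
reports = [
    {"TYPE_EVENT": "heavy rain", "QC_LEVEL": "QC1", "EVENT_DESC": "Streets flooded in town centre"},
    {"TYPE_EVENT": "heavy rain", "QC_LEVEL": "QC0+", "EVENT_DESC": "Flooding of basements"},
    {"TYPE_EVENT": "heavy rain", "QC_LEVEL": "QC2", "EVENT_DESC": "120 mm in 24 hours"},
    {"TYPE_EVENT": "hail", "QC_LEVEL": "QC1", "EVENT_DESC": "Hail flooded the roads"},
]
print(len(filter_eswd(reports)))  # 1
```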

National Meteorological Service of Slovenia (ARSO)

  • Flash flood observations collected by ARSO were provided by Andrej Golob. These cover the period 2005-2020 and contain a total of 1153 observations
  • During the evaluation time period for EFAS 4.1 there were 4 observations from this dataset

EFAS Hydro Database

  • Flash floods can also be identified from river gauge observations of water level or discharge in smaller catchments by comparing them against predefined flood thresholds
  • Time series were extracted at stations where the following criteria were met:
    • Hourly or 6-hourly observations available
    • Catchment area <=2000km2
    • Threshold level 1 data were available - this relates to the preparatory/early action phase and can relate to when rivers start going out of bank
  • For each extracted time series, flash floods were identified using the following criteria:
    • Threshold level 1 is exceeded
    • Peak prominence was greater than the flood threshold minus the baseflow
    • Peak flow lasted no longer than 5 hours
    • No other peaks occurred within a 36 hour period
  • Figure 1 below gives an example time series where flash floods (green dots) are distinguished from non-flash floods (orange crosses). The dotted grey line is the flood threshold

Figure 1. Streamflow time series showing how flash floods (green dots) have been distinguished from non-flash floods (orange crosses). The grey line shows the flood threshold.

  • In total 29 events were extracted from this data source
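The peak criteria above can be sketched for an hourly series as follows. The sketch rests on several interpretations that are assumptions, not the documented EFAS procedure. Baseflow is supplied by the caller, because its derivation is not specified. Prominence is approximated as the peak minus the higher of the minimum flows within the isolation window on either side. "Lasted no longer than 5 hours" is read as consecutive hours above threshold level 1. "No other peaks within 36 hours" is read as no other threshold-exceeding local maximum within 36 hours either side. The function name is hypothetical.

```python
def find_flash_floods(q, threshold, baseflow, max_duration_h=5, isolation_h=36):
    """Return indices of peaks in an hourly series classified as flash floods.

    q: hourly discharge (or water level) values.
    threshold: threshold level 1 for the station.
    baseflow: baseflow estimate (hypothetical input).
    """
    n = len(q)
    # Candidate peaks: local maxima that exceed threshold level 1
    candidates = [i for i in range(1, n - 1)
                  if q[i] > q[i - 1] and q[i] >= q[i + 1] and q[i] > threshold]
    flash = []
    for i in candidates:
        lo, hi = max(0, i - isolation_h), min(n, i + isolation_h + 1)
        # Approximate prominence: height above the higher of the two side minima
        prominence = q[i] - max(min(q[lo:i + 1]), min(q[i:hi]))
        if prominence <= threshold - baseflow:
            continue
        # Duration: consecutive hours above threshold around the peak
        start, end = i, i
        while start > 0 and q[start - 1] > threshold:
            start -= 1
        while end < n - 1 and q[end + 1] > threshold:
            end += 1
        if end - start + 1 > max_duration_h:
            continue
        # Isolation: no other candidate peak within isolation_h hours
        if any(j != i and abs(j - i) <= isolation_h for j in candidates):
            continue
        flash.append(i)
    return flash


# Hypothetical series: one sharp, short peak and one broad, long peak
q = [10.0] * 200
q[48:53] = [30.0, 60.0, 120.0, 60.0, 30.0]
for k in range(11):
    q[140 + k] = 60.0 + 4 * k   # slow rise
    q[150 + k] = 100.0 - 4 * k  # slow fall
print(find_flash_floods(q, threshold=50.0, baseflow=10.0))  # [50]
```

The broad peak at hour 150 exceeds the threshold, but it is rejected because it stays above the threshold for 21 hours.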

Combining the Observations onto EFAS Administrative Regions

  • The above datasets were combined into a single dataset.
  • The observations are well distributed across Europe, with the exception of France, Germany, Scandinavia and central-eastern Europe, where there are few reports (Figure 2)
  • For each flash flood event the ID value of the EFAS Administrative Regions layer was recorded (EFAS flash flood notifications are generated per region in this layer).
  • The 216 original reports were summarised into 128 instances of flash flooding in different regions on 53 days during the evaluation period. On some days there were multiple point observations of flash flooding in the same administrative region. In this evaluation, these are treated as a single report.
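The 216 reports are the sum of the four sources (105 + 78 + 4 + 29). They collapse to unique (region, day) pairs because several reports can fall in the same administrative region on the same day. A minimal sketch of that aggregation follows. The tuple layout, region IDs and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_region_days(observations):
    """Collapse point observations into unique (region ID, day) flash flood instances.

    observations: iterable of (region_id, day, source) tuples, one per report.
    Returns a dict mapping (region_id, day) to the set of sources reporting it.
    """
    instances = defaultdict(set)
    for region_id, day, source in observations:
        instances[(region_id, day)].add(source)
    return dict(instances)


# Hypothetical reports: two of them fall in the same region on the same day
obs = [
    (101, date(2020, 10, 3), "ESWD"),
    (101, date(2020, 10, 3), "FloodList"),
    (101, date(2020, 10, 4), "ESWD"),
    (205, date(2020, 10, 3), "ARSO"),
]
instances = aggregate_to_region_days(obs)
print(len(instances))                       # 3 region-day instances
print(len({day for _, day in instances}))   # on 2 distinct days
```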

Figure 2. Locations of observed flash flood events during the evaluation period. Colours represent different information sources.

Evaluation Methodology

ERIC forecasts produced at 00 and 12 UTC on each day of the evaluation period were evaluated. Exceedance probability thresholds of the 5 year return period were tested from 0 to 100% in increments of 10%. Maximum upstream area thresholds of 1000km2 and 2000km2 were also investigated. Finally, the evaluation was performed separately for lead times of 0-24h, 24-48h, 48-72h, 72-96h and 96-120h.
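Taken together, these settings define the grid of threshold combinations evaluated for each forecast. The sketch below simply enumerates that grid from the values listed above. The variable names are illustrative.

```python
from itertools import product

# Threshold values listed in the text
prob_thresholds = list(range(0, 101, 10))   # 0, 10, ..., 100 (%)
max_areas_km2 = [1000, 2000]
lead_windows_h = [(0, 24), (24, 48), (48, 72), (72, 96), (96, 120)]

combinations = list(product(prob_thresholds, max_areas_km2, lead_windows_h))
print(len(combinations))  # 11 * 2 * 5 = 110
```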

For each forecast, the reporting points were extracted and each combination of exceedance probability, upstream area and lead time thresholds was applied. For each combination, the ERIC reporting points that satisfied the threshold criteria were retained. For each retained reporting point, the date of the forecast flash flood event and the ID value of the EFAS Administrative Regions layer were recorded. These were then compared with the observations. A corresponding observation in the same region on the same day was counted as a hit. A forecast event with no corresponding observation was counted as a false alarm. An observed event that was not forecast was counted as a miss. Correct negatives were identified for regions where a flash flood was neither forecast nor observed, but only for regions where one or more observations had been reported during the evaluation period. This process was repeated for every forecast during the evaluation period, and the total numbers of hits, misses, false alarms and correct negatives were calculated.
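The matching for a single forecast and threshold combination can be sketched with set operations on (region, day) pairs. Under the assumptions here, `days` holds the valid days covered by the lead time window, and `evaluated_regions` holds the regions with at least one observation during the evaluation period. The function name and data layout are hypothetical. Totals over the evaluation period would be obtained by summing these counts over all forecasts.

```python
def contingency_counts(forecast_events, observed_events, evaluated_regions, days):
    """Return (hits, misses, false_alarms, correct_negatives) for one forecast.

    forecast_events: set of (region_id, day) pairs from reporting points meeting the thresholds
    observed_events: set of (region_id, day) pairs with an observed flash flood
    evaluated_regions: regions with >= 1 observation during the evaluation period
    days: valid days covered by the lead time window being evaluated
    """
    forecast = {(r, d) for r, d in forecast_events if d in days}
    observed = {(r, d) for r, d in observed_events if d in days}
    hits = len(forecast & observed)
    false_alarms = len(forecast - observed)
    misses = len(observed - forecast)
    # Correct negatives are only counted for regions with observations
    correct_negatives = sum(
        1 for r in evaluated_regions for d in days
        if (r, d) not in forecast and (r, d) not in observed
    )
    return hits, misses, false_alarms, correct_negatives


# Hypothetical example for one valid day
day = "2020-10-03"
forecast = {(1, day), (2, day)}
observed = {(1, day), (3, day)}
print(contingency_counts(forecast, observed, {1, 3, 4}, {day}))  # (1, 1, 1, 1)
```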

The skill of the forecast for each unique combination of exceedance probability, upstream area and lead time thresholds was calculated from the corresponding hits, misses, false alarms and correct negatives using the Hanssen-Kuipers score (also known as Peirce's skill score). This score measures how well 'yes' events (i.e. flash floods) are distinguished from 'no' events. Scores range from -1 to 1, with 1 being a perfect score and 0 indicating no skill.

Hanssen-Kuipers score = Hits / (Hits + Misses) - False Alarms / (False Alarms + Correct Negatives)

This is the probability of detection minus the probability of false detection.
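As a worked check of the formula, the sketch below computes the score from the four contingency counts. The counts in the example are hypothetical. The guard against empty denominators is a design choice added here: the score is undefined when there are no observed events or no non-events.

```python
def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """Hanssen-Kuipers (Peirce) skill score: POD minus POFD."""
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("score is undefined without both observed events and non-events")
    pod = hits / (hits + misses)                                 # probability of detection
    pofd = false_alarms / (false_alarms + correct_negatives)     # probability of false detection
    return pod - pofd


print(round(hanssen_kuipers(30, 10, 20, 80), 2))  # 0.75 - 0.20 = 0.55
```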

Evaluation Results

The results show that the Hanssen-Kuipers score is slightly higher when the maximum upstream area threshold is 2000km2 rather than 1000km2 (Figure 3). This may be because the larger upstream area threshold produces more forecast flash flood events, and therefore more hits and fewer misses. For both upstream area thresholds, the maximum Hanssen-Kuipers score occurs at a 5 year return period exceedance probability of 10%, which mirrors the finding of the EFAS 4.0 evaluation. However, for lead times up to 48 hours, the Hanssen-Kuipers score declines only gradually as the exceedance probability threshold increases above 10%. A higher exceedance probability threshold could therefore be chosen for these lead times without causing a large decline in skill. Since the aim of this evaluation is to reduce the total number of flash flood notifications, a higher 5 year return period exceedance probability threshold of 30% is recommended.

Figure 3. Hanssen-Kuipers score from the evaluation of ERIC forecasts for different upstream area thresholds, 5 year return period exceedance probability thresholds and lead times. Panels: Max Upstream Area 1000km2; Max Upstream Area 2000km2.

To assess the impact of a higher 5 year return period exceedance probability threshold of 30%, the total numbers of hits, misses and false alarms produced with this threshold during the evaluation period were plotted (Figure 4). These were compared against the values obtained with the EFAS 4.0 threshold of 10%. Increasing the exceedance probability threshold to 30% greatly reduces the number of false alarms. For example, when the maximum upstream area is 2000km2, the number of false alarms at lead times of 24-48h falls from 900 to fewer than 500. At the same time, there is a slight reduction in the numbers of hits and misses. The proportional reduction in hits exceeds that in misses, which explains the slight reduction in the Hanssen-Kuipers score (Figure 3). Increasing the threshold to 30% therefore greatly reduces false alarms while only slightly reducing hits and misses, and far fewer flash flood notifications would be issued.

The differences in hits, misses and false alarms between the maximum upstream area thresholds of 1000km2 and 2000km2 were small. Reducing the maximum upstream area threshold to 1000km2 reduces all three components only slightly.

Figure 4. Total number of hits, misses and false alarms over the evaluation period for different 5 year return period exceedance probability thresholds. Panels: Max Upstream Area 1000km2; Max Upstream Area 2000km2 (current thresholds used in EFAS 4.0).

Conclusions

The findings of this study show that the number of flash flood notifications can be reduced by increasing the 5 year return period exceedance probability threshold to 30%. Although this reduces forecast skill, the reduction is only slight compared with the original threshold of 10%. Reducing the maximum upstream area threshold also reduced the number of flash flood notifications, but the reduction was minor compared with that obtained by raising the exceedance probability threshold.

Therefore it is recommended that the flash flood notification criteria for EFAS 4.1 are the following:

  • Upstream area <= 2000km2
  • Exceedance probability of the 5 year return period threshold >= 30%
  • Exceedance of the 5 year return period threshold starts within the next 48 hours (lead time of the event start <= 48 hours)