Placeholder Forecast anomaly and uncertainty computation methodology

This page describes how the anomaly and uncertainty of the ensemble forecasts in the sub-seasonal and seasonal products are determined, using the climatology as reference. It also covers how the dominant anomaly category (one of the 7 predefined ones) and the uncertainty category (one of the 3 predefined ones) of the ensemble forecasts are determined. The procedure is generic: it is the same for EFAS and GloFAS, as it is executed in the same way for each river pixel regardless of the resolution, and it is the same for the sub-seasonal and seasonal products, as it works identically for weekly mean values (sub-seasonal) and monthly mean values (seasonal).

The characterisation of the forecast signal in both the sub-seasonal and seasonal products is based on the ensemble members' extremity in the context of the model climatological distribution.

Climatological percentiles and forecast anomaly categories

Currently, the sub-seasonal climate sample uses 660 reforecast values, while the seasonal uses 500 values. From the climate sample, 99 climate percentiles are determined, which split the river discharge value range that occurred in the 20-year climatological sample into equally likely (1% chance) segments (both the sub-seasonal and the seasonal are currently based on 20 years). Figure 1 shows an idealised generic climate distribution, based on either weekly means or monthly means, with the percentiles represented along the y-axis. Only the deciles (every 10%), the quartiles (25%, 50% and 75%, the middle of which is also called the median) and a few of the extreme percentiles near the minimum and maximum of the climatological range (indicated by black crosses) are shown. Each of these percentiles has an equivalent river discharge value along the x-axis. The 99 percentiles divide the river discharge value range into 100 equally likely bins, some of which are indicated in Figure 1, such as bin1 for values below the 1st percentile, bin2 for values between the 1st and 2nd percentiles, or bin100 for river discharge values above the 99th percentile.
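As a minimal sketch of this step, the 99 climate percentiles could be derived from a climate sample as shown below. The function name `climate_percentiles`, the synthetic gamma-distributed sample and the use of NumPy's default linear interpolation are illustrative assumptions, not the operational EFAS/GloFAS implementation.

```python
import numpy as np


def climate_percentiles(climate_sample):
    """Return the 99 climate percentiles (1%, 2%, ..., 99%) of a climate sample."""
    sample = np.asarray(climate_sample, dtype=float)
    # np.arange(1, 100) gives the percentile levels 1, 2, ..., 99.
    return np.percentile(sample, np.arange(1, 100))


# Synthetic stand-in for a 660-value sub-seasonal reforecast sample (m3/s).
rng = np.random.default_rng(seed=0)
sample = rng.gamma(shape=2.0, scale=50.0, size=660)
percentiles = climate_percentiles(sample)
```

The 99 values separate 100 bins, so a forecast value can later be placed into one of bin1 to bin100.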

Figure 1. Schematic of the forecast anomaly categories, defined by the climatological distribution.

Based on the percentiles and the related 100 bins, seven categories are defined, which are used as anomaly categories (Table 1). These are also indicated in Figure 1 by shading. The two most extreme categories are the bottom and top 10% of the climatological distribution (<10% as red and 90%< as blue). Next come the moderately low and high river discharge categories, from 10-25% (orange) and 75-90% (middle-dark blue). The smallest negative and positive anomalies are defined by 25-40% and 60-75% and displayed in yellow and light blue in Figure 1. Finally, the normal condition category is defined as 40-60%, i.e. the middle 1/5th of the distribution, coloured grey in Figure 1.

Anomaly categories	Name	Ranks	Description
Cat-1	Extreme low	1-10	bottom 10% of the climatological distribution
Cat-2	Low	10-25	15% from the 1st decile to the 1st quartile
Cat-3	Bit low	25-40	15% from the 1st quartile to the 2nd quintile
Cat-4	Near normal	40-60	20% from the 2nd to the 3rd quintile
Cat-5	Bit high	60-75	15% from the 3rd quintile to the 3rd quartile
Cat-6	High	75-90	15% from the 3rd quartile to the 9th decile
Cat-7	Extreme high	90-100	top 10% of the climatological distribution

Table 1: Definition and description of the 7 anomaly categories. The possible value ranges in the 'Ranks' column are inclusive at the start and exclusive at the end, so for example for Cat-1 the possible ranks are 1, 2, 3, ... and 10. Depending on the product, the middle three categories (Cat-3, Cat-4 and Cat-5) are sometimes combined into one extended 'Near normal' category.

Extremity rank computation for ensemble members

The forecast has 51 ensemble members, for both EFAS and GloFAS and for both the sub-seasonal and seasonal products. Each member is checked for climatological extremity and placed in one of the 100 climate bins. The bin number is the anomaly or extremity level of the ensemble member, hereafter called the rank, and takes one of the values from 1 to 100. For example, 1 means the forecast value is below the 1st climate percentile (i.e. extremely anomalously low, less than the value that happened in the climatological period only 1% of the time), 2 means the value is between the 1st and 2nd climate percentiles (i.e. slightly less extremely low), etc., and finally 100 means the forecast value is above the 99th climate percentile (i.e. extremely high, as it is higher than 99% of all the considered reforecasts, which represent the model climate conditions for this time of year, location and lead time).

Figure 2 shows the process of determining the rank of each ensemble member. In this example, the lowest member gets the rank of 54 (red r54 on the graph in Figure 2) by moving vertically until crossing the climatological distribution and then moving horizontally to the y-axis to determine the two bounding percentiles and thus the right percentile bin. In this case the lowest ensemble member value is between the 53rd and 54th percentiles, which results in bin54. All other ensemble members get a bin number in the same way: the 2nd lowest value gets bin60, and so on until the largest ensemble member value, which gets bin97, as its river discharge value is between the 96th and 97th percentiles.
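The bin lookup described above can be sketched as a sorted search: the rank is the number of climate percentiles below the member value, plus one. The function name `member_ranks` is hypothetical, and the treatment of a value exactly equal to a percentile (placed in the higher bin here) is an assumption, as the text does not specify it.

```python
import numpy as np


def member_ranks(members, percentiles):
    """Place each ensemble member into one of the 100 climate bins (ranks 1..100)."""
    members = np.asarray(members, dtype=float)
    percentiles = np.asarray(percentiles, dtype=float)  # 99 sorted values
    # Count the percentiles at or below each member value, then add 1.
    # side="right" sends exact ties to the higher bin (assumption).
    return np.searchsorted(percentiles, members, side="right") + 1


# Idealised climatology where the k-th percentile equals k m3/s.
example_percentiles = np.arange(1, 100, dtype=float)
example_ranks = member_ranks([0.5, 53.5, 99.5], example_percentiles)
```

In this idealised case a value of 53.5 lies between the 53rd and 54th percentiles and therefore lands in bin54, matching the lowest member in the Figure 2 example.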

Figure 2. Schematic of the forecast extremity ranking of the 51 ensemble members and the 7 anomaly categories in the context of the climatological distribution.

The probability of each of the 7 anomaly categories is calculated by counting the ensemble members in the category and dividing by 51, the total number of members. In the example of Figure 2, there is no member in the 3 low anomaly categories, while the 'Near normal' category has 2 members, resulting in 3.9% probability, the 'Bit high' category has 14, with 27.5%, the 'High' category has 17, with 33.3%, and finally the 'Extreme high' category has 18 ensemble members, with 35.3% probability. The inset table in Figure 2 shows the numbers and the probabilities, and also the size (in terms of probabilities) of the 7 categories. This highlights, e.g., that the normal flow category's 3.9% probability is much lower than the climatologically expected probability of 20%. The 3 high flow categories, however, have much higher probabilities than the climatological reference probability, especially the extreme high category, where the forecast probability (35.3%) is more than three times the corresponding climatological probability (10%).
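The counting step can be sketched as follows for integer member ranks. The names are hypothetical, and the mapping of integer ranks to categories (1-10, 11-25, 26-40, 41-60, 61-75, 76-90, 91-100) is an assumption based on the percentage ranges of Table 1. The example ranks are placeholders chosen so that 2, 14, 17 and 18 members (51 in total) fall into the four highest categories.

```python
import numpy as np

CATEGORY_NAMES = ["Extreme low", "Low", "Bit low", "Near normal",
                  "Bit high", "High", "Extreme high"]
# Assumed highest integer rank belonging to Cat-1 ... Cat-6.
UPPER_RANKS = [10, 25, 40, 60, 75, 90]


def category_probabilities(ranks):
    """Count members per anomaly category and convert counts to probabilities."""
    ranks = np.asarray(ranks)
    # side="left" keeps rank 10 in Cat-1, rank 25 in Cat-2, and so on.
    category_index = np.searchsorted(UPPER_RANKS, ranks, side="left")
    counts = np.bincount(category_index, minlength=7)
    return counts, counts / ranks.size


ranks = [55] * 2 + [70] * 14 + [80] * 17 + [95] * 18
counts, probabilities = category_probabilities(ranks)
for name, n, p in zip(CATEGORY_NAMES, counts, probabilities):
    print(f"{name:12s} {n:2d} members  {100 * p:5.1f}%")
```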

Extremity rank computation for ensemble members with 0 values

The forecast extremity rank computation can be done for any value above 0 m3/s. However, it becomes undefined when the values drop to 0, as there is no way to differentiate amongst identical values. The hydrological simulations of EFAS and GloFAS are less reliable and more prone to random noise near 0, so everything below 0.1 m3/s is considered as 0 for the sub-seasonal and seasonal products. The same problem can also happen for non-zero values, but normally the simulation should not produce many identical non-zero values, unless some specific process, such as a reservoir operation rule, generates such a signal. There is no indication that constant non-zero values are an issue in CEMS-flood, but the 0 value is clearly a major problem, as large parts of the world have dry enough areas, often combined with small enough catchments, to have near-zero or exactly 0 river discharge values.

For the forecast rank computation in the 0-value singularity case, a special solution was developed. All the 0 ensemble member values (all below 0.1 m3/s) are assigned ranks that evenly represent the percentiles that have 0 values (i.e. below 0.1 m3/s) in the model climatology. In practice, this means that the 'rank-undefined' section of the ensemble forecast is spread evenly across the 'rank-undefined' section of the climatology during the rank computation.

Figure 3 demonstrates the process on an idealised example, where the lowest 77 percentiles are 0 in the climatology and 23 out of 51 ensemble members are also 0 (see Figure 3a). The 23 ensemble members with 0 value are then spread across the 0-value range of the climatology from 1 to 77 (see Figure 3b). This way the ranks of the 23 members are assigned from 1 to 77 with as equal spacing as possible in between (see Figure 3c). The remaining non-zero ensemble members get their ranks in the usual way, as described above for Figure 2. Finally, the schematic of the ranks of all 51 members is provided in Figure 3d.

[Figure 3: panels a), b), c) and d)]

Figure 3. Schematic of the forecast extremity ranking calculation for areas with 0 river discharge values.
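One possible way to implement the even spreading of the 0-value members is sketched below. The function name `ranks_with_zeros`, the use of `np.linspace` for the equal spacing, the real-valued (unrounded) ranks and the handling of a single 0-value member are all assumptions. The text only states that the ranks are spread as evenly as possible across the 0-value range of the climatology, and from 1 to 100 when all 99 percentiles are 0.

```python
import numpy as np

ZERO_THRESHOLD = 0.1  # m3/s; values below this are treated as 0


def ranks_with_zeros(members, percentiles):
    """Rank members, spreading 0-value members over the 0-value climate bins."""
    members = np.asarray(members, dtype=float)
    percentiles = np.asarray(percentiles, dtype=float)  # 99 sorted values
    # Usual ranking: number of percentiles at or below the value, plus 1.
    ranks = np.searchsorted(percentiles, members, side="right") + 1.0

    n_zero_pct = int(np.sum(percentiles < ZERO_THRESHOLD))
    is_zero = members < ZERO_THRESHOLD
    n_zero = int(is_zero.sum())
    if n_zero_pct > 0 and n_zero > 0:
        # With all 99 percentiles at 0, the whole 1..100 range is rank-undefined.
        top = 100 if n_zero_pct == percentiles.size else n_zero_pct
        if n_zero == 1:
            spread = np.array([(top + 1) / 2.0])  # single member: middle (assumption)
        else:
            spread = np.linspace(1.0, top, n_zero)
        ranks[is_zero] = spread
    return ranks


# Idealised Figure 3 setting: lowest 77 percentiles are 0, 23 of 51 members are 0.
clim = np.concatenate([np.zeros(77), np.linspace(1.0, 22.0, 22)])
fcst = np.concatenate([np.zeros(23), np.linspace(0.5, 30.0, 28)])
ranks = ranks_with_zeros(fcst, clim)
```

With this spacing the 0-value members have ranks from 1 to 77 and the non-zero members start at rank 78.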

In the extreme case of all climate percentiles being 0, which happens over river pixels in the driest places of the world, such as the Sahara, the ensemble forecast member ranks can either be 100 for any non-zero value, regardless of the magnitude of the river discharge, or the evenly spread ranks from 1 to 100 for the 0 values, as a representation of the totally 0 climatology. In the most extreme case of all 99 climate percentiles being 0 and all 51 forecast members being 0, the ranks of the forecast will be spread from 1 to 100 in equal representation. This forecast will then be a perfect representation of the climatological distribution, or in other words a perfectly 'normal' condition.

Dominant anomaly category computation for the ensemble forecast

The ensemble forecasts have 51 members, each of which is assigned an extremity rank. Using these 51 ranks, the forecast is put in one of the 7 anomaly categories (as described in Table 1). This is done based on the arithmetic mean of the 51 ensemble member rank values (rank-mean). The rank-mean is also a number between 1 and 100, but this time a real (not integer) number. If the rank-mean is 50.5, that is exactly the normal (median) condition, i.e. no anomaly whatsoever. If the rank-mean is below 50.5, then drier than normal conditions are forecast; if it is above 50.5, then wetter than normal. The further the rank-mean is below/above 50.5, the drier/wetter the conditions are predicted to be. The lowest/highest possible value is 1/100, if all ensemble member ranks are 1 or 100 (the most extremely dry/wet). Based on this rank-mean, the dominant anomaly category of the ensemble forecast is defined by placing the rank-mean into the matching category of Table 1. For example, all rank-mean values from 40.0 to 60.0, interpreted as 40.0 <= rank-mean < 60.0, are assigned to 'Near normal', or category-4.
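A minimal sketch of the dominant category assignment is given below, using the Table 1 thresholds with the lower bound inclusive and the upper bound exclusive, as in the 40.0 <= rank-mean < 60.0 example. The function and constant names are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean

CATEGORY_NAMES = ["Extreme low", "Low", "Bit low", "Near normal",
                  "Bit high", "High", "Extreme high"]
# Lower bounds of Cat-2 ... Cat-7 on the rank-mean scale.
LOWER_BOUNDS = [10, 25, 40, 60, 75, 90]


def dominant_category(ranks):
    """Return (rank-mean, category id, category name) for the member ranks."""
    rank_mean = fmean(ranks)
    # bisect_right makes each lower bound inclusive: 40.0 -> Cat-4, 60.0 -> Cat-5.
    index = bisect_right(LOWER_BOUNDS, rank_mean)
    return rank_mean, f"Cat-{index + 1}", CATEGORY_NAMES[index]


# Ranks evenly covering 1..100 give the neutral rank-mean of 50.5.
print(dominant_category(range(1, 101)))
```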

The ensemble forecast anomaly is not based on the most probable of the 7 anomaly categories, as that would make it prone to jumpiness. For example, in the very uncertain case of 6, 8, 7, 7, 7, 9, 7 members being in the 7 anomaly categories, the forecast category (the dominant one) would be the 'High' category (cat-6), as that has the most members (9). However, a nearby river pixel could easily be only slightly different, with 7, 9, 7, 7, 7, 7, 7 members in each category, in which case the dominant anomaly category would be the 'Low' category (cat-2), as now that has the most (again 9) members. Such very uncertain cases are especially likely to happen at longer ranges. These two forecasts are only slightly different in terms of distribution, but the ensemble forecast anomaly categories would be almost the complete opposite of each other, potentially making the signal look very jumpy geographically. With the rank-mean definition this is avoided, and the 'Near normal' category (cat-4) is assigned to both forecasts, as the two rank-means are very close to each other and both are quite near the median.
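The two member-count examples above can be checked with a short sketch. Since the individual member ranks are not given, each member is represented here by the middle rank of its category (5.5, 18, 33, 50.5, 68, 83 and 95.5); this is purely an illustrative assumption, and the function names are hypothetical.

```python
# Assumed representative (middle) rank of Cat-1 ... Cat-7.
CATEGORY_MID_RANKS = [5.5, 18.0, 33.0, 50.5, 68.0, 83.0, 95.5]


def most_populated_category(counts):
    """Category number (1-7) with the most ensemble members."""
    return counts.index(max(counts)) + 1


def approximate_rank_mean(counts):
    """Rank-mean when every member sits at the middle rank of its category."""
    weighted = sum(n * mid for n, mid in zip(counts, CATEGORY_MID_RANKS))
    return weighted / sum(counts)


forecast_a = [6, 8, 7, 7, 7, 9, 7]
forecast_b = [7, 9, 7, 7, 7, 7, 7]
for counts in (forecast_a, forecast_b):
    print(most_populated_category(counts), round(approximate_rank_mean(counts), 2))
```

Under this assumption the most populated category flips from 6 to 2 between the two forecasts, while both approximate rank-means stay within the 40-60 'Near normal' range.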

The 0-value problem over dry or very dry areas (described above) has a consequence: some or all of the low anomaly signals become impossible. The idealised examples below help to demonstrate this. In these, for simplicity, a fixed portion of the ensemble members is 0 and the rest have a single average rank. This makes the rank-mean and dominant anomaly category computation simpler and easier to interpret.

In the examples below, if X% of the climatology is 0, then the average rank of the ensemble members which are also 0 will always be X/2+0.5 with the evenly distributed rank representation (e.g. for 10%, the ranks are evenly spread over 1-10, so the average of those members will be 5.5).

Example No-1: Lowest 10% of the climatology is 0:

Number of 0 members	Number of non-0 members	Average rank of 0 members	Average rank of non-0 members	Rank-mean	Dominant anomaly category
0	51	NA (no member to rank)	11 (the lowest possible rank for a non-zero member if 1-10 percentiles in the climatology are 0)	(0 * 5.5 + 51 * 11)/51 = 11	Low (10-25)
0	51	NA	20	(0 * 5.5 + 51 * 20)/51 = 20	Low (10-25)
0	51	NA	50	(0 * 5.5 + 51 * 50)/51 = 50	Near normal (40-60)
0	51	NA	70	(0 * 5.5 + 51 * 70)/51 = 70	Bit high (60-75)
0	51	NA	100	(0 * 5.5 + 51 * 100)/51 = 100	Extreme high (90<)

11	40	5.5	11	(11 * 5.5 + 40 * 11)/51 = 9.81	Extreme low (<10)
11	40	5.5	20	(11 * 5.5 + 40 * 20)/51 = 16.87	Low (10-25)
11	40	5.5	50	(11 * 5.5 + 40 * 50)/51 = 40.40	Near normal (40-60)
11	40	5.5	70	(11 * 5.5 + 40 * 70)/51 = 56.08	Near normal (40-60)
11	40	5.5	100	(11 * 5.5 + 40 * 100)/51 = 79.61	High (75-90)

21	30	5.5	11	(21 * 5.5 + 30 * 11)/51 = 8.73	Extreme low (<10)
21	30	5.5	20	(21 * 5.5 + 30 * 20)/51 = 14.02	Low (10-25)
21	30	5.5	50	(21 * 5.5 + 30 * 50)/51 = 31.67	Bit low (25-40)
21	30	5.5	70	(21 * 5.5 + 30 * 70)/51 = 43.44	Near normal (40-60)
21	30	5.5	100	(21 * 5.5 + 30 * 100)/51 = 61.08	Bit high (60-75)

36	15	5.5	11	(36 * 5.5 + 15 * 11)/51 = 7.11	Extreme low (<10)
36	15	5.5	20	(36 * 5.5 + 15 * 20)/51 = 9.76	Extreme low (<10)
36	15	5.5	50	(36 * 5.5 + 15 * 50)/51 = 18.58	Low (10-25)
36	15	5.5	70	(36 * 5.5 + 15 * 70)/51 = 24.47	Low (10-25)
36	15	5.5	100	(36 * 5.5 + 15 * 100)/51 = 33.29	Bit low (25-40)

51	0	5.5	NA (no member to rank)	(51 * 5.5 + 0)/51 = 5.5	Extreme low (<10)

Example No-2: Lowest 30% of the climatology is 0:

Number of 0 members	Number of non-0 members	Average rank of 0 members	Average rank of non-0 members	Rank-mean	Dominant anomaly category
0	51	NA (no member to rank)	31 (the lowest possible rank for a non-zero member if 1-30 percentiles in the climatology are 0)	(0 * 15.5 + 51 * 31)/51 = 31	Bit low (25-40)
0	51	NA	50	(0 * 15.5 + 51 * 50)/51 = 50	Near normal (40-60)
0	51	NA	70	(0 * 15.5 + 51 * 70)/51 = 70	Bit high (60-75)
0	51	NA	100	(0 * 15.5 + 51 * 100)/51 = 100	Extreme high (90<)

11	40	15.5	31	(11 * 15.5 + 40 * 31)/51 = 27.65	Bit low (25-40)
11	40	15.5	50	(11 * 15.5 + 40 * 50)/51 = 42.55	Near normal (40-60)
11	40	15.5	70	(11 * 15.5 + 40 * 70)/51 = 58.24	Near normal (40-60)
11	40	15.5	100	(11 * 15.5 + 40 * 100)/51 = 81.77	High (75-90)

21	30	15.5	31	(21 * 15.5 + 30 * 31)/51 = 24.61	Low (10-25)
21	30	15.5	50	(21 * 15.5 + 30 * 50)/51 = 35.79	Bit low (25-40)
21	30	15.5	70	(21 * 15.5 + 30 * 70)/51 = 47.55	Near normal (40-60)
21	30	15.5	100	(21 * 15.5 + 30 * 100)/51 = 65.20	Bit high (60-75)

36	15	15.5	31	(36 * 15.5 + 15 * 31)/51 = 20.05	Low (10-25)
36	15	15.5	50	(36 * 15.5 + 15 * 50)/51 = 25.64	Bit low (25-40)
36	15	15.5	70	(36 * 15.5 + 15 * 70)/51 = 31.52	Bit low (25-40)
36	15	15.5	100	(36 * 15.5 + 15 * 100)/51 = 40.35	Near normal (40-60)

51	0	15.5	NA (no member to rank)	(51 * 15.5 + 0)/51 = 15.5	Low (10-25)

Example No-3: Lowest 70% of the climatology is 0:

Number of 0 members	Number of non-0 members	Average rank of 0 members	Average rank of non-0 members	Rank-mean	Dominant anomaly category
0	51	NA (no member to rank)	71 (the lowest possible rank for a non-zero member if 1-70 percentiles in the climatology are 0)	(0 * 35.5 + 51 * 71)/51 = 71	Bit high (60-75)
0	51	NA	100	(0 * 35.5 + 51 * 100)/51 = 100	Extreme high (90<)

11	40	35.5	71	(11 * 35.5 + 40 * 71)/51 = 63.34	Bit high (60-75)
11	40	35.5	100	(11 * 35.5 + 40 * 100)/51 = 86.08	High (75-90)

21	30	35.5	71	(21 * 35.5 + 30 * 71)/51 = 56.38	Near normal (40-60)
21	30	35.5	100	(21 * 35.5 + 30 * 100)/51 = 73.44	Bit high (60-75)

36	15	35.5	71	(36 * 35.5 + 15 * 71)/51 = 45.94	Near normal (40-60)
36	15	35.5	100	(36 * 35.5 + 15 * 100)/51 = 54.47	Near normal (40-60)

51	0	35.5	NA (no member to rank)	(51 * 35.5 + 0)/51 = 35.5	Bit low (25-40)
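The rank-mean values in the three example tables can be reproduced with a short helper. The function name `zero_mix_rank_mean` is hypothetical. It uses the same simplification as the tables: all 0 members share the average rank of the 0-value climate range, which is 5.5, 15.5 and 35.5 for the three examples, and all non-zero members share one given rank.

```python
def zero_mix_rank_mean(zero_percent, n_zero, non_zero_rank, n_members=51):
    """Rank-mean for n_zero members at 0 and the rest at non_zero_rank.

    zero_percent is the lowest X% of the climatology that is 0; the 0 members
    then have an average rank of (X + 1) / 2 under the even spreading.
    """
    zero_rank = (zero_percent + 1) / 2
    n_non_zero = n_members - n_zero
    return (n_zero * zero_rank + n_non_zero * non_zero_rank) / n_members


# Example No-1, 21 zero members, non-zero members at rank 100.
print(zero_mix_rank_mean(10, 21, 100))
# Example No-3, 36 zero members, non-zero members at rank 71.
print(zero_mix_rank_mean(70, 36, 71))
```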

In Example No-1, if all ensemble members are 0, then the ranks will spread evenly from 1 to 10 and the rank-mean will be around 5.5, so in the 'Extreme low' category.

When a larger part of the climatology is 0, the ensemble forecast anomaly (defined by the rank-mean) simply cannot fall into the same extreme dry category, and the lowest possible is the 'Low' category with 10-25%. Similarly, if the lowest 25% is zero in the climatology, then the lowest possible anomaly signal is 'Bit low', so the category of 25-40%. Then, if 40% is zero, there cannot be anything lower than the 'Near normal' anomaly for the ensemble forecast. All this makes sense, as it does not mean anything for those dry places to be below, say, the 40th percentile if all of the lowest 40 percentiles are 0, as we cannot go below zero. Or in the most extreme case, when even the 90th percentile is zero in the climatology, the forecast can either be 'Near normal' or 'Extreme high'. For this last case, if enough real-time forecast ensemble members are above zero, then the rank-mean will exceed 90 and the dominant anomaly category will be 'Extreme high'.

All this means that, for these mixed-dry or super dry areas, the number and distribution of the positively anomalous ensemble members determine whether the anomaly stays as 'Near normal' or increases into one of the high categories. If enough members are above the non-zero climate percentiles, and thus a high enough fraction of the 51-member ensemble forecast gets high enough ranks, then the distribution of the 51 ensemble member ranks shows a pronounced enough shift from the neutral/normal situation and the rank-mean is high enough to fall into one of the high anomaly categories.

Forecast uncertainty category computation for the ensemble forecast

In addition to the forecast anomaly computation, as one of 7 anomaly categories, the forecast uncertainty is also represented in the sub-seasonal and seasonal products, namely on the new river network and basin summary products. The forecast uncertainty is defined by the standard deviation (std) of the ensemble member ranks (rank-std). If the members cluster well, and the spread of the ranks is low, then the forecast uncertainty will be low and conversely the confidence will be high.

The standard deviation of an even distribution of ranks from 1 to 100 is sqrt((100^2-1)/12) = 28.87, while the most extreme std value occurs when half of the members have rank 1 and the other half rank 100, in which case the std = 49.5. The lowest possible std value is 0, when all ranks are the same. For forecast uncertainty, three uncertainty categories are defined, based on the rank-std value of the ensemble forecasts. Table 2 shows the categories, as defined by the std values of <10, 10<= <20 and 20<=.

Uncertainty categories	Name	Rank STD
Cat-1	Low uncertainty	0-10
Cat-2	Medium uncertainty	10-20
Cat-3	High uncertainty	20<

Table 2: Uncertainty categories defined by the standard deviation of the ensemble member ranks.
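The uncertainty category assignment of Table 2 can be sketched as below. The function name is hypothetical, and the use of the population standard deviation (NumPy's default, `ddof=0`) rather than the sample standard deviation is an assumption, as the text does not state which one is used.

```python
import numpy as np


def uncertainty_category(ranks):
    """Return (rank-std, uncertainty category name) for the member ranks."""
    rank_std = float(np.std(ranks))  # population std, ddof=0 (assumption)
    if rank_std < 10:
        return rank_std, "Low uncertainty"
    if rank_std < 20:
        return rank_std, "Medium uncertainty"
    return rank_std, "High uncertainty"


# All members in the same bin: no spread at all.
print(uncertainty_category([42] * 51))
# Ranks evenly covering 1..100: the climatological spread.
print(uncertainty_category(np.arange(1, 101)))
```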
