This page describes how the anomaly and uncertainty signals of the ensemble forecasts in the sub-seasonal and seasonal products are determined, using the climatology as reference. The procedure is generic: it is the same for EFAS and GloFAS, as it is executed in the same way for each river pixel regardless of the resolution, and it is the same for the sub-seasonal and seasonal products, as it works identically for weekly mean values (sub-seasonal) and monthly mean values (seasonal).
The characterisation of the forecast signal in both the sub-seasonal and seasonal products is based on the extremity of the ensemble members in the context of the model climatological distribution, as explained further below.
Climatological percentiles and forecast anomaly categories
From the climate sample, 99 climate percentiles are determined. They divide the weekly- or monthly-averaged river discharge values that occurred in the 20-year climatological sample into equally likely (1% chance) segments. Figure 1 shows an example of a generic climate distribution, with the percentiles represented along the y-axis. Each of these percentiles has an equivalent river discharge magnitude along the x-axis. The percentiles thus divide the river discharge range into 100 equally likely bins, such as bin1 for values below the 1st percentile, bin2 for values between the 1st and 2nd percentiles, and so on up to bin100 for values above the 99th percentile. In Figure 1, only the deciles (every 10%), the quartiles (25%, 50% and 75%, of which the middle one, 50%, is also called the median) and a few of the extreme percentiles near the minimum and maximum of the climatological range (shown by black crosses) are indicated.
Figure 1. Schematic of the forecast anomaly categories, defined by the climatological distribution.
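The percentile step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the climate sample is synthetic, its size is hypothetical, and the default linear interpolation of np.percentile is an assumption, as the text does not say which percentile estimator is used.

```python
import numpy as np

# Hypothetical climate sample of weekly-mean river discharge (m3/s) for one
# river pixel; synthetic values stand in for the real model climatology.
rng = np.random.default_rng(0)
climate_sample = rng.gamma(shape=2.0, scale=50.0, size=400)

# The 1st to 99th percentiles split the sample into 100 equally likely bins.
# The default (linear) interpolation is an assumption of this sketch.
climate_percentiles = np.percentile(climate_sample, np.arange(1, 100))

print(climate_percentiles.shape)  # one discharge value per percentile
print(climate_percentiles[49])    # the 50th percentile, i.e. the median
```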
Based on the percentiles and the related 100 bins, 5 main anomaly categories are defined (Table 1). These are also indicated in Figure 1 by the vertical lines separating them. The two most extreme categories are the bottom and top 10% of the climatological distribution (below 10% as 'Extreme low' and above 90% as 'Extreme high'). Next are the moderately low and high river discharge categories, from 10-25% ('Low') and 75-90% ('High'). The remaining section, 25-75%, is a larger category around normal, called 'Near normal'. These 5 main categories highlight the larger negative and positive anomalies. For products that can represent more detailed anomalies, the 'Near normal' category is divided into 3 sub-categories: the smallest negative and positive anomalies, defined by 25-40% ('Bit low') and 60-75% ('Bit high'), and the remaining normal-condition category, defined by 40-60% and called 'Normal' in Figure 1. The 5+2 categories are shown in Figure 1 and Table 1, with the colour coding also indicated in Table 1.
Table 1: Definition and description of the 5+2 anomaly categories. The possible value ranges in the 'Rank' column are inclusive at the start and exclusive at the end, so, for example, the possible ranks for the 'Extreme low' category are 1, 2, 3, ... and 10. For some of the products, the 'Near normal' category is sub-divided into three middle categories: 'Bit low', 'Normal' and 'Bit high'. The categories are colour-coded as they appear on the web products with medium uncertainty (see further below).
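As an illustration, the mapping from an integer rank (1 to 100) to the anomaly categories can be written as a simple lookup. The function and constant names below are hypothetical; the rank limits follow the percentage ranges given above (ranks 1-10 for 'Extreme low', 11-25 for 'Low', and so on).

```python
from bisect import bisect_left

# Highest rank belonging to each of the 7 categories (from the ranges in the text).
UPPER_RANKS = [10, 25, 40, 60, 75, 90, 100]
CATEGORIES_7 = ["Extreme low", "Low", "Bit low", "Normal",
                "Bit high", "High", "Extreme high"]
NEAR_NORMAL = {"Bit low", "Normal", "Bit high"}


def category_7(rank: int) -> str:
    """Return the detailed (7-category) anomaly class of an integer rank."""
    if not 1 <= rank <= 100:
        raise ValueError("rank must be between 1 and 100")
    return CATEGORIES_7[bisect_left(UPPER_RANKS, rank)]


def category_5(rank: int) -> str:
    """Return the main (5-category) class, merging the three middle classes."""
    name = category_7(rank)
    return "Near normal" if name in NEAR_NORMAL else name


print(category_7(54), "/", category_5(54))
```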
Extremity rank computation for ensemble members
The sub-seasonal and seasonal forecasts have 51 ensemble members each. Each member is checked for climatological extremity and placed in one of the 100 climate bins (the modelled climate conditions for the given time of year, location and lead time). The allocated bin number, hereafter called the rank (from 1 to 100), corresponds to the anomaly or extremity level of the ensemble member. For example, rank 1 means that the forecast value is below the 1st climate percentile (i.e. extremely anomalously low, less than the value that occurred only 1% of the time in the climatological period), rank 2 means that the value is between the 1st and 2nd climate percentiles (i.e. slightly less extremely low), and so on; finally, rank 100 means that the forecast value is above the 99th climate percentile (i.e. extremely high, as it is higher than 99% of the climatological distribution).
Figure 2 shows the process of determining the rank of each ensemble member. In this example, the lowest member gets the rank of 54 (red 'r54' on the graph in Figure 2). The rank is found by moving vertically from the member value until crossing the climatological distribution, and then moving horizontally to the y-axis to determine the two bounding percentiles and thus the right climate bin. In this case, the lowest ensemble member value is between the 53rd and 54th percentiles, which results in 'bin54'. All other ensemble members get a bin number in the same way: the 2nd lowest value gets 'bin60', and so on, until the largest ensemble member value gets 'bin97', as its river discharge value is between the 96th and 97th percentiles.
Figure 2. Schematic of the forecast extremity ranking of the 51 ensemble members and the 7 anomaly categories in the context of the climatological distribution.
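The bin lookup described for Figure 2 can be sketched with a sorted search. This is a minimal sketch and not the operational code: the function name is hypothetical, the toy percentiles are invented, a value exactly equal to a percentile is assumed to fall in the lower bin, and the special treatment of 0 values is not covered.

```python
import numpy as np


def extremity_ranks(member_values, climate_percentiles):
    """Return the climate bin (rank 1..100) of each ensemble member value."""
    percentiles = np.asarray(climate_percentiles, dtype=float)
    values = np.asarray(member_values, dtype=float)
    # Count the percentiles lying strictly below each value; adding 1 gives
    # the bin number. Ties with a percentile go to the lower bin (assumption).
    return np.searchsorted(percentiles, values, side="left") + 1


# Toy climatology in which the k-th percentile is simply k m3/s.
toy_percentiles = np.arange(1, 100, dtype=float)
print(extremity_ranks([0.5, 53.5, 96.2, 120.0], toy_percentiles))
```

With these toy percentiles, a value of 53.5 lies between the 53rd and 54th percentiles and is therefore placed in bin 54, as for the lowest member in Figure 2.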
The probability of the forecast being within one of the 7 anomaly categories is calculated by counting the ensemble members in each category and dividing by 51, the total number of members. In the example of Figure 2, there is no member in the 3 low anomaly categories, while the 'Normal' category has 2 members (3.9% probability), the 'Bit high' category has 14 members (27.5%), the 'High' category has 17 members (33.3%) and the 'Extreme high' category has 18 members (35.3%). The inset table in Figure 2 shows the number of ensemble members and the corresponding probabilities in the 7 categories, together with the climatological 'size' of the 7 categories in terms of probabilities. For ease of interpretation, the 7 categories are displayed here with different colours. This highlights, for example, that the 3.9% probability of the 'Normal' category is much lower than the climatologically expected probability of 20%, whereas each of the three high-flow categories has a much higher probability than its climatological reference, especially the 'Extreme high' category, where the forecast probability (35.3%) is more than three times the corresponding climatological probability (10%). In addition, the extended 'Near normal' category has 16 members with a 31.4% probability, which is lower than the climatological probability of 50%.
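The counting step can be illustrated as follows. The individual ranks are hypothetical; they are chosen only so that the per-category member counts are 2, 14, 17 and 18 (which sum to 51 and reproduce the quoted probabilities of 3.9%, 27.5%, 33.3% and 35.3%).

```python
import numpy as np

CATEGORIES_7 = ["Extreme low", "Low", "Bit low", "Normal",
                "Bit high", "High", "Extreme high"]
# Highest rank belonging to each category.
UPPER_RANKS = np.array([10, 25, 40, 60, 75, 90, 100])

# Hypothetical ranks of the 51 members (lowest 54, highest 97).
ranks = np.repeat([54, 60, 70, 85, 97], [1, 1, 14, 17, 18])

# Index of the category of each member, then member counts per category.
category_index = np.searchsorted(UPPER_RANKS, ranks, side="left")
counts = np.bincount(category_index, minlength=len(CATEGORIES_7))
probabilities = 100.0 * counts / ranks.size

for name, n, p in zip(CATEGORIES_7, counts, probabilities):
    print(f"{name:13s} {n:2d} members {p:5.1f}%")

# The extended 'Near normal' class merges 'Bit low', 'Normal' and 'Bit high'.
near_normal_probability = probabilities[2:5].sum()
print(f"Near normal   {near_normal_probability:5.1f}%")
```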
Extremity rank computation for ensemble members with 0 values
The forecast extremity rank computation can be done for any value above 0 m3/s. However, it becomes undefined when values drop to 0, as there is no way to differentiate amongst identical values. The hydrological simulations of EFAS and GloFAS are less reliable and more prone to random noise when they approach 0, so every value below 0.1 m3/s is considered 0 for the sub-seasonal and seasonal products. The same problem can also occur for identical non-zero values, but the simulation should not normally produce many of them unless some specific process, such as a reservoir operation rule, generates such a signal. There is no indication that constant non-zero values are an issue in CEMS-flood, but the 0 value is clearly a major problem, as large parts of the world have areas that are dry enough, often combined with catchments that are small enough, to have near-zero or exactly 0 river discharge values. Some further explanation of the 0-value treatment is given in the expandable section below.
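The 0.1 m3/s cut-off can be sketched as below. The function name is hypothetical, and only the thresholding itself is shown; how the resulting 0 values are then ranked is not covered by this sketch.

```python
import numpy as np

ZERO_THRESHOLD = 0.1  # m3/s; values below this are treated as 0 (from the text)


def apply_zero_threshold(discharge):
    """Set every river discharge value below the threshold to exactly 0."""
    discharge = np.asarray(discharge, dtype=float)
    return np.where(discharge < ZERO_THRESHOLD, 0.0, discharge)


print(apply_zero_threshold([0.0, 0.05, 0.1, 0.23, 12.0]))
```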
Expected forecast anomaly category computation
Each of the 51 ensemble forecast members is assigned an extremity rank, and the arithmetic mean of the 51 ensemble member rank values (rank mean) is then calculated (see also Figure 4).
This rank mean, with a value range from 1.0 to 100.0, defines the expected forecast anomaly category for the whole ensemble forecast: the forecast is assigned the category in which the rank mean falls (one of the 7 categories in Table 1 above). For example, all rank mean values from 40.0 to 60.0 are assigned the 'Normal' anomaly category.
Figure 4. Schematic of the forecast extremity ranking of the 51 ensemble members and the calculation of the expected forecast anomaly category for the whole ensemble.
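A minimal sketch of the expected-category step is given below. The names are hypothetical, the example ranks are invented, and the treatment of a rank mean that falls exactly on a category limit (10, 25, 40, 60, 75 or 90) is an assumption of this sketch: such a value is placed in the lower category, mirroring the integer rank ranges of Table 1.

```python
import numpy as np

CATEGORIES_7 = ["Extreme low", "Low", "Bit low", "Normal",
                "Bit high", "High", "Extreme high"]
# Upper limits of the first six categories on the 1.0 to 100.0 rank mean scale.
UPPER_EDGES = [10.0, 25.0, 40.0, 60.0, 75.0, 90.0]


def expected_anomaly(ranks):
    """Return the rank mean and the expected anomaly category of an ensemble."""
    rank_mean = float(np.mean(ranks))
    # Boundary handling (a mean exactly on a limit goes to the lower category)
    # is an assumption of this sketch.
    index = int(np.searchsorted(UPPER_EDGES, rank_mean, side="left"))
    return rank_mean, CATEGORIES_7[index]


# Hypothetical ranks of 51 members, mostly in the high categories.
ranks = np.repeat([54, 60, 70, 85, 97], [1, 1, 14, 17, 18])
rank_mean, category = expected_anomaly(ranks)
print(f"rank mean = {rank_mean:.1f}, expected category = {category}")
```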
Forecast uncertainty category computation
The forecast uncertainty is defined by the standard deviation (std) of the ensemble member ranks, each of which can be from 1 to 100 (rank std).
If the ensemble member ranks cluster well and the spread of the ranks is low, then the forecast uncertainty is low. A few specific cases illustrate the possible range of the rank std. The first is the even distribution, with 51 values spread evenly from 1 to 100, as ranks of 1, 3, 5, ..., 47, 49, 50 (or 51), 52, 54, ..., 96, 98 and 100. This distribution has a mean very close to 50.5 and a standard deviation very close to 29.0. The second is the most uneven distribution, with rank values of 1, 1, 1, ..., 1, 100, 100, 100, ..., 100, with either 25 values of 1 and 26 values of 100 or vice versa. In this case, the rank mean is either 51.5 or 49.5 (depending on whether 100 or 1 has the 26 values) and the standard deviation is 49.5 in both cases. The third is when all values are the same, so there is no variability amongst the 51 ranks at all; the rank mean is then that same value and the rank std is 0.
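The three reference cases can be checked numerically. The sketch below uses the population form of the standard deviation (dividing by 51, the NumPy default), which is the form that reproduces the quoted values of about 29.0 and 49.5; the constant rank of 73 in the last case is an arbitrary example value.

```python
import numpy as np

# Even spread: 1, 3, ..., 49, then 50, then 52, 54, ..., 100 (51 values).
even = np.array(list(range(1, 50, 2)) + [50] + list(range(52, 101, 2)))
# Most uneven spread: 25 members at rank 1 and 26 members at rank 100.
uneven = np.array([1] * 25 + [100] * 26)
# No spread at all: every member has the same (arbitrary) rank.
identical = np.full(51, 73)

for name, ranks in [("even", even), ("uneven", uneven), ("identical", identical)]:
    # ndarray.std() uses ddof=0, i.e. the population standard deviation.
    print(f"{name:9s} n={ranks.size} mean={ranks.mean():.2f} std={ranks.std():.2f}")
```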
In order to be able to represent the uncertainty sub-categories with different colours for all of the 5 main anomaly categories, the number of uncertainty categories was limited to 3, with low, medium and high uncertainty. Based on the rank std values presented for the specific examples above, the three categories were defined by the easy-to-remember split values of 10 and 20 (see Table 2). These work well enough for all lead times and geographical areas, and give an indication of how uncertain the expected forecast anomaly is for each pixel or area and lead time.
Table 2: Uncertainty categories defined by the standard deviation of the ensemble member ranks. The categories are also colour coded as they appear on the web products for the 'Near normal' extended anomaly category.
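The split at 10 and 20 can be expressed as a small function. The function name is hypothetical, and the handling of a rank std of exactly 10 or 20 (placed in the higher category here) is an assumption of this sketch.

```python
def uncertainty_category(rank_std: float) -> str:
    """Classify the rank std into the three uncertainty categories."""
    if rank_std < 0:
        raise ValueError("a standard deviation cannot be negative")
    if rank_std < 10:
        return "Low"
    if rank_std < 20:  # exact limits go to the higher category (assumption)
        return "Medium"
    return "High"


for value in (0.0, 7.0, 12.0, 29.0, 49.5):
    print(value, uncertainty_category(value))
```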
Simplified examples to help interpret the expected forecast anomaly and uncertainty category information
In this section, examples are given with simplified rank distributions in order to demonstrate the rank mean and rank std computation for the expected anomaly and uncertainty categories. Based on these examples, users can get a feel for how the rank mean and rank std values change with the underlying distributions. In addition, the impact of 0 values in the climatology and in the ensemble forecasts is demonstrated for different severities of the 0-value problem, illustrating the complexity of these dry cases.
In these simplified examples, a fixed portion of the climatological and/or ensemble forecast distribution is 0, while the non-zero ensemble members are placed in groups, with all members of a group having the same rank: just one group for the very dry cases and 5 groups for the non-zero cases. This way, the computation methodology can be demonstrated in a way that is easier to interpret.
The non-zero value examples highlight that shifting the same rank distribution 'up' or 'down' (i.e. to wetter or drier) does not change the standard deviation (and thus the uncertainty). Also, 'narrowing' the rank distribution around the same mean value (i.e. making it cluster more) does not change the mean, but the uncertainty drops markedly. Similarly, adding extreme members (i.e. ranks at or near 1 or 100), even if only very few, increases the uncertainty quite substantially.
For forecasts where some or all of the climatological percentiles are 0, the general rule of thumb is that as the percentage of zero climate percentiles increases, it becomes more and more difficult to end up with negative expected forecast anomalies. The lowest possible rank mean values occur for forecasts with all 0 values, in which case the forecast rank mean is determined by the size of the 0-value section of the climatology. For example, if the lowest 30% of the climatology is 0, perhaps on a small river catchment somewhere in southern Spain, then the rank mean of a forecast with a constant 0 value will be about 15.5, which puts this forecast in the 'Low' expected anomaly category. With 70% of the climatology being 0, say further into the Sahara, the driest possible ensemble forecast (all members with 0 value) only reaches a rank mean of about 35.5 (it cannot go any lower), which gives the 'Bit low' expected anomaly category. An anomaly drier than this is simply not physically possible for such a climatologically dry place.
So, with 10% of the climatology being 0, the absolute minimum possible forecast rank mean is 5.5, with 30% it is 15.5, and for a totally dry climatology, where all 99 percentiles are 0, it is 50.5. How much higher than these absolute minimums the rank mean goes depends on how many of the ensemble members are non-zero and on their actual ranks (determined by the non-zero section of the climatology). One of the most extreme cases is when all 99 climatological percentiles are 0 and all ensemble forecast members are greater than 0. For this highly unlikely event, the forecast rank mean will always be 100 (and the expected forecast anomaly category 'Extreme high'), regardless of the actual ensemble member values (i.e. how much higher than 0 they are). So, even if all ensemble member river discharge values are very low, say from 0.12 to 0.23 m3/s, the forecast rank mean will still be 100 and the expected forecast anomaly category 'Extreme high'.
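The minimum rank means quoted above follow from averaging the ranks of the 0-value bins. The sketch below assumes that the 0-value members are represented equally across those bins, so that an all-zero forecast has the mean of ranks 1 to n, where n is the number of 0-value bins; the names are hypothetical.

```python
from bisect import bisect_left

CATEGORIES_7 = ["Extreme low", "Low", "Bit low", "Normal",
                "Bit high", "High", "Extreme high"]
UPPER_EDGES = [10, 25, 40, 60, 75, 90]


def driest_possible(zero_bins: int):
    """Rank mean and category of an all-zero forecast when the lowest
    `zero_bins` of the 100 climate bins contain only 0 values."""
    rank_mean = (1 + zero_bins) / 2  # mean of the ranks 1, 2, ..., zero_bins
    return rank_mean, CATEGORIES_7[bisect_left(UPPER_EDGES, rank_mean)]


for zero_bins in (10, 30, 70, 100):
    rank_mean, category = driest_possible(zero_bins)
    print(f"{zero_bins:3d}% zero climatology: minimum rank mean {rank_mean}, {category}")
```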
A few examples are shown here for the case when there is no 0 value in the climatology, so all ensemble forecast members can be ranked without any issue. For simplicity, only 5 groups are used in the forecast. The table below shows the member counts and the related average ranks for the 5 groups, with the rank mean, rank std and the expected anomaly and uncertainty categories determined from them. For example, in the first row, 10 ensemble members are in the first group, all with the rank of 40. Then 10 members are in the 2nd group with the rank of 45, and so on. The rank mean of this simplified forecast distribution is very close to 50 (the mean of 40, 45, 50, 55 and 60 with almost the same population in each group) and the rank std is about 7. This puts this forecast case into the 'Normal' expected anomaly category (rank mean between 40 and 60) and the 'Low' uncertainty category (rank std below 10).
The even distribution is represented first below. It shows that shifting the same rank distribution up or down does not change the standard deviation (and thus the uncertainty); this is true for any rank distribution. Also, 'narrowing' the rank distribution does not change the mean, but the uncertainty drops markedly. Similarly, adding extreme members (i.e. ranks at or near 1 or 100), even if only very few (2 in the example below), can increase the uncertainty quite substantially.
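These effects can be reproduced with a few grouped rank distributions. The group sizes (10, 10, 11, 10, 10), the shifted and narrowed rank values and the placement of the 2 extreme members are assumptions made for this sketch; only the first case (ranks 40 to 60, rank mean near 50, rank std about 7) is taken from the text.

```python
import numpy as np


def rank_stats(group_ranks, group_sizes):
    """Expand grouped ranks to individual members; return size, mean and std."""
    ranks = np.repeat(group_ranks, group_sizes)
    return ranks.size, float(ranks.mean()), float(ranks.std())


sizes = [10, 10, 11, 10, 10]  # assumed split of the 51 members into 5 groups

base = rank_stats([40, 45, 50, 55, 60], sizes)
shifted = rank_stats([60, 65, 70, 75, 80], sizes)  # same shape, 20 ranks higher
narrowed = rank_stats([46, 48, 50, 52, 54], sizes)  # same mean, tighter cluster
# Two members of the base case replaced by one at rank 1 and one at rank 100.
extremes = rank_stats([1, 40, 45, 50, 55, 60, 100], [1, 10, 10, 9, 10, 10, 1])

for name, (n, mean, std) in [("base", base), ("shifted", shifted),
                             ("narrowed", narrowed), ("extremes", extremes)]:
    print(f"{name:9s} n={n} rank mean={mean:6.2f} rank std={std:5.2f}")
```

In this sketch, shifting leaves the rank std unchanged, narrowing reduces it to below 3, and the two extreme members raise it above 10, which would move the forecast from the low to the medium uncertainty category.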
In the examples with 0 values, again for simplicity, the climatological and forecast values are only in one of 2 categories, either 0 or non-0. This way, the main impact of the 0/non-0 value issue can be demonstrated. In the tables below, the member counts and the related average ranks are given for the two groups of 0 and non-0 ensemble members, with the rank mean, rank std and the expected anomaly and uncertainty categories determined from them.
There are 4 tables, with 10%, 30%, 70% and 100% of 0 values in the climatology (i.e. an increasingly dry climate). For example, in the 7th row of the 1st table, with 10% of 0 values in the climatology, 11 ensemble members have 0 value and the remaining 40 are greater than 0. The average rank for the 0-value members is 5.5 (as given by the method of handling the 0-value issue with equal representation, explained above), while the average rank for the non-zero members is set to 11 as an example. The resulting rank mean is 9.81, which places this forecast in the 'Extreme low' expected anomaly category, while the rank std is 2.26, which gives the 'Low' uncertainty category.
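The quoted row can be verified with a short calculation. As in the tables, each group is represented by its average rank (5.5 for the 0-value members, 11 for the non-zero members); the function name and its arguments are hypothetical, and the population standard deviation (dividing by 51) is used, as this reproduces the quoted 2.26.

```python
import numpy as np


def zero_case_stats(n_zero, zero_bins, nonzero_rank, n_members=51):
    """Rank mean and rank std for a two-group (0 / non-0) ensemble forecast."""
    zero_rank = (1 + zero_bins) / 2  # average rank of the 0-value bins
    ranks = np.repeat([zero_rank, float(nonzero_rank)],
                      [n_zero, n_members - n_zero])
    return float(ranks.mean()), float(ranks.std())


# 10% of the climatology is 0; 11 members are 0 and 40 members have rank 11.
rank_mean, rank_std = zero_case_stats(n_zero=11, zero_bins=10, nonzero_rank=11)
print(f"rank mean = {rank_mean:.2f}, rank std = {rank_std:.2f}")  # 9.81 and 2.26
```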
These tables demonstrate the complex interaction between the dryness of the climatology and the ensemble forecasts, reflected in the forecast rank mean and rank std values and in the subsequent expected anomaly and uncertainty categories. They also demonstrate how much less likely negative anomalies become as the climate becomes drier and drier.