Placeholder Forecast anomaly and uncertainty computation methodology

This section describes how the anomaly and uncertainty of the ensemble forecast are determined, using the model climatology as reference. More generally, it describes how the probabilities of the 7 anomaly categories and the 3 uncertainty categories of the forecast are computed. The procedure is generic. It is the same for EFAS and GloFAS, because it is executed in the same way for each river pixel regardless of the resolution. It is also the same for the sub-seasonal and seasonal products, because it works identically on the weekly (sub-seasonal) or monthly (seasonal) mean discharge values.

Climatological bins and anomaly categories

Currently, the sub-seasonal climate sample uses 660 reforecast values, while the seasonal sample uses 500 values. Both are currently based on 20 years. From the climate sample, 99 climate percentiles are determined. These divide the range of river discharge values that occurred in the 20-year climatological sample into 100 equally likely (1% chance) segments. Figure 1 shows an example of a generic climate distribution, based on either weekly or monthly means, with the percentiles represented along the y-axis. Only some percentiles are marked: the deciles (every 10%), the three quartiles (25%, 50% and 75%, of which the middle one, 50%, is also called the median), and a few of the extreme percentiles near the minimum and maximum of the climatological range (indicated by black crosses). Each of these percentiles has an equivalent river discharge value along the x-axis. The percentiles divide the river discharge value range into 100 equally likely bins, some of which are indicated in Figure 1. For example, bin1 holds the values below the 1st percentile, bin2 the values between the 1st and 2nd percentiles, and so on up to bin100, which holds the river discharge values above the 99th percentile.
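The percentile and bin definitions above can be sketched in Python using only the standard library. This is an illustration under assumptions, not the operational code. The function names `climate_percentiles` and `climate_bin` are hypothetical. The percentiles are assumed to come from linear interpolation (`statistics.quantiles` with `method="inclusive"`), because the estimator is not stated here. A value equal to a percentile is placed in the upper bin, following the "Pk<= <Pk+1" convention of Table 1. The 500-value climate sample is synthetic.

```python
import bisect
import statistics

def climate_percentiles(sample):
    """Return the 99 climate percentiles (1%, 2%, ..., 99%) of a climate sample.

    Assumption: linear interpolation between sorted sample values
    (the "inclusive" method); the operational estimator may differ.
    """
    return statistics.quantiles(sample, n=100, method="inclusive")

def climate_bin(value, percentiles):
    """Return the climate bin (1..100) of a discharge value.

    bin1 is below the 1st percentile, bin k lies between the (k-1)th and kth
    percentiles, and bin100 is at or above the 99th percentile. A value equal
    to a percentile is placed in the upper bin.
    """
    # bisect_right counts how many percentiles are <= value (0..99).
    return bisect.bisect_right(percentiles, value) + 1

# Synthetic climate sample of 500 values (the size of the seasonal sample).
sample = [float(v) for v in range(1, 501)]
pcts = climate_percentiles(sample)
print(len(pcts))                                              # 99
print([climate_bin(v, pcts) for v in (0.5, 260.0, 1000.0)])   # [1, 52, 100]
print(climate_bin(pcts[52], pcts))                            # 54 (value equal to P53)
```

With this convention, a value lying between the 53rd and 54th percentiles falls in bin54. This matches the bin numbering used for the ensemble members in Figure 2.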

Figure 1. Schematic of the forecast anomaly categories, defined by the climatological distribution.

Based on the percentiles and the related 100 bins, seven anomaly categories are defined. They are indicated by shading in Figure 1. The two most extreme categories are the bottom and top 10% of the climatological distribution (below 10% in red and above 90% in blue). Next are the moderately low and high river discharge categories, from 10 to 25% (orange) and from 75 to 90% (medium-dark blue). The smallest negative and positive anomalies are defined by 25-40% and 60-75%, displayed in yellow and light blue in Figure 1. Finally, the normal category is defined as 40-60%, i.e. the middle fifth of the distribution, coloured grey in Figure 1.
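A minimal sketch of the mapping from climate bins to the seven anomaly categories is given below. It assumes the categories are numbered Cat1 (below 10%) to Cat7 (above 90%). The names `CATEGORY_UPPER_BINS`, `anomaly_category` and `category_sizes` are hypothetical. Because each bin holds 1% of the climate, a category's climatological probability equals its number of bins.

```python
import bisect

# Upper climate bin (1..100) of Cat1..Cat7:
# <10%, 10-25%, 25-40%, 40-60%, 60-75%, 75-90%, >90%.
CATEGORY_UPPER_BINS = [10, 25, 40, 60, 75, 90, 100]

def anomaly_category(bin_number):
    """Map a climate bin (1..100) to an anomaly category number (1..7)."""
    if not 1 <= bin_number <= 100:
        raise ValueError("bin_number must be between 1 and 100")
    # Index of the first category whose upper bin is >= bin_number.
    return bisect.bisect_left(CATEGORY_UPPER_BINS, bin_number) + 1

def category_sizes():
    """Climatological probability (%) of each category, i.e. its number of 1% bins."""
    lower = [0] + CATEGORY_UPPER_BINS[:-1]
    return [upper - low for low, upper in zip(lower, CATEGORY_UPPER_BINS)]

print(category_sizes())                                          # [10, 15, 15, 20, 15, 15, 10]
print([anomaly_category(b) for b in (10, 11, 54, 60, 61, 97)])   # [1, 2, 4, 4, 5, 7]
```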

Forecast extremity rank computation

The forecast has 51 ensemble members, again for both EFAS and GloFAS and for both the sub-seasonal and seasonal products. Each member is placed in one of the 100 climate bins. The bin number, a value from 1 to 100, gives the anomaly or extremity level of that ensemble member and is hereafter called its rank. For example, 1 means the forecast value is below the 1st climate percentile (i.e. extremely anomalously low). A rank of 2 means the value is between the 1st and 2nd climate percentiles (i.e. slightly less extremely low), and so on. Finally, 100 means the forecast value is above the 99th climate percentile. This is extremely high, because the value is higher than 99% of all the considered reforecasts representing the model climate conditions for this time of year, location and lead time.

Figure 2 shows the process of determining the rank of each ensemble member. In this example, the lowest member gets rank 54 (red r54 in Figure 2). The rank is found by moving vertically from the member's discharge value until crossing the climatological distribution, and then horizontally to the y-axis. This gives the two bounding percentiles and thus the percentile bin. Here the lowest ensemble member value is between the 53rd and 54th percentiles, which is bin54. All other ensemble members get a bin number in the same way. The second-lowest value falls in bin60, and so on, up to the largest ensemble member value, which falls in bin97 because its river discharge value is between the 96th and 97th percentiles.

Figure 2. Schematic of the forecast extremity ranking of the 51 ensemble members and the 7 anomaly categories in the context of the climatological distribution.

The probability of each of the 7 anomaly categories is calculated by counting the ensemble members in that category and dividing by 51, the total number of members. In the example of Figure 2, there is no member in the 3 low-flow anomaly categories. The normal category has 2 members (3.9% probability), the bit high category 14 (27.5%), the high category 17 (33.3%) and the extreme high category 18 (35.3%). The inset table in Figure 2 shows the member counts and probabilities, as well as the size of the 7 categories in terms of climatological probability. This highlights, for example, that the 3.9% probability of the normal flow category is much lower than its climatologically expected probability of 20%. In contrast, the 3 high-flow categories have much higher probabilities than their climatological reference. This is especially true for the extreme high category, where the forecast probability (35.3%) is more than three times the corresponding climatological probability (10%).
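The category probabilities are simple relative frequencies. The sketch below uses a hypothetical `category_probabilities` helper and the same assumed Cat1 to Cat7 numbering (lowest to highest). It reproduces the Figure 2 percentages from member counts of 2, 14, 17 and 18 in the normal to extreme high categories. These are the counts that sum to 51 and give the quoted 3.9%, 27.5%, 33.3% and 35.3%.

```python
from collections import Counter

def category_probabilities(member_categories, n_categories=7):
    """Percentage of ensemble members falling in each category 1..n_categories."""
    counts = Counter(member_categories)   # missing categories count as 0
    n_members = len(member_categories)
    return [100.0 * counts[cat] / n_members for cat in range(1, n_categories + 1)]

# Figure 2 example: category of each of the 51 members
# (Cat4 normal, Cat5 bit high, Cat6 high, Cat7 extreme high).
members = [4] * 2 + [5] * 14 + [6] * 17 + [7] * 18
probs = category_probabilities(members)
print([round(p, 1) for p in probs])   # [0.0, 0.0, 0.0, 3.9, 27.5, 33.3, 35.3]
```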

Treatment of 0 values

The forecast extremity rank computation can be done for any value above 0 m3/s. However, it becomes undefined when the values drop to 0, because identical values cannot be assigned different ranks. The simulations also become less reliable as the values approach 0, so everything below 0.1 m3/s is considered as 0 for the sub-seasonal and seasonal products. The same problem could in principle occur for identical non-zero values. However, the simulation should not normally produce many identical non-zero values, unless a specific process, such as a reservoir operation rule, generates such a signal. There is no indication that constant non-zero values are an issue at all. The 0 values, by contrast, are clearly a major problem, as large parts of the world have areas dry enough, often combined with catchments small enough, to produce near-zero or exactly 0 river discharge values.

A special solution was developed for the forecast rank computation in this 0-value singularity case. All ensemble members with 0 values (i.e. below 0.1 m3/s) are assigned ranks spread evenly over the percentiles that are 0 (i.e. below 0.1 m3/s) in the model climatology. In practice, this means that the 'rank-undefined' section of the ensemble forecast is spread evenly across the 'rank-undefined' section of the climatology during the rank computation.

Example-1:

Consider the hypothetical example in Table 1. For simplicity, the ensemble forecast is represented by only 21 members (instead of 51), and the table indicates which values are considered as 0 (i.e. below 0.1 m3/s). The first part of Table 1 shows the climatological percentile values, whether each is considered as 0, the possible rank values and the discharge range belonging to each rank. The second part of Table 1 shows the mean river discharge values of the 21 ensemble members, whether each is considered as 0, and the assigned rank. Note that the tables number the ranks from 0 to 99 (0 below P1, 99 at or above P99), so each rank is one less than the bin number used above.

In this example, 58 of the 99 climate percentiles are 0 (or considered as 0). This is therefore the undefined section, where the rank computation is not directly possible. Of the 21 ensemble members, 6 are (considered) 0. These 6 members get ranks from the 0-section of the climatology (ranks 0 to 58), spanning it evenly. The ranks would thus be 0, 58/5, 2*58/5, 3*58/5, 4*58/5 and 5*58/5, i.e. 0, 11.6, 23.2, 34.8, 46.4 and 58. However, fractional ranks cannot be assigned, so the nearest integer is always chosen as the rank. In this case, the ranks of these 6 ensemble members are 0, 12, 23, 35, 46 and 58.

For the remaining ensemble members, only a few ranks are given here. The 61st to 96th percentiles and the related ranks are omitted for lack of space.


Raw climate values		0	0	0	0	0	0.01	0.09	0.11	0.15	...	45.7	75.2	108.9
Considered as 0 (yes/no)		1	1	1	1	1	1	1	0	0	0	0	0	0
Possible rank values	0	1	2	3	...	56	57	58	59	60	...	97	98	99
Rank description	<P1	P1<= <P2	P2<= <P3	P3<= <P4	...	P56<= <P57	P57<= <P58	P58<= <P59	P59<= <P60	P60<= <P61	...	P97<= <P98	P98<= <P99	P99<=


Raw values	0	0	0.01	0.05	0.07	0.09	0.2	1.4	3.1	6.3	9.1	12.1	14.0	16.0	18.1	20.1	24.0	27.1	30.0	51.1	109.4
Considered as 0 (yes/no)	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Ensemble member rank values	0	12	23	35	46	58	60	...	...	...	...	...	...	...	...	...	...	...	...	97	99

Table 1: Hypothetical example-1 of a climate distribution and the related forecast ensemble distribution, represented with only 21 members.
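The even spread of the 0-valued members can be sketched as follows, using the 0 to 99 rank numbering of Table 1. The helper name `zero_section_ranks` is hypothetical. Exact fractions avoid floating-point error. Exact .5 ties are rounded down, which is an assumption: it is chosen because it also reproduces the ranks listed in Table 2, while the text only says that the nearest integer is used. How a single 0-valued member should be ranked is not described, so the sketch uses the middle of the 0-section as a placeholder choice.

```python
import math
from fractions import Fraction

def zero_section_ranks(n_zero_members, n_zero_percentiles):
    """Ranks for the members considered as 0, spread evenly over 0..n_zero_percentiles."""
    if n_zero_members == 0:
        return []
    if n_zero_members == 1:
        # Placeholder choice (not from the source): middle of the 0-section.
        return [math.ceil(Fraction(n_zero_percentiles, 2) - Fraction(1, 2))]
    step = Fraction(n_zero_percentiles, n_zero_members - 1)
    # Nearest integer to i * step; exact .5 ties are rounded down (assumption).
    return [math.ceil(i * step - Fraction(1, 2)) for i in range(n_zero_members)]

# Table 1: 58 of the 99 climate percentiles and 6 of the 21 members are considered as 0.
print(zero_section_ranks(6, 58))   # [0, 12, 23, 35, 46, 58]
```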

Example-2:

In this second, even more extreme example (Table 2), there is no measurable river discharge in the climatology at all. This can happen, for example, in the middle of the Sahara desert. There, basically all of the daily river discharge values in the long-term reanalysis are effectively 0, and thus all 99 percentiles of the model climatology are 0.

In the forecast, however, a few members have noticeable discharge. These members are very extreme in the climatological context, even though their absolute values are not necessarily large. Out of the 21 members, 17 are 0 (or considered as 0). These 17 members get ranks from the 0-section of the climatology, which in this case spans the whole range from 0 to 99. Thus, the ranks are 0, 99/16 (6 when rounded), 2*99/16 (12), 3*99/16 (19), ..., 15*99/16 (93) and 16*99/16 (99).

Moreover, in this case all forecast ensemble members at or above 0.1 m3/s get rank 99, because the whole climatology is below 0.1 m3/s.


Raw climate values		0	0	0	0	0	0.01	0.09
Considered as 0 (yes/no)		1	1	1	1	1	1	1
Possible rank values	0	1	2	3	...	97	98	99
Rank description	<P1	P1<= <P2	P2<= <P3	P3<= <P4	...	P97<= <P98	P98<= <P99	P99<=


Raw values	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.01	0.06	1.9	2.5	5	11
Considered as 0 (yes/no)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0	0	0
Ensemble member rank values	0	6	12	19	25	31	37	43	49	56	62	68	74	80	87	93	99	99	99	99	99

Table 2: Hypothetical example-2 of a climate distribution and the related forecast ensemble distribution, represented with only 21 members.
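The regular rank computation and the 0-value treatment can be combined to rank a whole ensemble, as sketched below and checked against Table 2. This sketch rests on several assumptions. The names `member_ranks` and `spread_evenly` are hypothetical. The 0-valued members receive the evenly spread ranks in increasing order of their raw values, as in the tables. Exact .5 ties are rounded down. The 0.1 m3/s threshold is applied to both the climate percentiles and the forecast members. A single 0-valued member is placed at the middle of the 0-section as a placeholder choice.

```python
import bisect
import math
from fractions import Fraction

ZERO_THRESHOLD = 0.1  # m3/s: values below this are considered as 0

def spread_evenly(n_items, top_rank):
    """n_items ranks spread evenly over 0..top_rank (exact .5 ties rounded down)."""
    if n_items == 0:
        return []
    if n_items == 1:
        # Placeholder choice (not from the source): middle of the range.
        return [math.ceil(Fraction(top_rank, 2) - Fraction(1, 2))]
    step = Fraction(top_rank, n_items - 1)
    return [math.ceil(i * step - Fraction(1, 2)) for i in range(n_items)]

def member_ranks(members, percentiles):
    """Rank (0..99, as in Tables 1 and 2) of each member against 99 sorted percentiles."""
    ranks = [0] * len(members)
    # Size of the 'rank-undefined' section of the climatology.
    n_zero_pcts = sum(1 for p in percentiles if p < ZERO_THRESHOLD)
    # Members considered as 0, in increasing order of their raw values.
    zero_idx = sorted((i for i, v in enumerate(members) if v < ZERO_THRESHOLD),
                      key=lambda i: members[i])
    for i, rank in zip(zero_idx, spread_evenly(len(zero_idx), n_zero_pcts)):
        ranks[i] = rank
    for i, v in enumerate(members):
        if v >= ZERO_THRESHOLD:
            # Number of percentiles <= v: Pk <= v < Pk+1 gives rank k.
            ranks[i] = bisect.bisect_right(percentiles, v)
    return ranks

# Table 2: P1..P97 = 0, P98 = 0.01, P99 = 0.09, all considered as 0.
climate = [0.0] * 97 + [0.01, 0.09]
forecast = [0.0] * 15 + [0.01, 0.06, 1.9, 2.5, 5.0, 11.0]
print(member_ranks(forecast, climate))
# [0, 6, 12, 19, 25, 31, 37, 43, 49, 56, 62, 68, 74, 80, 87, 93, 99, 99, 99, 99, 99]
```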

With this special treatment of the 0 singularity values, there is no need to mask any areas in the new sub-seasonal and seasonal products. In the old version of the GloFAS seasonal forecasts, areas and time ranges where the values were too small (effectively 0) were masked, and no forecast values were shown there ('undefined').

For the new, revised products, this special treatment means that all of these 0 or near-0 cases simply fall into an evenly distributed, very uncertain, on average near-normal condition. Following on from the second example above, suppose there is no discharge in the climatology and no discharge in the forecast either. The result is a uniform distribution of ranks from 0 to 99, evenly spanning the whole climatological range. Averaging these ranks gives exactly the middle, the climate median, so no anomaly whatsoever.

This also means that for dry or very dry areas there will never be a dry forecast anomaly, as that has no meaning (discharge cannot go below 0). Therefore, only neutral or positive anomalies are possible. The magnitude or severity of those positive anomalies is then determined by the number and distribution of ensemble members above the non-zero climate percentiles. Suppose enough members are above the non-zero climate percentiles that a large enough fraction of the forecast gets high enough ranks. Then the distribution of the 51 ensemble member ranks will, on average, show a pronounced shift from the neutral situation. In the neutral situation, everything is 0, the ranks are evenly distributed and the forecast looks entirely normal.
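Averaging the ranks illustrates the neutral and shifted cases. The sketch reuses the even-spread idea (hypothetical `spread_evenly`, with exact .5 ties rounded down as an assumption). An entirely 0 forecast against an entirely 0 climatology gives a mean rank of 49.49. This is essentially the climate median of 49.5 on the 0 to 99 scale, and the unrounded ranks average exactly 49.5. The Table 2 forecast has four members above the zero climatology and averages 58.9.

```python
import math
from fractions import Fraction
from statistics import mean

def spread_evenly(n_items, top_rank):
    """n_items ranks spread evenly over 0..top_rank (exact .5 ties rounded down)."""
    step = Fraction(top_rank, n_items - 1)  # assumes n_items >= 2
    return [math.ceil(i * step - Fraction(1, 2)) for i in range(n_items)]

# No discharge in the climatology (99 zero percentiles) nor in any of the 51 members.
neutral = spread_evenly(51, 99)
print(round(mean(neutral), 2))   # 49.49, essentially the climate median (49.5)

# Table 2 forecast: 17 members considered as 0, plus 4 members ranked 99.
shifted = spread_evenly(17, 99) + [99] * 4
print(round(mean(shifted), 1))   # 58.9, shifted towards high flows
```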

For the real-time forecasts, we will instead have a 51-value ensemble of ranks (-50 to 50). To define the dominant category, there are two options:
- Most populated: We choose the most populated of the 7 categories. However, this is problematic for very uncertain cases, when small shifts in the distribution can cause large shifts in the categories. For example, consider the member distributions 8/7/8/7/7/7/7 and 6/7/7/7/8/9/7, both quite possible at longer lead times. In the first case it is unclear which category to choose, Cat1 or Cat3, as they are equally likely. In the second case Cat6 wins, even though the two cases are otherwise very similar.
- ENS-mean: Alternatively, we can rank the ensemble mean value and define the severity category that way. For the above example this would not make a big difference. The ensemble mean is expected to be quite similar for both forecast distributions, so the two categories would be either the same or perhaps only one apart. I think this is what we need to do.
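Both options can be sketched as below. The helpers `most_populated` and `ens_mean_category` are hypothetical. The sketch uses climate bins 1 to 100 rather than the -50 to 50 rank scale mentioned above, and numbers the categories Cat1 (lowest) to Cat7 (highest). It also ignores the 0-value treatment when ranking the ensemble mean. The climate sample and member values are synthetic.

```python
import bisect
import statistics

# Upper climate bin of Cat1..Cat7: <10%, 10-25%, 25-40%, 40-60%, 60-75%, 75-90%, >90%.
CATEGORY_UPPER_BINS = [10, 25, 40, 60, 75, 90, 100]

def most_populated(counts):
    """Option 1: category numbers (1..7) that share the highest member count."""
    top = max(counts)
    return [cat for cat, n in enumerate(counts, start=1) if n == top]

def ens_mean_category(members, percentiles):
    """Option 2: category (1..7) of the ensemble-mean discharge."""
    ens_mean = statistics.fmean(members)
    # Climate bin 1..100 of the ensemble mean (a value equal to a percentile goes up).
    bin_number = bisect.bisect_right(percentiles, ens_mean) + 1
    return bisect.bisect_left(CATEGORY_UPPER_BINS, bin_number) + 1

print(most_populated([8, 7, 8, 7, 7, 7, 7]))   # [1, 3], a tie
print(most_populated([6, 7, 7, 7, 8, 9, 7]))   # [6]

# Synthetic climate: 99 percentiles of the values 1..500.
pcts = statistics.quantiles(range(1, 501), n=100, method="inclusive")
print(ens_mean_category([250.0, 270.0], pcts))  # 4 (mean 260 falls in bin52, i.e. Cat4)
```

The tie in the first distribution shows why the most-populated option can flip between distant categories. The ensemble-mean option always returns a single category.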

Generation of the forecast anomaly and uncertainty signal

Climatological bins and anomaly categories

Forecast extremity rank computation
