...

Like the MSE, the BS can be decomposed into three terms. The most often quoted decomposition was suggested by Allan Murphy (1973, 1986), who used "binned" probabilities:
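In the form usually quoted, the decomposition can be written as below. Only n_k is defined in the text; the other symbols are assumptions made for this sketch: N is the total number of forecasts, p_k the forecast probability of category k, ō_k the observed frequency of the event in category k, and ō the overall (climatological) observed frequency. The terms appear in the order reliability, resolution, uncertainty.

```latex
\[
\mathrm{BS} = \frac{1}{N}\sum_{k} n_k\,(p_k - \bar{o}_k)^2
            \;-\; \frac{1}{N}\sum_{k} n_k\,(\bar{o}_k - \bar{o})^2
            \;+\; \bar{o}\,(1 - \bar{o})
\]
```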

where n_k is the number of forecasts in the same probability category k.

  • The first term measures the reliability (i.e. how much the forecast probabilities can be taken at face value). On the reliability diagram this is the n_k-weighted sum of the squared distances (vertical or horizontal) between each point and the 45° diagonal (see Fig12.B.10).
  • The second term measures the resolution (i.e. how much the predicted probabilities differ from a climatological average and therefore contribute information). On the reliability diagram this is the n_k-weighted sum of the squared distances between each point and the horizontal line defined by the climatological probability reference (see Fig12.B.10).
  • The third term measures the uncertainty (i.e. the variance of the observations). It takes its highest, most "uncertain", value when ō = 0.5 (see Fig12.B.11).
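The three terms can be checked numerically. The following is a minimal sketch, assuming the forecasts are already binned (every forecast in a category carries exactly the same probability) and using a small made-up sample; the function name `murphy_decomposition` and the data are hypothetical, not taken from the guide.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty) for binned forecasts.

    probs:    forecast probabilities, one per case, already binned
    outcomes: 1 if the event occurred, 0 otherwise
    """
    n = len(probs)
    o_bar = sum(outcomes) / n              # overall observed frequency

    # Group the outcomes by forecast probability category k.
    bins = defaultdict(list)
    for p_k, o in zip(probs, outcomes):
        bins[p_k].append(o)

    reliability = 0.0
    resolution = 0.0
    for p_k, obs in bins.items():
        n_k = len(obs)                     # forecasts in category k
        o_bar_k = sum(obs) / n_k           # observed frequency in category k
        reliability += n_k * (p_k - o_bar_k) ** 2
        resolution += n_k * (o_bar_k - o_bar) ** 2

    uncertainty = o_bar * (1.0 - o_bar)
    return reliability / n, resolution / n, uncertainty


# Hypothetical sample: ten forecasts in three probability categories.
probs = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

rel, res, unc = murphy_decomposition(probs, outcomes)
brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# With exactly binned forecasts: brier == rel - res + unc (up to rounding).
print(rel, res, unc, brier)
```

For this sample the observed frequency is 0.5, so the uncertainty term takes its maximum value of 0.25.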

Fig12.B.10: Summary of Allan Murphy's reliability and resolution terms.


Fig12.B.11: Uncertainty is at its maximum for a climatological observed probability average of 50%.


The Brier score is a "proper" score

The Brier score is strictly "proper" (i.e. it encourages forecasters to really try to find out the probability, without thinking about whether the forecast value is "tactical" or not).  Indeed, if forecasters deviate from their true beliefs, the BS will "punish" them!  This sounds strange.  How can an abstract mathematical equation know the inner beliefs of someone?

Assume forecasters honestly think the probability of an event is p but have, for misguided "tactical" reasons, instead stated r. If the event occurs, the contribution to the BS (first term) is (1 - r)² weighted by the probability for the outcome to occur. If the event does not occur, the contribution to the BS (first term) is (r - 0)² weighted by the probability for the outcome not to occur. For these weightings, the "honest" probability must be used: p when the event occurs, (1 - p) when the event does not occur. This is where the forecasters' true beliefs are revealed!

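The expected contribution to the BS is therefore:

E[BS] = p(1 - r)² + (1 - p)r²

Differentiating with respect to r yields

dE[BS]/dr = -2p(1 - r) + 2(1 - p)r = 2(r - p)

which is zero at r = p. Since the second derivative is positive (equal to 2), this is a minimum. Therefore, to minimize the expected contribution to the Brier score, the honestly believed probability value should be used.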
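The argument can be checked with a quick numerical search. This is a minimal sketch, assuming an honestly believed probability of p = 0.3 (an arbitrary illustrative value) and scanning stated probabilities r in steps of 0.01; the function name `expected_brier` is hypothetical.

```python
def expected_brier(r, p):
    """Expected Brier contribution when r is stated but p is believed.

    The event occurs with probability p, contributing (1 - r)^2,
    and fails to occur with probability (1 - p), contributing r^2.
    """
    return p * (1.0 - r) ** 2 + (1.0 - p) * r ** 2


p = 0.3                                        # assumed honest belief
candidates = [i / 100 for i in range(101)]     # stated probabilities 0.00 .. 1.00
best_r = min(candidates, key=lambda r: expected_brier(r, p))

print(best_r)                                  # the stated value with the lowest expected score
print(expected_brier(p, p), expected_brier(0.5, p))
```

Any "tactical" choice of r other than p, such as hedging towards 0.5, gives a larger expected contribution than stating p itself.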

The Brier Skill Score (BSS)

A Brier Skill Score (BSS) is conventionally defined as the relative probability score compared with the probability score of a reference forecast.
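A common way to write this is BSS = 1 - BS / BS_ref, where BS_ref is the Brier score of the reference forecast. The sketch below assumes that form and, purely for illustration, uses the sample climatological frequency as the reference forecast; the function names and the data are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """Relative improvement of the forecast over a reference forecast.

    1 means a perfect forecast, 0 means no better than the reference,
    negative values mean worse than the reference.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    return 1.0 - bs / bs_ref


# Hypothetical sample of ten forecasts and outcomes.
probs = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

# Assumed reference: always forecast the sample climatological frequency.
climatology = sum(outcomes) / len(outcomes)
ref_probs = [climatology] * len(outcomes)

bss = brier_skill_score(probs, ref_probs, outcomes)
print(bss)
```

Other reference forecasts (for example persistence or a long-term climatology) can be passed in the same way through `ref_probs`.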

"Uncertainty" plays no role in the BSS.


Rank Probability Scores (RPS)

Probabilities often refer to the risk that some threshold might be exceeded, for example that the precipitation >1 mm/12hr or that the wind >15 m/s. However, when evaluating a probabilistic system, there is no reason why these particular thresholds should be especially significant. For the Rank Probability Score (RPS) the BS is calculated for different (one-sided) discrete thresholds and then averaged over all thresholds.
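The averaging over thresholds can be sketched as follows for a single forecast case. This is a minimal illustration, assuming the forecast probability of exceeding a threshold is taken as the fraction of ensemble members above it; the function names, thresholds and member values are hypothetical.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members above the threshold (assumed probability estimate)."""
    return sum(1 for m in members if m > threshold) / len(members)


def rank_probability_score(members, observation, thresholds):
    """Brier contribution at each one-sided threshold, averaged over all thresholds."""
    total = 0.0
    for t in thresholds:
        p = exceedance_probability(members, t)
        o = 1.0 if observation > t else 0.0    # did the event "value > t" occur?
        total += (p - o) ** 2
    return total / len(thresholds)


# Hypothetical 12-hour precipitation case (mm): four members, one observation.
members = [0.0, 0.5, 2.0, 6.0]
observation = 3.0
thresholds = [1.0, 5.0, 10.0]

rps = rank_probability_score(members, observation, thresholds)
print(rps)
```

Over a verification sample, the same quantity would be averaged across all forecast cases.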

...

MOS also improves forecasts in the medium range but, with increasing forecast range, less and less of this improvement is due to the ability of the MOS equations to remove systematic errors. In the medium range, the dominant errors are non-systematic. These non-systematic errors (e.g. false model climate drift) can appear as false systematic errors (e.g. see Fig12.A.4). They will thus be "corrected" by the MOS in the same way as true systematic errors. In this way MOS essentially damps the forecast anomalies and thereby minimizes the RMSE. This might be justified in a purely deterministic context but not in an ensemble context, where the most skilful damping of less predictable anomalies is achieved by ensemble averaging through the EM. It is therefore recommended that MOS equations are calculated in the short range, typically at D+1, based on forecasts from the CTRL, and then applied to all the members in the ensemble throughout the whole forecast range, as long as any genuine model drift can be discarded.
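The recommended workflow can be sketched as below. This is only an illustration under strong assumptions: the MOS equation is reduced to a single-predictor linear regression (operational MOS typically uses several predictors), and the function names and numbers are hypothetical. The coefficients are fitted once on D+1 CTRL forecasts and then applied unchanged to every member at every lead time.

```python
def fit_linear_mos(ctrl_forecasts, observations):
    """Least-squares fit of observation = slope * forecast + intercept.

    ctrl_forecasts: D+1 forecasts from the CTRL (training sample)
    observations:   verifying observations for the same cases
    """
    n = len(ctrl_forecasts)
    mean_f = sum(ctrl_forecasts) / n
    mean_o = sum(observations) / n
    cov = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(ctrl_forecasts, observations))
    var = sum((f - mean_f) ** 2 for f in ctrl_forecasts)
    slope = cov / var
    intercept = mean_o - slope * mean_f
    return slope, intercept


def apply_mos(ensemble, slope, intercept):
    """Apply the D+1 equation to ensemble[lead_time][member] at all lead times."""
    return [[slope * x + intercept for x in members] for members in ensemble]


# Hypothetical training data: D+1 CTRL temperatures with a 1-degree cold bias.
ctrl_d1 = [10.0, 12.0, 14.0, 16.0]
obs_d1 = [11.0, 13.0, 15.0, 17.0]
slope, intercept = fit_linear_mos(ctrl_d1, obs_d1)

# Hypothetical two-member ensemble at two lead times (e.g. D+1 and D+7).
ensemble = [[10.0, 11.0], [8.0, 15.0]]
corrected = apply_mos(ensemble, slope, intercept)
print(slope, intercept, corrected)
```

Because the coefficients come from the short range only, the spread between members at longer lead times is preserved apart from the fixed linear correction, rather than being damped towards climatology.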

...