Using Verification Metrics alongside the Forecast

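It is useful to have some measures of the current performance and biases of the IFS.  Users can assess from Reliability and ROC diagrams whether the forecast model tends to over- or under-forecast particular events, and how well it discriminates between occasions when those events are more or less likely.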

It is vital that users understand, from the outset, the general characteristics of the model forecasts relative to the subsequent verifying observations (e.g. whether or not the model typically over-forecasts or under-forecasts certain types of outcome).  Users should then interpret any forecast signals accordingly.  In practice this usually means being wary of over-stating the significance of signals of a type that has historically been unreliable and/or unskilful.  This applies at all lead times, and to forecasting in general, but it is particularly important for longer-range forecasts (such as monthly and seasonal).

ECMWF provides a number of verification metrics to use in this way, such as anomaly correlation coefficients, reliability diagrams and ROC curves, which have all been computed using the re-forecasts.


Brier Score

The Brier Score (BS) is a measure, over a large sample, of the accuracy of probability forecasts: it is the mean squared difference between each forecast probability and the verifying outcome (1 if the event occurred, 0 if it did not).  Values lie between 0 (perfect) and 1 (consistently wrong).  One component of the Brier Score measures reliability, the correspondence between forecast probabilities and the observed frequency of occurrence: on average, when rain is forecast with probability p, it should occur with the same frequency p.  Plotting observed frequency against forecast probability gives a graph which, for perfect correspondence, lies upon the diagonal; the (weighted) squared departure of the graph from the diagonal measures this reliability component of the Brier Score.
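As a minimal sketch, the Brier Score can be computed from its standard definition as the mean squared difference between forecast probabilities and binary outcomes. The function name and the sample values are hypothetical, for illustration only; operational scores are computed over much larger samples.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    probabilities: forecast probabilities in [0, 1]
    outcomes: 1 if the event was observed, 0 otherwise
    """
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


# Hypothetical sample: probability of rain, and whether it rained (1) or not (0).
forecast_p = [0.9, 0.8, 0.1, 0.3, 0.0]
observed = [1, 1, 0, 1, 0]
print(round(brier_score(forecast_p, observed), 3))  # prints 0.11
```

The fourth forecast (0.3 for an event that occurred) contributes most of the score, which illustrates how confidently wrong forecasts are penalised.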

Distribution of forecast probabilities

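The distribution of forecast probabilities shows how often each forecast probability is issued, and hence how far the forecast tends towards confident probabilities (near 0 or 1) or towards uncertain, intermediate ones.  It is plotted as a histogram, which also indicates how much confidence can be placed in each part of the Reliability diagram.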

Where there are only a few entries for a given probability on the histogram, confidence in the Reliability diagram is reduced for that probability.  In Fig8.3.5-1 most forecast probabilities lie below 0.2 or above 0.9, where the Reliability diagram can be interpreted with reasonable confidence.

However, there are few probabilities on the histogram between 0.2 and 0.9, which suggests that it would be unsafe to draw similar deductions from the Reliability diagram within this probability range.  Conversely, in Fig8.3.5-2 the majority of probabilities lie between 0.2 and 0.5; reliability within this range appears fairly good, while there is much less confidence in model performance outside it.  This is to be expected as the forecast range becomes longer.
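The histogram check described above can be sketched as follows, assuming equally spaced probability bins. The function names, the number of bins and the `min_count` threshold for a "sparse" bin are illustrative assumptions, not ECMWF settings.

```python
def probability_histogram(probabilities, n_bins=10):
    """Count how many forecasts fall into each bin [k/n, (k+1)/n)."""
    counts = [0] * n_bins
    for p in probabilities:
        # min() puts p == 1.0 into the last bin rather than past the end
        k = min(int(p * n_bins), n_bins - 1)
        counts[k] += 1
    return counts


def sparse_bins(counts, min_count):
    """Bins with too few forecasts to trust the reliability curve there (hypothetical rule)."""
    return [k for k, c in enumerate(counts) if c < min_count]


# Hypothetical sample dominated by confident forecasts, as in a short-range case.
probs = [0.02, 0.05, 0.08, 0.95, 0.97, 1.0, 0.45]
counts = probability_histogram(probs)
print(counts)
print(sparse_bins(counts, min_count=2))
```

Here the intermediate bins are flagged as sparse, so the reliability curve there would carry little weight.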

The Reliability diagram

The reliability diagram shows whether, and by how much, the model tends to over- or under-forecast an event.

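The diagram shows, for a given event, the relationship between the forecast probability of the event and the frequency with which the event was subsequently observed.

An example might be the forecast probability that 2m temperature will be greater than 20°C, plotted against the observed frequency of that event.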

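Ideally points should lie on the diagonal.  If the plotted points lie below the diagonal, the event was observed less often than forecast (over-forecasting); if they lie above it, the event was observed more often than forecast (under-forecasting).

The size of the departure from the diagonal indicates the magnitude of the over- or under-forecasting error.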
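A minimal sketch of how the points of a reliability diagram might be computed: forecasts are grouped into probability bins, and for each bin the mean forecast probability is compared with the observed frequency. The binning, the function names and the `tolerance` used to call a point "reliable" are hypothetical choices for illustration.

```python
def reliability_curve(probabilities, outcomes, n_bins=10):
    """(mean forecast probability, observed frequency, count) for each non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probabilities, outcomes):
        k = min(int(p * n_bins), n_bins - 1)
        sums[k] += p
        hits[k] += o
        counts[k] += 1
    return [
        (sums[k] / counts[k], hits[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


def diagnose(mean_p, obs_freq, tolerance=0.05):
    """Below the diagonal: over-forecasting; above it: under-forecasting."""
    if obs_freq < mean_p - tolerance:
        return "over-forecasting"
    if obs_freq > mean_p + tolerance:
        return "under-forecasting"
    return "reliable"


# Hypothetical sample: forecasts of 0.15 and 0.85, each verifying half the time.
probs = [0.15] * 4 + [0.85] * 4
obs = [1, 1, 0, 0, 1, 1, 0, 0]
for mean_p, freq, n in reliability_curve(probs, obs):
    print(f"{mean_p:.2f} {freq:.2f} {n} {diagnose(mean_p, freq)}")
```

In this sample the low-probability bin is flagged as under-forecasting and the high-probability bin as over-forecasting, so the curve is shallower than the diagonal.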

A common feature of reliability diagrams is that the profile of the forecasts (red line in Fig8.3.5-1) has a shallower slope than the diagonal, but crosses it somewhere near the climatological value (blue line intersection).  This means that the forecast has a tendency to be over-confident: when it issues high probabilities the event occurs less often than forecast, and when it issues low probabilities the event occurs more often than forecast.  Users should adjust forecast probabilities towards the climatological frequency to offset this tendency, even if departures from the diagonal are only small.
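One simple way to picture such an adjustment is to blend each forecast probability with the climatological frequency. This is a hypothetical sketch, not an ECMWF calibration method; the `weight` parameter is an assumption that, in practice, one might estimate from re-forecast reliability statistics.

```python
def shrink_towards_climatology(p, climatology, weight):
    """Blend a forecast probability with the climatological frequency.

    weight = 1 keeps the raw forecast; weight = 0 returns climatology.
    The weight is a hypothetical tuning parameter, not an operational value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return climatology + weight * (p - climatology)


# Lower-tercile event (climatological frequency 1/3), hypothetical weight 0.6.
print(round(shrink_towards_climatology(0.9, 1 / 3, 0.6), 2))
```

With these assumed values a raw probability of 0.9 becomes about 0.67, and a raw probability below 1/3 would be raised towards 1/3, which flattens an over-confident forecast in the way the reliability diagram suggests.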

Current Reliability Diagrams (which include the distribution of forecast probabilities) are available on Opencharts (days 4, 6 and 10 only).

The ROC diagram

The ROC diagram gives a measure of the capacity of the forecast to discriminate between situations in which an event is more likely to happen and those in which it is less likely.

It shows the effectiveness of the IFS in forecasting events that do occur while avoiding false alarms for events that do not.

The effectiveness is also known as the 'resolution' of the forecasting system.  The word 'resolution' here should not be confused with spatial and temporal resolution.

A system which always forecasts climatological probabilities, for example, would have no discrimination ability (i.e. zero resolution).  The resolution can be investigated using the Relative Operating Characteristic (ROC) diagram, which plots hit rate on the y-axis against false alarm rate on the x-axis.  Ideally the hit rate should be high and the false alarm rate low, so that the graph bows towards the top-left corner of the diagram.

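Where a ROC graph lies close to the diagonal, the system is no better than chance at discriminating events; the further the graph bows towards the top-left corner, the greater its ability to discriminate.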

The ROC score is the area beneath the graph on the ROC diagram.  It lies between 1 (perfect capture of events) and 0 (consistently warning of events that don't happen); a score of 0.5, corresponding to the diagonal, indicates no discrimination.  Fig8.3.5-1 shows high effectiveness in forecasting events (ROC score 0.859), while Fig8.3.5-2 shows reduced effectiveness (ROC score 0.593).  This is to be expected as the forecast range becomes longer.
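A minimal sketch of how a ROC graph and its area might be computed: a warning is issued whenever the forecast probability reaches a threshold, hit rate and false alarm rate are counted for each threshold, and the area beneath the resulting points is found by the trapezoidal rule. The thresholds, function names and sample data are hypothetical.

```python
def roc_points(probabilities, outcomes, thresholds):
    """(false alarm rate, hit rate) pairs, warning whenever p >= threshold.

    The graph starts at (0, 0) (never warn) and ends at (1, 1) (always warn).
    """
    events = sum(outcomes)
    non_events = len(outcomes) - events
    if events == 0 or non_events == 0:
        raise ValueError("need both events and non-events")
    points = [(0.0, 0.0)]
    # Decreasing thresholds give increasing hit and false alarm rates.
    for t in sorted(thresholds, reverse=True):
        hits = sum(1 for p, o in zip(probabilities, outcomes) if p >= t and o == 1)
        false_alarms = sum(1 for p, o in zip(probabilities, outcomes) if p >= t and o == 0)
        points.append((false_alarms / non_events, hits / events))
    points.append((1.0, 1.0))
    return points


def roc_area(points):
    """Area beneath the ROC graph by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical sample: forecast probabilities and outcomes (1 = event occurred).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
obs = [1, 1, 0, 1, 0, 0]
thresholds = [k / 10 for k in range(1, 10)]
print(round(roc_area(roc_points(probs, obs, thresholds)), 3))  # prints 0.889
```

A forecast that issued the same probability every time would give only the points (0, 0) and (1, 1), and hence an area of 0.5.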

Current ROC Diagrams are available on Opencharts (from day 5 onwards).



Fig8.3.5-1: Reliability Diagram (left) and ROC diagram (right) regarding lower tercile for T2m in Europe area for week1 (day5-11), DT:20 Jun 2019.

 

Fig8.3.5-2: Reliability Diagram (left) and ROC diagram (right) regarding lower tercile for T2m in Europe area for week5 (day19-32), DT:20 Jun 2019.

In the above diagrams:


 

Fig8.3.5-3: Example of Reliability Diagrams from Opencharts.  Total 24h precipitation, Day 6, assessed from ensemble probability forecasts over a three-month period and compared with climatology for the same period.  The traces compare forecast probabilities against observed occurrences for 24h precipitation totals of >1mm, >5mm, >10mm and >20mm.  Ideally the traces should lie along the dashed blue line (i.e. the ensemble probability forecast should agree with the observed frequency).  The diagram shows:


Fig8.3.5-4: Example of Reliability Diagrams from Opencharts.  Temperature anomaly, Day 4, assessed from ensemble probability forecasts over a three-month period and compared with climatology for the same period.  The traces compare forecast probabilities of anomalies against observed occurrences of anomalies for 2m temperature of >8°C below, >4°C below, >4°C above and >8°C above climatology.  Ideally the traces should lie along the dashed blue line (i.e. the ensemble forecast probability should agree with the observed frequency).  The diagram shows:

However:


Fig8.3.5-5: Example reliability diagrams for 2m temperature based on July starts of the seasonal forecasts for months 4-6. 

 

Fig8.3.5-6: Example reliability diagrams for rain based on July starts of the seasonal forecasts for months 4-6:

 

Fig8.3.5-7: Example ROC diagrams for Europe based on July starts of the seasonal forecasts for months 4-6:


Anomaly Correlation 

Anomaly Correlation Coefficient (ACC) charts give an assessment of the skill of the forecast.  They show the correlation at all geographical locations in map form.

At ECMWF the anomaly correlation coefficient (ACC) scores represent the spatial correlation between:


Seasonal products are available in chart form and the correlation is evaluated between:

The seasonal model climate (S-M-climate) is based on re-forecasts spanning the last 20 years, which used the ERA-Interim re-analysis for their initialisation.


Anomaly correlation coefficient (ACC) charts are produced for several parameters.  Each chart shows the skill of the forecast at each location for the given month and lead-time.  
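A minimal sketch of an anomaly correlation coefficient, assuming the common uncentred form: forecast and verifying values are converted to anomalies against their own climatologies, and the anomalies are correlated. The same function serves for a spatial ACC (samples are grid points) or for one location on a chart (samples are re-forecast years). ECMWF's exact formulation (centred or uncentred, area weighting, choice of climatology) is not stated here, so this is an assumption; the function name is hypothetical.

```python
import math


def anomaly_correlation(forecast, verifying, forecast_clim, verifying_clim):
    """Uncentred anomaly correlation coefficient.

    All arguments are sequences over the same samples: grid points for a
    spatial ACC, or re-forecast years at a single location. Anomalies are
    taken relative to the matching climatological value of each sample.
    """
    fa = [f - c for f, c in zip(forecast, forecast_clim)]
    va = [v - c for v, c in zip(verifying, verifying_clim)]
    num = sum(f * v for f, v in zip(fa, va))
    den = math.sqrt(sum(f * f for f in fa) * sum(v * v for v in va))
    if den == 0.0:
        raise ValueError("anomalies are identically zero")
    return num / den


# Hypothetical seasonal-mean 2m temperatures (°C) at one location over five years.
fc = [15.2, 14.1, 16.0, 15.5, 14.8]
an = [15.9, 14.0, 16.4, 15.1, 14.6]
print(round(anomaly_correlation(fc, an, [15.0] * 5, [15.0] * 5), 2))
```

A spatial ACC over a large area would normally also weight grid points by area (for example by the cosine of latitude); that weighting is omitted here for brevity.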

Locations with correlation significantly (95% confidence level) different from zero are highlighted by dots.
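The test ECMWF uses to mark significant locations is not given here. As an assumption, the sketch below uses the Fisher z-transform, a standard approximation: under zero true correlation, atanh(r)·sqrt(n − 3) is roughly standard normal, and 1.96 corresponds to the 95% level (two-sided). The function name is hypothetical.

```python
import math


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test that a correlation r from n samples differs from zero.

    Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly standard normal
    when the true correlation is zero; z_crit = 1.96 gives the 95% level.
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if abs(r) >= 1.0:
        return True
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


# With 20 re-forecast years, |r| must exceed roughly 0.44 to be marked.
print(is_significant(0.5, 20), is_significant(0.3, 20))
```

With only 20 years of re-forecasts, moderate correlations can therefore fail to reach significance, which is why some coloured areas on ACC charts carry no dots.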


Fig8.3.5-8: Anomaly Correlation Coefficient for 2m temperature for months 2-4 based on November runs of the seasonal model.  On the chart:



(FUG associated with Cy50r1)