
Using Verification Metrics alongside the Forecast

It is useful to have some measures of the current performance and biases of the IFS.  Users can assess from Reliability and ROC diagrams whether the forecast model is:

...

ECMWF provides a number of verification metrics to use in this way, such as anomaly correlation coefficients, reliability diagrams and ROC curves, which have all been computed using the re-forecasts.


Brier Score

Brier Score (BS) is a measure, over a large sample, of the correspondence between each forecast probability and the frequency of occurrence of the verifying observations (e.g. on average, when rain is forecast with probability p, it should occur with the same frequency p).  Strictly, it is the mean squared difference between the forecast probabilities and the observed outcomes (1 if the event occurred, 0 if it did not), so values lie between 0 (perfect) and 1 (consistently wrong).  Observation frequency is plotted against forecast probability as a graph.  A perfect correspondence means the graph will lie upon the diagonal; the departure of the graph from the diagonal measures the reliability component of the Brier Score.
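As a minimal illustration of the definition above, the Brier Score can be computed directly from paired forecast probabilities and binary outcomes.  The function name `brier_score` and the toy data are hypothetical; this is a sketch, not ECMWF's verification code.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes.

    probs    -- forecast probabilities in [0, 1]
    outcomes -- 1 if the event was observed, 0 if not
    (hypothetical helper for illustration)
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Toy data, invented for illustration only.
probs = [0.9, 0.1, 0.7, 0.3]
outcomes = [1, 0, 0, 1]
print(brier_score(probs, outcomes))  # approximately 0.25
```

A perfect forecast (probability 1 when the event happens, 0 when it does not) scores 0; the opposite scores 1.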

Distribution of forecast probabilities

The distribution of forecast probabilities gives an indication of the tendency of the forecast towards uncertainty.  These are plotted as a histogram to give an indication of confidence in model performance:

...

Note that where there are only a few entries for a given probability on the histogram, confidence in the Reliability diagram is reduced for that probability.  For example, in Fig8.4-1 the predominance of probabilities below 0.2 and above 0.9 suggests there can be some confidence that, when predicting 2m temperatures in the lower climatological tercile, the IFS tends to be over-confident that the event will occur and under-confident that it will not.  However, there are few probabilities on the histogram between 0.2 and 0.9, so it would be unsafe to draw similar conclusions from the Reliability diagram within this probability range.  Conversely, in Fig8.4-2 the majority of probabilities lie between 0.2 and 0.5, and reliability within this range appears fairly good, while there is much less confidence in model performance for over- or under-forecasting an event.  This is to be expected as the forecast range becomes longer.
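The check described above (how many forecasts fall at each probability) can be sketched as a histogram over probability bins.  This assumes equal-width bins (10 here) and a hypothetical minimum of 30 forecasts per bin below which the reliability estimate is treated as untrustworthy; neither value is taken from the figures in this section.

```python
from collections import Counter


def probability_histogram(probs, n_bins=10):
    """Count forecasts falling in each of n_bins equal-width probability bins."""
    counts = Counter()
    for p in probs:
        # p == 1.0 is placed in the top bin rather than an extra bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [counts[b] for b in range(n_bins)]


def sparse_bins(counts, min_count=30):
    """Indices of bins with too few forecasts to trust the reliability curve there.

    min_count is a hypothetical threshold chosen for illustration.
    """
    return [b for b, c in enumerate(counts) if c < min_count]


# Toy probabilities, invented for illustration only.
hist = probability_histogram([0.0, 0.05, 0.1, 0.95, 1.0, 0.5])
print(hist)
print(sparse_bins(hist, min_count=2))
```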

The Reliability diagram

The reliability diagram shows whether, and by how much, the model over- or under-forecasts events at each forecast probability.

...

A common feature of reliability diagrams is that the profile of the forecasts (red line in Fig8.4-1) has a shallower slope than the diagonal, but crosses it somewhere near the climatological value (blue line intersection).  This means that the forecast has a tendency to be over-confident.  Users should adjust forecast probabilities, even if departures from the diagonal are only small, to offset the tendency to over-forecast frequently observed events and to under-forecast rarer events.
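The reliability curve, and one simple way of adjusting probabilities, can be sketched as follows.  This assumes equal-width bins and uses binning calibration (replacing a forecast probability by the observed frequency in its bin), which is only one of several possible adjustment methods and not necessarily the one ECMWF recommends.  In practice such a table would be built from a past sample, such as re-forecasts, and applied to new forecasts.  Function names and data are hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """For each bin: (mean forecast probability, observed frequency, count), or None if empty."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of p, events, forecasts]
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        sums[b][0] += p
        sums[b][1] += o
        sums[b][2] += 1
    return [None if n == 0 else (sp / n, ev / n, n) for sp, ev, n in sums]


def recalibrate(p, table, n_bins=10):
    """Replace p by the observed frequency in its bin (simple binning calibration).

    Falls back to p itself when the bin contains no past forecasts.
    """
    row = table[min(int(p * n_bins), n_bins - 1)]
    return p if row is None else row[1]


# Toy sample, invented for illustration: ten 90% forecasts, six of which verified.
table = reliability_table([0.9] * 10, [1] * 6 + [0] * 4)
print(table[9])                   # mean forecast ~0.9, observed 0.6, 10 cases
print(recalibrate(0.92, table))   # adjusted down to the observed frequency
```

With this toy sample, a new 92% forecast would be adjusted to 60%, the frequency with which similar past forecasts verified.  Bins with few cases give noisy adjustments, for the same reason that sparse histogram bins reduce confidence in the Reliability diagram.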

Current Reliability Diagrams (which include the distribution of forecast probabilities) are available on Opencharts (days 4, 6 and 10 only).

The ROC diagram

The ROC diagram gives a measure of the capacity to discriminate when events are more likely to happen.  It shows the effectiveness of the IFS in forecasting an event that actually happens (Probability of Detection or Hit Rate) while balancing this against the undesirable cases of predicting an event that fails to occur (False Alarm Rate).  The effectiveness is also known as the 'resolution' of the forecasting system (not to be confused with spatial and temporal resolution).

A system which always forecasts climatological probabilities, for example, would have no discrimination ability (i.e. zero resolution).  The resolution can be investigated using the Relative Operating Characteristic (ROC) diagram, which plots hit rate on the y-axis against false alarm rate on the x-axis.  Ideally the Hit Rate should be high and the False Alarm Rate low (i.e. the graph should lie well towards the top left corner), and in general the Hit Rate should be higher than the False Alarm Rate (i.e. the graph should lie above the diagonal).
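Each point on a ROC diagram comes from turning the probability forecast into a yes/no warning at some probability threshold and counting hits and false alarms.  A minimal sketch, assuming a warning is issued whenever the forecast probability is at least the threshold; the function name and thresholds are illustrative.

```python
def roc_points(probs, outcomes, thresholds):
    """(False Alarm Rate, Hit Rate) for warnings issued when p >= threshold."""
    events = sum(outcomes)
    non_events = len(outcomes) - events
    if events == 0 or non_events == 0:
        raise ValueError("need both events and non-events")
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(probs, outcomes) if p >= t and o == 1)
        false_alarms = sum(1 for p, o in zip(probs, outcomes) if p >= t and o == 0)
        # x = False Alarm Rate, y = Hit Rate, as plotted on the ROC diagram.
        points.append((false_alarms / non_events, hits / events))
    return points


# Toy data, invented for illustration; a threshold above 1 means "never warn".
print(roc_points([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0], [0.0, 0.5, 1.1]))
```

Lowering the threshold moves the point up and to the right: more events are caught, but at the cost of more false alarms.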

...

The ROC score is the area beneath the graph on the ROC diagram and lies between 1 (perfect capture of events) and 0 (consistently warning of events that don't happen); a score of 0.5, with the graph along the diagonal, indicates no ability to discriminate.  Fig8.4-1 shows high effectiveness in forecasting events (ROC score 0.859), while Fig8.4-2 shows reduced effectiveness (ROC score 0.593).  This is to be expected as the forecast range becomes longer.
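The ROC score can be approximated from a set of ROC points with the trapezoidal rule.  This is a common approach and is assumed here; the method used to compute the scores quoted for the figures is not stated in this section.  The function name is hypothetical.

```python
def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule.

    points -- (False Alarm Rate, Hit Rate) pairs; the end points (0, 0) and
              (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(roc_area([(0.5, 0.5)]))   # on the diagonal: no discrimination
print(roc_area([(0.0, 1.0)]))   # top left corner: perfect discrimination
```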

Current ROC Diagrams are available on Opencharts (for day 5 onwards).



Fig8.4-1: Reliability Diagram (left) and ROC diagram (right) for the lower tercile of T2m over the Europe area for week 1 (day 5-11), DT: 20 Jun 2019.

 

Fig8.4-2: Reliability Diagram (left) and ROC diagram (right) for the lower tercile of T2m over the Europe area for week 5 (day 19-32), DT: 20 Jun 2019.

...

  • BrSc = Brier Score (BS); LCBrSkSc = Brier Skill Score (BSS).
  • BS_REL = Forecast reliability and BS_RSL = Forecast resolution with respect to observations.
  • BSS_RSL = Forecast resolution and BSS_REL = Forecast reliability with respect to climatology.
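The Brier Score components listed above can be illustrated with the standard Murphy decomposition, BS ≈ REL - RES + UNC (reliability minus resolution plus uncertainty), and a skill score BSS = 1 - BS/BS_clim.  This sketch assumes equal-width probability bins (the identity is exact only when all forecasts in a bin share the same probability) and takes the sample climatological frequency as the reference, so that BS_clim = UNC.  Whether the Opencharts quantities BS_REL, BS_RSL, BSS_REL and BSS_RSL are normalised in exactly this way is an assumption; function names are hypothetical.

```python
def brier_decomposition(probs, outcomes, n_bins=10):
    """Murphy decomposition BS ~= REL - RES + UNC using equal-width probability bins."""
    n_total = len(probs)
    base_rate = sum(outcomes) / n_total  # sample climatological frequency
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    rel = res = 0.0
    for members in bins:
        if not members:
            continue
        n_k = len(members)
        p_k = sum(p for p, _ in members) / n_k  # mean forecast probability in bin
        o_k = sum(o for _, o in members) / n_k  # observed frequency in bin
        rel += n_k * (p_k - o_k) ** 2           # distance from the diagonal
        res += n_k * (o_k - base_rate) ** 2     # departure from climatology
    unc = base_rate * (1 - base_rate)
    return rel / n_total, res / n_total, unc


def brier_skill_score(bs, bs_ref):
    """1 is perfect, 0 is no better than the reference, negative is worse."""
    return 1 - bs / bs_ref


# Toy sample, invented for illustration: ten 90% forecasts, six verified.
rel, res, unc = brier_decomposition([0.9] * 10, [1] * 6 + [0] * 4)
print(rel, res, unc)
print(brier_skill_score(rel - res + unc, unc))  # negative: worse than climatology
```

In this toy case the over-confident 90% forecasts give a poor reliability term and no resolution, so the skill relative to climatology is negative.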


 

Fig8.4-3: Example of Reliability Diagrams from Opencharts.  Total 24hr precipitation, Day 6, assessed from ensemble probability forecasts during a three-month period and compared with climatology from the same period.  The traces compare forecast probabilities against observed occurrences for 24h precipitation totals of >1mm, >5mm, >10mm and >20mm.  Ideally the traces should lie along the dashed blue line (i.e. the ensemble probability forecast should agree with the observed frequency).  The diagram shows:

  • reasonably good forecasting at low ensemble probabilities
    • e.g. events forecast with 20% ensemble probability occurred 20% of the time in each group
  • over-forecasting at higher ensemble probabilities:
    • e.g. ensemble 90% probability of >1mm/24h actually occurred only 60% of the time - the wide distribution of forecast probabilities suggests some confidence in the Reliability trace.
    • e.g. ensemble 90% probability of >20mm/24h actually occurred 80% of the time - but the very few forecasts of high probabilities suggest very low confidence in the corresponding implied reliabilities.


Fig8.4-4: Example of Reliability Diagrams from Opencharts.  Temperature anomaly, Day 4, assessed from ensemble probability forecasts during a three-month period and compared with climatology from the same period.  The traces compare forecast probabilities of anomalies against observed occurrences of anomalies for 2 metre temperature of >8°C below, >4°C below, >4°C above and >8°C above climatology.  Ideally the traces should lie along the dashed blue line (i.e. the ensemble forecast probability should agree with the observed frequency).  The diagram shows:

...

    • for >4°C above climatology, events forecast with 90% ensemble probability actually occurred 85% of the time - the wide distribution of forecast probabilities suggests moderate confidence in the implied reliability.
    • for >8°C below climatology, events forecast with 90% ensemble probability actually occurred 85% of the time - but the very few forecasts of high probabilities suggest very low confidence in the implied reliability.


Fig8.4-5: Reliability diagrams for 2m temperature based on July starts of the long-range forecasts for months 4-6:

  • left for the tropics - a slight tendency towards over-confidence, especially when forecasting that the event (warm anomalies) will happen.
  • right for Europe - a tendency towards over-confidence, though the sample size for high confidence forecasts is small, making the plot noisy.


Fig8.4-6: Reliability diagrams for rain based on July starts of the long-range forecasts for months 4-6:

  • left for the tropics - a tendency towards over-confidence.
  • right for Europe - forecast not reliable at all (so should not be used, unless there are exceptional circumstances that warrant an expectation of skill that is ordinarily not there).

 

Fig8.4-7: ROC diagrams for Europe based on July starts of the long-range forecasts for months 4-6:

  • left for temperatures in the upper tercile - the Hit Rate is slightly better than the False Alarm Rate indicating that the forecast system has some limited ability to discriminate occasions when warm events are likely from occasions when they are not. 
  • right for precipitation in the upper tercile - the Hit Rate and False Alarm Rate are similar throughout indicating that the seasonal forecast system has no ability to distinguish occasions when it will be wet from occasions when it will not.

Anomaly Correlation

Anomaly Correlation Coefficient (ACC) charts give an assessment of the skill of the forecast.  They show the correlation at all geographical locations in map form.

...

The seasonal model climate (S-M-climate) is based on re-forecasts spanning the last 20 years, which used the ERA-Interim re-analysis for their initialisation.


Anomaly correlation coefficient (ACC) charts are produced for several parameters.  Each chart shows the skill of the forecast at each location for the given month and lead-time.  
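A minimal sketch of the ACC at a single location, assuming one ensemble-mean forecast value and one verifying value per re-forecast year for a given start month and lead time.  Forecast anomalies are taken relative to the forecast's own mean (standing in for the model climate) and observed anomalies relative to the observed mean; this centred (Pearson) form is an assumption, and the exact ECMWF formulation may differ.  Function names are hypothetical.

```python
import math


def anomalies(values):
    """Anomalies relative to the sample mean (e.g. over the re-forecast years)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def anomaly_correlation(forecast, observed):
    """Correlation between forecast and observed anomalies at one location.

    forecast -- one (ensemble-mean) value per year
    observed -- the verifying value for each of the same years
    Raises ZeroDivisionError if either series is constant.
    """
    fa, oa = anomalies(forecast), anomalies(observed)
    num = sum(f * o for f, o in zip(fa, oa))
    den = math.sqrt(sum(f * f for f in fa) * sum(o * o for o in oa))
    return num / den


# Toy series, invented for illustration.
print(anomaly_correlation([1, 2, 3, 4], [10, 20, 30, 40]))  # in phase
print(anomaly_correlation([4, 3, 2, 1], [10, 20, 30, 40]))  # out of phase
```

Because the correlation is unchanged when the forecast anomalies are scaled, the ACC indicates whether forecast and observed variations are in phase rather than whether their amplitudes match.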

...

Locations with correlation significantly (95% confidence level) different from zero are highlighted by dots.
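One common way to test whether a correlation differs significantly from zero is Student's t-test, sketched below.  It assumes independent years and one value per re-forecast year (n = 20 for a 20-year re-forecast); 2.101 is the two-sided 95% critical value of t with 18 degrees of freedom.  The test actually used to place the dots on the charts is not stated here, and the function name is hypothetical.

```python
import math


def correlation_is_significant(r, n, t_crit):
    """Two-sided test of r != 0 using t = r * sqrt(n - 2) / sqrt(1 - r**2).

    t_crit -- critical value of Student's t with n - 2 degrees of freedom
              (about 2.101 for n = 20 at the 95% level).
    """
    if abs(r) >= 1:
        return True
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return abs(t) > t_crit


# With 20 years, |r| must exceed roughly 0.44 to be significant at 95%.
print(correlation_is_significant(0.5, 20, 2.101))
print(correlation_is_significant(0.4, 20, 2.101))
```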


 

Fig8.4-8: Anomaly Correlation Coefficient for 2m temperature for months 2-4 based on November runs.  High ACC (red) over the eastern Pacific suggests the seasonal model captures the year-to-year variability of 2m temperature quite well; grey over Siberia suggests the model is no better than climatology (i.e. it doesn't capture the variability); and cyan near Newfoundland suggests the seasonal model can be rather unreliable and misleading in this area.

...