Selection of stations

GloFAS verification activities are based on GloFAS fixed reporting points with additional quality criteria applied:

  • A minimum length of observation record in the 1979-2022 reference period (regardless of gaps): 4 years for the calibration, but only 1 year for the GloFAS v4 general hydrological model performance verification.
  • Limited impact of lakes, reservoirs and other human influence on river discharge, required for two of the station networks on the general assessment page (GloFAS v4 general hydrological model performance) described below. No restriction on lake or reservoir influence was applied for the calibration (GloFAS v4 calibration methodology and parameters), for the full station list on the general performance page (GloFAS v4 general hydrological model performance), or for the model performance web product (GloFAS hydrological model performance web product).
  • Exclusion of stations with poor observation quality (typically observations with errors that could not be corrected, e.g. truncated peaks) and of stations that could not be mapped onto the GloFAS river network with high enough confidence (i.e. the observed gauge could not be represented by GloFAS, for example because of too uncertain metadata or completely unrepresentative observations). This step was applied in all applications, i.e. the calibration and all types of verification.
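The three criteria can be sketched as a simple filter. This is a minimal illustration only, assuming a hypothetical `Station` record whose fields (`obs_years`, `large_lake_or_reservoir`, `quality_ok`, `mapped_ok`) stand in for the outcome of the quality and mapping checks; it is not the GloFAS selection code.

```python
from dataclasses import dataclass

@dataclass
class Station:
    """Hypothetical station record; field names are illustrative only."""
    station_id: str
    obs_years: float               # length of observation record in the reference period
    large_lake_or_reservoir: bool  # larger lake/reservoir influence on discharge
    quality_ok: bool               # False if observation errors could not be corrected
    mapped_ok: bool                # False if not confidently mapped onto the river network

# Minimum record length (years) per application, as stated in the criteria above.
MIN_YEARS = {"calibration": 4.0, "general": 1.0}

def select(stations, application, exclude_influenced):
    """Return the IDs of stations passing the criteria for one application."""
    selected = []
    for s in stations:
        # Quality and mapping checks apply to every application.
        if not (s.quality_ok and s.mapped_ok):
            continue
        if s.obs_years < MIN_YEARS[application]:
            continue
        # The lake/reservoir filter is only used for the restricted networks.
        if exclude_influenced and s.large_lake_or_reservoir:
            continue
        selected.append(s.station_id)
    return selected

stations = [
    Station("A", 10.0, False, True, True),
    Station("B", 2.0, False, True, True),
    Station("C", 12.0, True, True, True),
    Station("D", 8.0, False, False, True),
]
print(select(stations, "calibration", exclude_influenced=False))
print(select(stations, "general", exclude_influenced=True))
```

Station B passes only the 1-year verification threshold, while station C is kept for the calibration but dropped from the restricted verification networks because of its lake or reservoir influence.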

For reference, considering only the stations without larger reservoir or lake influence, v3.1 used 1532 stations for the general hydrological model performance analysis. This number increased to 1987 stations for v4.0, while 1949 stations could be used (mapped) for both v4.0 and v3.1. The number is lower when both models are considered because some stations could not be mapped in v3.1 due to the lower resolution (or other issues) of its river network. The lower number of 1532 stations originally used in the v3.1 analysis was partly due to the longer minimum observation length of 4 years, which was relaxed to only 1 year for v4.0. Although the scores are not robust for very short observation periods, which users should be aware of, this choice lets users see the model behaviour in relation to the flood thresholds and get a first impression of the model performance. Numbers also increased for v4.0 because extra stations were added to the GloFAS observation network after the v3.1 implementation in May 2021.

The 1987 stations listed above are fewer than the 1995 stations used in the calibration (GloFAS v4 calibration methodology and parameters) because the calibration also included good-quality stations with larger reservoir or lake influence.

The GloFAS v4 general hydrological model performance page shows three station networks. The largest set includes all 2293 stations that have at least the minimum 1 year of good-quality river discharge observations, regardless of reservoir or lake influence. This station network was also used in the GloFAS hydrological model performance web product. The second network includes only the 996 stations that were used in both the v3 and v4 calibrations and do not show large reservoir or lake influence. The third network includes the 233 stations that were used in neither the v4 nor the v3 calibration, again without larger reservoir or lake influence. Stations with larger reservoir or lake influence were omitted from these two networks because such catchments are much more difficult to model, and improvements can sometimes come for the wrong reasons, which would make comparisons between models, such as v4.0 vs v3.1, harder to interpret.
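The three networks can be expressed as set operations on station identifiers. The sketch below uses made-up station IDs and hypothetical set names; it only illustrates how the networks relate to each other, not how GloFAS stores its station lists.

```python
# Hypothetical inputs: made-up station IDs for illustration.
usable = {"A", "B", "C", "D", "E", "F"}  # >= 1 year of good-quality observations
influenced = {"C"}                       # larger lake or reservoir influence
calibrated_v3 = {"A", "B", "C"}          # stations used in the v3 calibration
calibrated_v4 = {"A", "C", "D"}          # stations used in the v4 calibration

# 1. Full network: every usable station, lake/reservoir influence ignored.
full = set(usable)
# 2. Used in both the v3 and v4 calibrations, without large influence.
both_calibrations = (usable & calibrated_v3 & calibrated_v4) - influenced
# 3. Used in neither calibration, without large influence.
independent = usable - calibrated_v3 - calibrated_v4 - influenced

print(sorted(full), sorted(both_calibrations), sorted(independent))
```

By construction the second and third networks are disjoint, and both are subsets of the full network.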

Verification period

The GloFAS v4.0 hydrological model performance assessments (calibration, general performance and the evaluation web product) are based on the historical river discharge reanalysis, available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview.

The verification covered the whole reanalysis period, using all available river discharge observations over 1979-2021. Both the general performance analysis (GloFAS v4 general hydrological model performance) and the GloFAS hydrological model performance layer in the map viewer (GloFAS hydrological model performance web product) used this period.
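Using all available observations over a fixed period means that the scores are computed only on days where both a simulated and an observed value exist. A minimal sketch, assuming daily series stored as hypothetical `date -> value` dictionaries with `None` marking a missing observation; the function name `pair_series` is illustrative.

```python
from datetime import date

START, END = date(1979, 1, 1), date(2021, 12, 31)

def pair_series(sim, obs, start=START, end=END):
    """Return (dates, sim_values, obs_values) for days inside the period
    that have both a simulated value and a non-missing observation."""
    dates = sorted(
        d for d, v in obs.items()
        if start <= d <= end and v is not None and d in sim
    )
    return dates, [sim[d] for d in dates], [obs[d] for d in dates]

sim = {date(1978, 12, 31): 5.0, date(1979, 1, 1): 6.0,
       date(1979, 1, 2): 7.0, date(2022, 1, 1): 8.0}
obs = {date(1978, 12, 31): 5.5, date(1979, 1, 1): 6.5,
       date(1979, 1, 2): None, date(2022, 1, 1): 7.5}

dates, s, o = pair_series(sim, obs)
print(dates, s, o)
```

Days outside the period and days with a gap in the observations are simply skipped, so stations with gappy records still contribute every day they have.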

Performance scores

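GloFAS hydrological performance verification is done against the river discharge observations available to the GloFAS team. The hydrological model performance analysis was based on the modified Kling–Gupta efficiency (KGE'; ideal value is 1):

$$\mathrm{KGE}' = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}, \qquad \beta = \frac{\mu_s}{\mu_o}, \qquad \gamma = \frac{\sigma_s/\mu_s}{\sigma_o/\mu_o}$$

where μ and σ are the mean and standard deviation of river discharge, and the subscripts s and o denote the simulated and observed time series.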

The three component scores of the KGE were also used:

  • Pearson correlation (r) in KGE highlights temporal errors through the strength of the linear relationship between simulation and observation time series. It ranges from -1 to 1, with 1 as ideal value.
  • Bias ratio (β) represents the bias errors. It ranges from 0 to +Inf, with 1 as ideal value. Its 0-centred version, bias (ideal value = 0), defined as β-1, and the absolute value of this, absbias, were also used.
  • Variability ratio (γ) represents the variability-related errors in the simulation. It ranges from 0 to +Inf, with 1 as ideal value. Its 0-centred version, var (ideal value = 0), defined as γ-1, and the absolute value of this, absvar, were also used.

The KGE' and its three components (correlation, bias ratio and variability ratio) were used in the GloFAS v4 calibration hydrological model performance, the GloFAS v4 general hydrological model performance and the GloFAS hydrological model performance web product alike. For the calibration model performance (GloFAS v4 calibration hydrological model performance), the original ratios β and γ were considered. However, in the general GloFAS v4.0 model evaluation (GloFAS v4 general hydrological model performance) and the evaluation web product (GloFAS hydrological model performance web product), besides the correlation (pcorr), β-1 (bias) and γ-1 (var) were used to represent the bias and variability errors. These were chosen instead of β and γ because their optimal value is 0 instead of 1 (with a range from -1 to infinity), so the sign intuitively shows whether the simulated mean or variability is too low (negative) or too high (positive). In addition, the absolute values of bias (absbias) and var (absvar) were also used; these are useful in model comparison, as they show the magnitude of these errors regardless of sign.
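The 0-centred scores and their absolute values follow directly from the ratios. In the hypothetical example below, two model versions have biases of equal magnitude but opposite sign: the sign of bias tells them apart, absbias shows that neither is better on this aspect, and absvar identifies the version with the smaller variability error. The function name `centred_scores` and the input ratios are illustrative assumptions.

```python
def centred_scores(beta, gamma):
    """Return 0-centred bias/var (ideal value 0) and their absolute values."""
    bias = beta - 1.0   # < 0: simulated mean too low, > 0: too high
    var = gamma - 1.0   # < 0: simulated variability too low, > 0: too high
    return {"bias": bias, "var": var, "absbias": abs(bias), "absvar": abs(var)}

# Hypothetical ratios at one station for two model versions.
model_a = centred_scores(beta=0.8, gamma=1.10)
model_b = centred_scores(beta=1.2, gamma=0.95)

for name, s in (("A", model_a), ("B", model_b)):
    print(name, {k: round(v, 2) for k, v in s.items()})

# Compare magnitudes only, ignoring the direction of the error.
better_var = "A" if model_a["absvar"] < model_b["absvar"] else "B"
print("smaller variability error:", better_var)
```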

Finally, a specific index was used for measuring timing errors (timing, in days; ideal value is 0), which shows the time delay between the simulated and observed river discharge time series; its absolute value (abstiming) was also used. The timing is the lag (or shift) L, in days, that maximises the cross-correlation function Rxy(m) between the simulated (x) and observed (y) time series, i.e. Rxy(L) is the maximum of Rxy(m) over all lags m. A positive/negative timing error indicates delayed/advanced simulated river discharge. For example, a timing error of +5 means that the simulation needs to be shifted 5 days backwards (brought earlier) to reach the highest correlation, i.e. the simulation is generally 5 days late in reproducing the rises and falls of the flow time series. Although this is not directly equivalent to measuring the timing error of the flood peaks, it is closely related to it and can be used as a simple estimate.