Selection of stations
GloFAS verification activities are based on GloFAS diagnostic fixed reporting points with additional quality criteria applied. For example, the GloFAS v3.1 hydrological performance assessment used the following criteria to select the 1532 stations for the analysis:
- A minimum length of observation record within the reference period (regardless of gaps). GloFAS v3.1 required at least 4 years of observations within the 1979-2019 reference period. For GloFAS v4, with the 1979-2022 reference period, 4 years were required for the calibration, while only 1 year was required for the GloFAS v4 general hydrological model performance verification.
- Limited impact of lakes, reservoirs and human influence on river discharge. This criterion was applied for two of the station network selections in the general assessment page (GloFAS v4 general hydrological model performance), described below. No restriction on lake or reservoir influence was applied for the calibration (GloFAS v4 calibration methodology and parameters), for the full station list in the general performance page (GloFAS v4 general hydrological model performance), or for the model performance web product (GloFAS hydrological model performance web product).
- Exclusion of stations with poor observation quality (typically observations showing linear interpolation between values, outliers, or errors that could not be corrected, e.g. truncated peaks, duplicated values, etc.)
Verification period
The verification focused either on the whole period with available river discharge observations, as in the GloFAS hydrological model performance layer in the map viewer (GloFAS hydrological model performance web product), or specifically on the flood season, to evaluate only the period of the year when floods are likely to happen. This shorter period was applied for the general GloFAS v3.1 hydrological model performance analysis (GloFAS v3.1 hydrological performance and GloFAS v3.1 hydrological performance comparison with GloFAS v2.1).
In the verification focused on the main flood season, the period was determined for each catchment separately. It was centred on the maximum of the observed daily climate mean discharge (a time series of 365 values from 1 Jan to 31 Dec; leap days were not distinguished). The daily climate mean values were computed by applying a ±10-day window. This way, a maximum of 21 observed discharge values per year were used in the climate sample (for the 1979-2019 period). So, depending on the length of the observational record available, the sample size used to compute the climate mean varied from at least 84 values (for the minimum 4 years) to over 400 if most of the 41 years had data in 1979-2019.
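The windowed climatology described above can be illustrated with a minimal sketch. It is not the GloFAS implementation: the function name, the 0-based day-of-year input and the wrap-around of the window across the year boundary are assumptions made for this example.

```python
import numpy as np

def daily_climate_mean(doy, discharge, half_window=10):
    """Daily climate mean for 365 calendar days with a +-half_window-day window.

    doy        : day-of-year index of each observation, 0..364
                 (assumption: leap days were already removed)
    discharge  : observed daily discharge, NaN where missing
    Returns the climate mean and the sample size for each calendar day.
    """
    doy = np.asarray(doy)
    discharge = np.asarray(discharge, dtype=float)
    valid = ~np.isnan(discharge)
    clim = np.full(365, np.nan)
    counts = np.zeros(365, dtype=int)
    for day in range(365):
        # Circular distance in days, so the window wraps around 31 Dec / 1 Jan
        # (assumption: the source does not say how the year boundary is handled).
        dist = np.abs(doy - day)
        dist = np.minimum(dist, 365 - dist)
        sample = discharge[(dist <= half_window) & valid]
        counts[day] = sample.size
        if sample.size:
            clim[day] = sample.mean()
    return clim, counts
```

With 4 complete years of data, each calendar day receives 4 x 21 = 84 values, which matches the minimum sample size quoted above.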
It is worth noting that this method only determines the verification period around the most prominent flood period. If there are different flood periods, or the flood season has multiple maxima, then it only focuses on the highest of them.
The verification period was then defined by the dates when the observed daily climate mean discharge decreased to 70% of the maximum value on either side of the climate peak. This was then extended by 21 days (3 weeks) on both ends to account for some of the variability across the years. The length of the period varied from about 60 days for catchments with a very short and well-defined flood season, to the whole year in special catchments where floods are almost equally likely to happen in any part of the year and there is no period with a 30% outlying peak in the daily climate mean time series.
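As an illustration of this definition, the sketch below derives the verification period from a 365-value daily climate mean. The function name, the wrap-around at the year boundary and the choice to count only days strictly above the 70% level as the core period are assumptions of this example, not details confirmed by the text.

```python
import numpy as np

def flood_season_mask(clim, threshold=0.7, extension=21):
    """Boolean mask over the calendar days that belong to the verification period."""
    clim = np.asarray(clim, dtype=float)
    n = clim.size
    peak = int(np.nanargmax(clim))        # day of the highest climate mean
    limit = threshold * clim[peak]        # 70% of the climate peak

    # Walk away from the peak on both sides while the climate mean stays
    # above the limit (wrapping around the end of the year).
    right = 0
    while right < n - 1 and clim[(peak + right + 1) % n] > limit:
        right += 1
    left = 0
    while left < n - 1 - right and clim[(peak - left - 1) % n] > limit:
        left += 1

    # Extend by `extension` days on both ends, never exceeding the whole year.
    length = min(n, left + right + 1 + 2 * extension)
    start = peak - left - extension
    mask = np.zeros(n, dtype=bool)
    mask[(start + np.arange(length)) % n] = True
    return mask
```

For a climatology without any distinct peak, the walk covers the whole year and the mask is true for all 365 days, which corresponds to the special catchments mentioned above.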
Figure 1 shows the month of the maximum observed discharge in the daily climatology. The tropical areas and southern Asia tend to have peak discharge in the late part of the year, while southern Africa, northern Australia, western Europe and also the southern side of Amazonia tend to peak during the first months of the year. On the other hand, in the higher latitudes of the Northern Hemisphere, in Russia, Scandinavia and Canada, the highest discharge is usually found in May-June.
The related verification period length shows quite large geographical variability in Figure 2. The shortest, most well-defined flood seasons often occur in the higher latitudes and over higher orographical areas, where the snowmelt season is confined to a shorter period of the year, usually around May-June. Similarly, a more concentrated tropical rainy season can also lead to a shorter verification period around the highest flood peak, as in India, south Africa and much of Australia.
Figure 1. Month of the maximum discharge value in the observed daily climate mean.
- Exclusion of stations that could not be mapped onto the GloFAS river network with high enough confidence (i.e. the observed gauge could not be represented by GloFAS, for example because of too uncertain metadata or completely unrepresentative observations). This step was applied for all applications, i.e. the calibration and all types of verification.
For reference, considering only the stations that do not have larger reservoir or lake influence, v3.1 had 1532 stations for the general hydrological model performance analysis. This number increased to 1987 stations for v4.0, with 1949 stations that could be used (mapped) for both v4.0 and v3.1. The number is lower when both models are considered, as some stations could not be mapped in v3.1 due to the lower resolution (or other issues) of the river network. The generally lower number of 1532 stations originally used for the v3.1 analysis was partially due to the longer minimum observation length of 4 years, which was relaxed to only 1 year for v4.0. Users should be aware that the scores will not be robust for very short availability periods. Nevertheless, this way users can see the model behaviour with the flood thresholds and get a first impression of the model performance. In addition, numbers also increased for v4.0 because extra stations were added to the GloFAS observation network since the v3.1 implementation in May 2021.
The station number of 1987, listed above, is lower than the 1995 stations used in the calibration (GloFAS v4 calibration methodology and parameters). The reason for this is that the calibration also considered good stations that had larger reservoir or lake influence.
The GloFAS v4 general hydrological model performance page shows three station networks. The largest set includes all stations that have at least the minimum 1 year of river discharge observations of good enough quality, regardless of the reservoir or lake influence. This set has 2293 stations and was also used in the GloFAS hydrological model performance web product. The second station network includes only the stations that were used in both the v3 and v4 calibrations and did not show large reservoir or lake influence. This list includes 996 stations. The third set consists of stations that were not used in either the v4 or the v3 calibration, again without larger reservoir or lake influence. This includes 233 stations. Stations with larger reservoir and lake influence were omitted because these catchments are a lot more difficult to model and improvements can sometimes come for the wrong reasons, which could make comparisons between models, such as v4.0 vs v3.1, harder to interpret.
Verification period
The GloFAS v4.0 hydrological model performance assessments (calibration, general performance and the evaluation web product) are based on the historical river discharge reanalysis, available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview.
The verification focused on the whole reanalysis period with all available river discharge observations over the period of 1979-2021. Both the general performance analysis (GloFAS v4 general hydrological model performance) and the GloFAS hydrological model performance layer in the map viewer (GloFAS hydrological model performance web product) used this period.
Figure 2. Verification period length defined for the highest maximum discharge value in the observed daily climate mean.
Performance scores
GloFAS hydrological performance verification is done against river discharge observations available to the GloFAS team. The hydrological model performance analysis was conducted based on the modified Kling–Gupta efficiency metric (KGE'; ideal value is 1):
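Assuming the commonly used definition of the modified KGE, in which the variability ratio is based on the coefficient of variation, the metric can be written as follows. Here μ and σ denote the mean and standard deviation of the simulated (s) and observed (o) river discharge; this notation is introduced only for this sketch.

```latex
\mathrm{KGE}' = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2},
\qquad
\beta = \frac{\mu_s}{\mu_o},
\qquad
\gamma = \frac{\sigma_s/\mu_s}{\sigma_o/\mu_o}
```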
The three component scores of the KGE' were also used:
- Pearson correlation (r) highlights temporal errors through the strength of the linear relationship between simulation and observation time series. It ranges from -1 to 1, with 1 as ideal value.
- Bias ratio (β) represents the bias errors, ranging from 0 to +Inf, with 1 as ideal value. Also, the 0-centred version bias (ideal value = 0), defined as β-1, and its absolute value version absbias were used.
- Variability ratio (γ) shows the variability related errors in the simulation. It ranges from 0 to +Inf, with 1 as optimal value. Also, the 0-centred version var (ideal value = 0), defined as γ-1, and its absolute value version absvar were used.
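The component scores listed above can be computed as in the following minimal sketch. It assumes the commonly used definition of the modified KGE, with the variability ratio taken as the ratio of the coefficients of variation. The function name and the dictionary keys are illustrative and do not refer to an actual GloFAS tool.

```python
import numpy as np

def kge_prime_components(sim, obs):
    """Modified KGE and its components for paired simulated/observed series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only the time steps where both values are available.
    ok = ~(np.isnan(sim) | np.isnan(obs))
    sim, obs = sim[ok], obs[ok]

    r = np.corrcoef(sim, obs)[0, 1]                  # Pearson correlation
    beta = sim.mean() / obs.mean()                   # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # variability ratio
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

    return {
        "kge": kge,
        "pcorr": r,
        "beta": beta,
        "gamma": gamma,
        "bias": beta - 1,            # 0-centred bias
        "var": gamma - 1,            # 0-centred variability
        "absbias": abs(beta - 1),
        "absvar": abs(gamma - 1),
    }
```

For example, a simulation that is exactly twice the observed discharge has perfect correlation and variability ratio (r = 1, γ = 1) but β = 2, so bias = 1 and KGE' = 0.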
In all of the GloFAS v4 calibration hydrological model performance, the GloFAS v4 general hydrological model performance and the GloFAS hydrological model performance web product, the KGE' and the three KGE' components (bias ratio, variability ratio and correlation) were used. For the calibration model performance (GloFAS v4 calibration hydrological model performance), the original β and γ ratio errors were considered. However, in the general GloFAS v4.0 model evaluation (GloFAS v4 general hydrological model performance) and the evaluation web product (GloFAS hydrological model performance web product), besides the correlation (pcorr), β-1 (bias) and γ-1 (var) were considered to represent the bias and variability errors. These were chosen instead of β and γ as they have 0 as optimal value instead of 1 (with a range from -1 to infinity), and thus the sign intuitively shows whether the bias/variability is negatively or positively erroneous. In addition, the absolute values of bias (absbias) and var (absvar) were also used, which are useful in model comparison and can show the magnitude difference of these errors.
Finally, a specific index was also used for measuring timing errors (timing in days; ideal value is 0), which shows the time delay between the simulated and observed river discharge time series (its absolute value, abstiming, was also used). Timing is the time lag (or shift) L that maximises the cross-correlation function Rxy(L) between the simulated (x) and observed (y) time series shifted by L days. A positive/negative timing error indicates delayed/advanced simulated river discharge. So, for example, a timing error of +5 means the simulation needs to be shifted by 5 days backwards (brought earlier) to reach the highest correlation, i.e. the simulation is generally 5 days late in predicting the ups and downs in the flow time series. Although this is not directly equivalent to measuring the timing error of the highest flood peaks, it is closely related to it and can be used as a simple estimate.
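The timing index can be sketched as a search for the lag with the highest correlation between the shifted simulation and the observations. This is only an illustration: the function name and the max_lag search range are assumptions of this example, and the actual GloFAS computation may differ in detail.

```python
import numpy as np

def timing_error(sim, obs, max_lag=30):
    """Lag in days (positive = simulation is late) that maximises the correlation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = sim.size
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair sim[t + lag] with obs[t]: a late simulation matches best
        # when it is brought earlier by a positive lag.
        if lag >= 0:
            x, y = sim[lag:], obs[:n - lag]
        else:
            x, y = sim[:n + lag], obs[-lag:]
        ok = ~(np.isnan(x) | np.isnan(y))
        if ok.sum() < 3:
            continue  # too few pairs for a meaningful correlation
        r = np.corrcoef(x[ok], y[ok])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

The absolute value of the returned lag corresponds to abstiming. The search range should be chosen wide enough for the expected delays, but short compared with the length of the time series.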