Last modified on Jun 23, 2025 10:25

Contributors: A. Troccoli (ICS), M. Borga (ICS/UNIPD), M. Zaramella (ICS), L. Lusito (ICS), S. Cordeddu (ICS), E. Restivo (ICS), G. Aldrigo (ICS), S. Strada (ICS), R. Bortolami (ICS), C. Zanetti (ICS), R. Ciceri (ICS), Y-M. Saint-Drenan (ARMINES), R. Amaro e Silva (ARMINES), M. Koivisto (DTU), B. Olsen (DTU), P. Kanellas (DTU).

History of modifications

Issue	Date	Description of modification	Author
v1.0	16/06/2025	Final version	C3S

List of datasets covered by this document

Deliverable ID	Product title	Product type (CDR, ICDR)	C3S Version Number	Public Version Number	Delivery date
	Climate and energy related variables from the Pan-European Climate Database derived from reanalysis and climate projections v4.2	CDR	v1.0	v1.0	16/06/2025

Acronyms and abbreviations

Acronym/abbreviation	Definition
AM	Annual Maxima
AOI	Angle Of Incidence
API	Application Programming Interface
AR6	Sixth Assessment Report
ASCII	American Standard Code for Information Interchange
AWCM	AWI-CM-1-1-MR
BCCS	BCC-CSM2-MR
BHI	Beam Horizontal Irradiance
BIAS	Data that has been bias-adjusted
C3S	Copernicus Climate Change Service
CDFt	Cumulative Distribution Function transfer
CDO	Climate Data Operators
CDS	Climate Data Store
CITY	Level of data aggregation corresponding to specific city coordinates
CMIP6	Coupled Model Intercomparison Project (sixth phase)
CMR5	CMCC-CM2-SR5
CSP	Concentrated Solar Power
DHI	Diffuse Horizontal Irradiance
DMP	Data Management Plan
DNI	Direct Normal Irradiance
DTU	Technical University of Denmark
ECE3	EC-Earth3
ENTSO-E	European Network of Transmission System Operators for Electricity
ERAA	European Resource Adequacy Assessment
ESGF	Earth System Grid Federation
ESRI	Environmental Systems Research Institute
GCM	Global Climate Model
GHI	Global Horizontal Irradiance, corresponding to the surface solar radiation downwards in reanalysis and climate models
GMC	General Climate Model
GPU	Generation Per Unit
GTI	Global Tilted Irradiance
GWA2	Global Wind Atlas version 2
HOL	Hydropower open-loop pumped storage inflow energy
HP	Hydro Power
HPI	Hydropower run-of-river with pondage inflow energy
HPO	Hydropower run-of-river with pondage generation energy
HPS	Hydro Pumped Storage
HRG	Hydropower reservoirs generation energy
HRI	Hydropower reservoirs inflow energy
HRO	Hydropower run-of-river generation energy
HRR	Hydropower run-of-river inflow energy
HWS	High Wind Speed
IC	Installed Capacity
IPCC	Intergovernmental Panel on Climate Change
Kt	Clearness index
LOYO	Leave-One-Year-Out
MAE	Mean Absolute Error
MEHR	MPI-ESM1-2-HR
MRM2	MRI-ESM2-0
NDA	Non-Disclosure Agreement
NNSE	Normalized Nash-Sutcliffe Efficiency
NSE	Nash-Sutcliffe Efficiency
NUT0	Country level of aggregation
NUT2	Sub Country/Provinces level of aggregation
ORIG	Data that have not been bias-adjusted
PECD	Pan-European Climate Database
PEOF	Pan-European Bidding Zones Offshore level of aggregation
PEON	Pan-European Bidding Zones Onshore level of aggregation
POA	Plane Of Array
PV	Photo Voltaic
QGIS	Quantum Geographic Information System
RF	Random Forest
ReGr	Resource Grade
SEDAC	Socioeconomic Data and Applications Center
SFOE	Swiss Federal Office of Energy
SPV	Solar Photovoltaic
SSPs	Shared Socio-economic Pathways
SZA	Solar Zenith Angle
SZOF	Pan-European Zones Offshore level of aggregation
SZON	Pan-European Zones Onshore level of aggregation
TA	2m temperature
TAW	Population-weighted temperature
TOA	Top Of the Atmosphere
TP	Total precipitation
TSO	Transmission System Operator
UTC	Coordinated Universal Time
VM	Virtual Machine
WMO	World Meteorological Organization
WOF	Wind power offshore
WON	Wind power onshore
WPP	Wind Power Plant
WS10	10m wind speed
WS100	100m wind speed

Introduction

This document presents the technical methodologies and implementation details of the climate and energy indicators included in the Pan-European Climate Database version 4.2 (PECDv4.2). Developed under the Copernicus Climate Change Service (C3S) Energy service, PECDv4.2 has been produced in close collaboration with the European Network of Transmission System Operators for Electricity (ENTSO-E).

PECDv4.2 builds on the previous version (v4.1) by introducing several important updates and methodological enhancements, aimed at improving the accuracy, usability, and relevance of the dataset for energy and climate applications across Europe.

The database is organised around two main data streams:

The historical stream, which provides observational and reanalysis-based climate data and derived energy indicators.
The climate projections stream, which includes downscaled and bias-adjusted climate simulations to support long-term planning and scenario analysis.

The technical documentation that follows is organised by stream—historical and projections—and describes each modelling workflow in terms of inputs, outputs, and processing steps. The historical stream is updated monthly, ensuring that the most recent data is progressively incorporated.

Files are provided in both NetCDF and CSV formats. For details on the format used for each variable, refer to Table 2.2, Table 2.12, Table 3.4 and Table 3.6.

Descriptions of file naming conventions can be found in Table 4.1, while Table 4.2 and Table 4.3 detail the ancillary NetCDF datasets available via the "Weights and masks" widget.

Please note: this documentation refers exclusively to PECDv4.2. The previous version, PECDv4.1, has been discontinued and will not be extended beyond 2021, as its datasets were frozen ahead of the 2023 European Resource Adequacy Assessment (ERAA), in agreement with ENTSO-E.

0.1. Key updates introduced in PECDv4.2

An overview of all the changes and updates that have been implemented in PECDv4.2 compared to PECDv4.1 can be found at the following page:
Climate and energy related variables from the Pan-European Climate Database versions comparison

1. Workflows

The workflows form the backbone of the PECDv4.2 system, integrating all key components of the data processing chain. Separate workflows have been developed for the two data streams – historical (Figure 1.1) and projections (Figure 1.2). Each workflow covers the generation of both climate and energy indicators, which serve as the foundation for data production, monitoring, and delivery.

The workflows are composed of sequential steps, beginning with the retrieval and processing of climate data. This typically includes bias adjustment (where applicable) and regridding to a standard spatial and temporal resolution. The next stage involves computing energy indicators using the climate data in combination with installed capacity and technical parameters, applying the conversion models developed under the C3S Energy service.

PECDv4.2 supports the modelling of energy generation from four key technologies: wind power, solar photovoltaic (SPV), concentrated solar power (CSP), and hydropower (HP). While the first three technologies use physical models, for hydropower a statistical model is employed to compute weekly energy-related indicators, including inflows to different hydropower types.

PECDv4.2 introduces several important updates over the previous version (PECDv4.1), including new data sources, improved methods (e.g., for wind bias adjustment), and additional aggregation levels. These updates are reflected in the updated workflows and described in detail throughout this documentation.

Figure 1.1: Workflow for the historical stream. All acronyms used in the workflow are listed in a dedicated section entitled "Acronyms and abbreviations" and located at the beginning of this documentation.

Figure 1.2: Workflow for the projection stream. All acronyms used in the workflow are listed in a dedicated section entitled "Acronyms and abbreviations" and located at the beginning of this documentation.

2. Historical stream

2.1. Data retrieval

The workflow illustrating the historical stream is shown in Figure 1.1. ERA5 data from the Copernicus Climate Data Store (CDS) is retrieved via the CDS API (Application Programming Interface), which requires prior installation of Python and the CDS API client. Data is downloaded in monthly chunks by specifying the desired variables and time period.

Each climate variable is retrieved at a spatial resolution of 0.25° × 0.25° and a temporal resolution of 1 hour, covering the period from 1950 to near-present, and constrained to the PECD domain.

The PECD domain is defined as a regular latitude–longitude grid at 0.25° resolution, extending from 18°N to 75°N and from 31°W to 45°E. This includes Europe, parts of North Africa, and western Asia.
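As an illustration of the monthly retrieval described above, the following minimal sketch builds a request dictionary for one month of hourly data over the PECD domain. The helper name `build_monthly_request`, the request keys (for example `format`) and the output file name are assumptions for illustration; the exact keys depend on the CDS API version in use, and the actual retrieval call is left as a comment because it requires the `cdsapi` package and CDS credentials.

```python
import calendar

# PECD domain from the text, in the [North, West, South, East] order
# commonly used for the CDS "area" keyword.
PECD_AREA = [75, -31, 18, 45]


def build_monthly_request(variables, year, month):
    """Return a request dictionary for one month of hourly ERA5 data (hypothetical helper)."""
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "variable": list(variables),
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": PECD_AREA,
        "format": "netcdf",  # assumption: key name may differ between API versions
    }


request = build_monthly_request(
    ["10m_u_component_of_wind", "10m_v_component_of_wind"], 2020, 2
)

# Hypothetical retrieval call, not executed here:
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, "era5_202002.nc")
```

Looping this helper over years and months would reproduce the monthly chunking; the variable names follow the ERA5 single-levels catalogue naming.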

The historical stream of PECDv4.2 includes the following climate indicators: 2 m temperature (TA), population-weighted temperature (TAW), total precipitation (TP), surface solar radiation downwards, 10 m wind speed (WS10) and 100 m wind speed (WS100). Detailed descriptions of these indicators are provided in Section 2.6. Notably, the surface solar radiation downwards corresponds to the global horizontal irradiance (GHI) and is downloaded in hourly values in J m⁻² and converted to W m⁻² by dividing by 3600 seconds.
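The unit conversion for GHI can be sketched as follows. This is a minimal illustration, assuming hourly accumulated values in J m⁻²; the function name `ssrd_to_ghi` is hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def ssrd_to_ghi(ssrd_j_per_m2):
    """Convert hourly accumulated radiation [J m^-2] to mean irradiance [W m^-2]."""
    return [value / SECONDS_PER_HOUR for value in ssrd_j_per_m2]


# Three illustrative hourly accumulations (not real data).
ghi = ssrd_to_ghi([0.0, 1_800_000.0, 3_600_000.0])
```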

For wind speed at 10 m, the \( u_{10} \) (zonal, east-west) and the \( v_{10} \) (meridional, north-south) wind components are retrieved and combined to compute wind speed \( ws_{10} \) as follows:

\[ ws_{10} = \sqrt{u_{10}^2 + v_{10}^2} \]

This calculation is implemented using a Python script. Additional guidance is available via the CDS documentation, e.g., ERA5: How to calculate wind speed and wind direction from u and v components of the wind?.
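A minimal sketch of the wind speed calculation, assuming the u and v components are available as NumPy arrays (the sample values are illustrative, not ERA5 data):

```python
import numpy as np


def wind_speed(u, v):
    """Wind speed magnitude from the zonal (u) and meridional (v) components."""
    return np.hypot(u, v)  # element-wise sqrt(u**2 + v**2)


u10 = np.array([3.0, -6.0, 0.0])
v10 = np.array([4.0, 8.0, -2.5])
ws10 = wind_speed(u10, v10)
```

The same function applies unchanged to the 100 m components.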

2.2. The power law for wind vertical extrapolation

Wind speed outputs from numerical weather prediction models and climate simulations are typically available at fixed vertical levels, most commonly at 10 m above ground level. For example, the CMIP6 climate projections only provide near-surface wind data. To estimate wind speeds at turbine-relevant heights (e.g., 100 m), vertical extrapolation is necessary. This is achieved using a power law, which expresses wind shear through a dimensionless coefficient known as Alpha (α).

This coefficient enables the conversion of 10 m wind speeds to other heights by accounting for localised vertical wind profiles, as represented in models like ERA5. Temporal variability in wind shear is also considered by stratifying Alpha values by time of day and month, ensuring more accurate height scaling for energy applications.

2.2.1. Alpha computation

The Alpha coefficient was derived using ERA5 wind data from the CDS, specifically the zonal (u) and meridional (v) wind components at both 10 m and 100 m heights, as described in Section 2.1.

The data span an 11-year period from 2011 to 2021, at hourly resolution. This time window was selected as it reflects the most recent and reliable observations assimilated in ERA5, and provides a statistically robust basis for calculating vertical wind shear.

For each time step, the wind speed magnitude at both heights was computed from the u and v components. Then, the Alpha coefficient was calculated at each grid point using the following logarithmic form of the power law:

\[ \alpha = \dfrac{\ln{v_{2}} - \ln{v_{1}}}{\ln{h_{2}} - \ln{h_{1}}} \]

where \( v_{2} \) and \( v_{1} \) [m s^-1] correspond to the wind speed at 100 m and 10 m, respectively; \( h_2 \) and \( h_1 \) [m] are the corresponding heights in meters.

The result is a set of Alpha values stratified across 24 hourly intervals and 12 months, capturing diurnal and seasonal variations in vertical wind shear. The final Alpha dataset is stored in NetCDF format and made available via the CDS. For more information, please refer to Table 4.2 and Table 4.3.
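The Alpha computation and its use for vertical extrapolation can be sketched as below for a single grid point. This is an illustration under stated assumptions, not the production code: the function names are hypothetical, the stratification is done here by averaging the hourly Alpha values within each (month, hour) bin, and zero wind speeds (for which the logarithm is undefined) are not handled.

```python
import numpy as np


def alpha_from_speeds(ws10, ws100, h1=10.0, h2=100.0):
    """Power-law shear exponent from wind speeds at two heights."""
    return (np.log(ws100) - np.log(ws10)) / (np.log(h2) - np.log(h1))


def stratify_alpha(alpha, hours, months):
    """Average alpha into a (12 months, 24 hours) table; empty bins stay NaN."""
    table = np.full((12, 24), np.nan)
    for m in range(1, 13):
        for h in range(24):
            sel = (months == m) & (hours == h)
            if sel.any():
                table[m - 1, h] = alpha[sel].mean()
    return table


def extrapolate(ws10, alpha, h1=10.0, h2=100.0):
    """Scale a 10 m wind speed to height h2 by inverting the power-law relation."""
    return ws10 * (h2 / h1) ** alpha


# Two illustrative January time steps (00:00 and 12:00 UTC).
ws10 = np.array([4.0, 5.0])
ws100 = ws10 * 10.0 ** np.array([0.2, 0.1])
hours = np.array([0, 12])
months = np.array([1, 1])

alpha = alpha_from_speeds(ws10, ws100)
table = stratify_alpha(alpha, hours, months)
ws100_estimate = extrapolate(ws10[0], table[0, 0])
```

Looking up `table[month - 1, hour]` for each time step then gives the exponent used to scale a 10 m wind speed to the target height.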

2.2.2. Alpha characterization

The diurnal and seasonal variability of the Alpha coefficient across the PECD domain is illustrated in Figure 2.1, which shows the mean Alpha value calculated for each hour and month. These results align with known atmospheric behaviour and prior studies: higher Alpha values occur during the colder, more stable night-time hours, whereas lower values are observed during the daytime, when the atmospheric boundary layer is typically well mixed. Similarly, during winter months, Alpha tends to be higher than in summer, particularly in the central (and warmer) hours of the day.

However, a more nuanced picture emerges when examining the spatial and temporal distribution of Alpha across the domain. Figure 2.2 presents box plots of Alpha values for each hour, aggregated over all grid points in the PECD domain. The plots highlight a wider interquartile range during night-time hours, indicating greater variability in wind shear under stable atmospheric conditions. Notably, the Alpha coefficient can also reach negative values (as low as -0.4), particularly at night, reflecting instances where wind speed decreases with height — a phenomenon associated with specific meteorological conditions.

The approach used in PECDv4.2, which stratifies Alpha by hour and month and computes it at the grid-point level, represents a significant improvement over the traditional assumption of a constant, spatially and temporally invariant Alpha. This enhancement supports more realistic vertical wind extrapolation, especially for wind energy applications.

Figure 2.1: Mean diurnal cycle of the Alpha wind shear coefficient, averaged across the PECD domain, shown for each month. Higher values are observed during the night-time and winter months due to more stable atmospheric conditions, while lower values occur during the day when vertical mixing is stronger.

Figure 2.2: Hourly distribution of the Alpha wind shear coefficient across the PECD domain, represented as box plots for each hour (UTC). Boxes show the interquartile range (25th–75th percentile), while whiskers and outliers highlight the spatial variability. Larger spreads at night reflect more variability under stable conditions.

2.3. The bias adjustment of the ERA5 wind speed

Bias adjustment refers to the process of statistically transforming climate model data to reduce systematic differences between a simulated climate and a reference dataset, usually based on observations, over the historical period. Bias adjustment has become a standard pre-processing step for climate impact studies to adjust climate model output that will drive application models, such as energy models. This is the case for wind speed, a key variable to derive wind power. Specifically, wind power computation depends non-linearly on wind speed (precisely, on its cube). Therefore, significant biases in wind speed can markedly affect the wind energy indicator.

In the framework of PECD and the production of energy indicators, stakeholders (ENTSO-E and its members) identified the bias adjustment of wind speed for the historical period as the main challenge.

Previous evaluations showed that ERA5 tends to underestimate wind speed over most land areas in Europe, except in the north-east, while it overestimates wind speed over the sea, particularly in the North Sea and along certain coastlines, such as those of southern Norway or Portugal. For this reason, a bias adjustment of ERA5 wind fields is needed. For PECDv4.2, a new methodology (with respect to PECDv4.1) was designed: ERA5 wind speeds are bias-adjusted using the Global Wind Atlas (Davis et al., 2023) as the reference dataset (Section 2.3.1) and applying the Delta Adjustment method (Section 2.3.2).

Before applying bias adjustment, preliminary corrections (see Section 2.3.3, pre-processing) are also performed on the ERA5 wind speed dataset to address known issues. The effectiveness of these corrections is monitored using four control boxes located in representative regions, as illustrated in Figure 2.3.

Figure 2.3: The location of the control boxes used to check some known issues in ERA5 wind speed. Blue: France (latitude: 45-47°N; longitude: 5-8°E); magenta: Germany (latitude: 50-53°N; longitude: 6-10°E); orange: Sweden (latitude: 57-61°N; longitude: 13-16°E); green: Finland (latitude: 60.5-63.5°N; longitude: 22.5-26.5°E).

2.3.1. The Global Wind Atlas

The Global Wind Atlas¹ is a free, web-based application designed to provide wind resource data with global coverage (Davis et al., 2023). The dataset is primarily intended for aggregation, upscaling analysis, and energy integration modelling, and it mainly supports policymakers, planners, and investors in identifying high-wind areas for wind power generation and in conducting preliminary calculations. By using microscale modelling, the Global Wind Atlas dataset accounts for very high-resolution topographic effects and captures small-scale wind speed variability, which is crucial to accurately estimate total wind resources.

The Global Wind Atlas dataset is created through a downscaling process that begins with large-scale wind climate data (for example, reanalysis) and ends with microscale wind climate data. The dataset combines information from mesoscale and microscale models, as well as from in situ observational sites, to provide refined and verified estimates of mean wind speed at relevant hub heights and at a high horizontal resolution. The WAsP software (Floors and Nielsen, 2019) performs the downscaling and lastly computes local wind climates every 250 m at five heights (10 m, 50 m, 100 m, 150 m, and 200 m) all over the globe, excluding the North and South Poles and offshore areas beyond 20 km. In PECDv4.2, the Global Wind Atlas version 2 (GWA2), which relies on the ERA-Interim reanalysis as input data, was used to bias-adjust the ERA5 wind speeds.

2.3.2. The Delta Adjustment method

Several bias-adjustment methodologies exist to reduce biases in climate models. To adjust the ERA5 wind speed at 10 m and 100 m height, the Delta Adjustment method was selected. It is one of the simplest and least computationally demanding methods: it applies a constant correction based on the difference between the mean values of the model output (source) and the reference data (target) over a defined historical period (Navarro-Racines et al., 2020). By only accounting for changes in the mean of the quantity of interest, the Delta Adjustment method inherently assumes that the only relevant bias is related to the mean of the distribution. For this reason, the Delta Adjustment is typically used for variables that do not exhibit a strong climate-change-related trend, which is, in general, the case for wind speed.

The Delta Adjustment method was applied to ERA5 wind speeds using the GWA2-derived Delta change factors that correspond, in each grid cell, to the ratio between the mean GWA2 and ERA5 wind speeds over the selected reference period (2006-2018). This scaling ensures that ERA5 wind speed mirrors terrain effects captured by GWA2, while maintaining its spatial-temporal consistency. Specifically, since GWA2 only provides the mean wind speed at each grid cell, the bias adjustment does not modify the diurnal cycle of the original ERA5 data. The resulting bias-adjusted wind speed dataset was then extended backwards and forward to cover the whole ERA5 period, 1950–near present.
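The multiplicative Delta Adjustment described above can be sketched as follows. This is a minimal illustration with synthetic numbers, assuming arrays of shape (time, lat, lon) for the hourly ERA5 wind speed and (lat, lon) for the mean fields; the function names are hypothetical. In this toy case the factors are derived from the same two time steps they are applied to, whereas in PECDv4.2 they are derived over 2006-2018 and applied to the whole ERA5 period.

```python
import numpy as np


def delta_factors(gwa_mean, era5_mean):
    """Per-grid-cell ratio between the reference (GWA2) and ERA5 mean wind speeds."""
    return gwa_mean / era5_mean


def apply_delta(era5_ws, factors):
    """Scale an hourly field (time, lat, lon) by time-invariant factors (lat, lon)."""
    return era5_ws * factors  # NumPy broadcasts the factors over the time axis


era5_ws = np.array([[[4.0, 6.0]], [[6.0, 10.0]]])  # shape (time=2, lat=1, lon=2)
era5_mean = era5_ws.mean(axis=0)                   # [[5.0, 8.0]]
gwa_mean = np.array([[6.0, 6.0]])                  # synthetic reference means

factors = delta_factors(gwa_mean, era5_mean)
era5_ba = apply_delta(era5_ws, factors)
```

Because the factor is constant in time at each grid cell, the ratio between any two time steps is unchanged, which is why the diurnal cycle of the original ERA5 data is preserved while the mean matches the reference.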

2.3.3. Bias-adjustment procedure

The bias-adjustment procedure applied to the ERA5 wind speeds at 10 m and 100 m above ground (WS10 and WS100) involves two steps, detailed below and summarised in Figure 2.4 and Figure 2.5.

Figure 2.4: Bias adjustment logic blocks for ERA5 historical wind speed (WS).

Figure 2.5: Details of the bias-adjustment logic block for ERA5 historical wind speed (WS).

1) Pre-processing:

Regarding the ERA5 reanalysis, the wind speed at 10 m above ground is corrected to remove two known issues: very high wind speed values (above 70 m s^-1) and the drop in wind speed at 10:00 UTC. High wind speeds are capped at the fixed threshold of 70 m s^-1. The drop in wind speed at 10:00 UTC is a known issue of the ERA5 reanalysis²; it was fixed in PECD at the grid-point level by re-computing the 10:00 UTC value through linear interpolation between the 09:00 UTC and 11:00 UTC values (equivalent to their average) using the 'interp' function (method = 'linear') included in the xarray Python library. For the four geographical control boxes illustrated in Figure 2.3, Figure 2.6 shows the original and corrected mean diurnal cycles of the 10 m wind speed computed over the period 2009-2018.

Figure 2.6: Effect of the correction of the 10:00 UTC drop in WS10 in each of the four geographical control boxes shown in Figure 2.3. Blue line: original dataset, orange line: corrected dataset.
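The two pre-processing corrections on the ERA5 10 m wind speed can be sketched for a single grid point as below. This is an illustration in plain NumPy rather than the xarray-based production code; it assumes a contiguous hourly series (so the neighbours of a 10:00 UTC value are 09:00 and 11:00 UTC), and the order of the two corrections and the function name are assumptions.

```python
import numpy as np

MAX_WS = 70.0  # fixed cap on wind speed [m s^-1], as stated in the text


def preprocess_ws10(ws, hours_utc):
    """Cap extreme speeds, then replace each 10:00 UTC value by the mean of its neighbours."""
    capped = np.minimum(np.asarray(ws, dtype=float), MAX_WS)
    fixed = capped.copy()
    for i in np.flatnonzero(np.asarray(hours_utc) == 10):
        if 0 < i < len(capped) - 1:  # series edges have no two neighbours
            fixed[i] = 0.5 * (capped[i - 1] + capped[i + 1])
    return fixed


hours = np.array([8, 9, 10, 11, 12])
ws = np.array([5.0, 6.0, 3.0, 8.0, 95.0])  # synthetic values with a 10:00 drop and a spike
ws_clean = preprocess_ws10(ws, hours)
```

With equally spaced hourly data, linear interpolation at the midpoint is the same as averaging the two neighbouring values, which is what the loop does.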

Regarding GWA2, the Global Wind Atlas wind speeds were selected at the same heights as the ERA5 wind speeds (namely, 10 m and 100 m), averaged over the period 2006-2018, and finally aggregated from their original resolution (250 m) to the ERA5 horizontal resolution (0.25°) using the 'coarsen' function included in the xarray Python library. The resulting NetCDF files, containing the mean wind speed of GWA2 and ERA5 at both 10 m and 100 m, are described in Table 4.3 and are available for download on the CDS.
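The aggregation of a fine-resolution field to a coarser grid can be illustrated with a block average, which is the kind of operation that xarray's `coarsen(...).mean()` performs. The sketch below uses plain NumPy on a toy 4 × 4 field with an integer coarsening factor; the actual window sizes needed to go from 250 m to 0.25° are not given in this document, and trimming incomplete edge blocks is an assumption.

```python
import numpy as np


def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field."""
    ny, nx = field.shape
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor  # drop incomplete edge blocks
    trimmed = field[:ny_trim, :nx_trim]
    blocks = trimmed.reshape(ny_trim // factor, factor, nx_trim // factor, factor)
    return blocks.mean(axis=(1, 3))


fine = np.arange(16, dtype=float).reshape(4, 4)  # synthetic fine-resolution field
coarse = block_mean(fine, 2)
```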

2) Bias Adjustment: Following the methodology described in Section 2.3.2 and summarised in Figure 2.5, the ERA5 WS10 and WS100 were corrected using GWA2 as the reference (also called target) dataset.

2.3.4. Validation of the ERA5 bias-adjusted near-surface wind speeds

Despite the limited availability of long-term and homogeneous wind observations to assess wind fields (Davidson and Millstein, 2022), over Europe the E-OBS dataset³ offers land-only, station-based, daily means of near-surface (at 10 m above ground) wind speed, at the same horizontal resolution as ERA5 (0.25°) and over the period 1980-2022 (de Baar et al., 2023). The domain of the E-OBS dataset partly overlaps the PECD domain, providing a common area that stretches between the following coordinates: latitudes from 30°N to 72°N, and longitudes from 12°W to 40°E. Using the E-OBS observational gridded dataset as a reference for assessment, the ERA5 bias-adjusted near-surface wind speeds were evaluated.

Figure 2.7 illustrates the spatial distribution of the absolute bias in global means of near-surface (10 m) wind speed computed over the period 1995-2014. The absolute bias corresponds to the difference between the ERA5 reanalysis (before, ERA5_ORIG, and after, ERA5_BA, the bias adjustment) and the E-OBS dataset. The bias adjustment reduces the mean bias between E-OBS and ERA5 wind speeds, with the mean absolute bias moving from 0.55 m s^-1 (mean relative bias: 23.76%) to 0.41 m s^-1 (19.41%). The effect of bias adjustment is stronger over north-eastern Europe, where the bias reduces by nearly 1 m s^-1, with a final bias lower than 0.5 m s^-1 (first vs. second plot in Figure 2.7). Conversely, over mountainous regions (for example, the Alps or the Carpathian Mountains) and areas with complex terrain mixing steep slopes and coasts (for example, Norway or the Balkans), the bias increases once wind fields have been bias-adjusted (first vs. second plot in Figure 2.7). Over these regions, the bias in ERA5 bias-adjusted wind speeds shows a similar pattern to the difference between GWA2 and E-OBS, suggesting that over complex terrains, ERA5 inherits the micro-scale information from GWA2 that E-OBS does not provide (second vs. third plot in Figure 2.7).

Figure 2.7: Spatial distribution of the global absolute bias in near-surface wind speed (units: m s^-1) using the E-OBS (EOBS) observational dataset as reference, from left to right: (a) the original ERA5 reanalysis (ERA5_ORIG), (b) the Bias-Adjusted ERA5 reanalysis (ERA5_BA), and (c) the Global Wind Atlas version 2 (GWA2). All datasets have been averaged in time over the period 1995-2014. Data are shown over a domain that results from the intersection between the PECD and the E-OBS domains (coordinates: latitudes [30°N-72°N] and longitudes [12°W-40°E]). Grey colour depicts areas where the E-OBS data are not available. The purple and blue boxes highlight two regions that have been chosen for further checks over: (a) Germany (latitudes [50°N-53°N] and longitudes [8°E-12°E]) and (b) Finland (latitudes [64°N-68°N] and longitudes [26°E-30°E]).

Looking at the temporal correspondence between ERA5 and E-OBS, Figure 2.8 shows the time series of monthly mean wind speeds computed over the period 1995-2014 and over the European domain presented in Figure 2.7. The original ERA5 already captures the temporal variations in wind speed, including the succession of high and low values. The bias adjustment improves the temporal correlation, with the square of the Pearson's coefficient (R²) increasing from 0.67 to 0.72⁴, and brings ERA5 closer to E-OBS, with the mean bias decreasing from 0.51 to 0.41 m s^-1 (Figure 2.8). Moving to the regional scales, Figure 2.9 shows the time series of monthly means and confirms that the bias adjustment has a stronger effect over north-eastern Europe. Over Germany, the mean bias between E-OBS and ERA5 shows a similar absolute value before and after the bias adjustment (0.17 m s^-1 before and -0.19 m s^-1 after), while over Finland the mean bias decreases from 0.86 m s^-1 to 0.01 m s^-1. Moreover, over Germany ERA5 shows a higher temporal correlation with E-OBS (R² = 0.98) compared to Finland, where some years perform worse than others (R² = 0.35). This is the case for the year 2010, which is highlighted in yellow in Figure 2.9. For this year, Figure 2.10 shows the tight temporal correspondence between E-OBS and ERA5 daily means over Germany, while some discrepancies appear over Finland.
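The evaluation metrics quoted above (mean bias, mean relative bias and R²) can be computed along the following lines. This is a sketch on synthetic series; the exact definitions used for PECDv4.2 are not spelled out in this document, so the formula for the relative bias in particular is an assumption.

```python
import numpy as np


def mean_bias(model, obs):
    """Mean difference between model and observations [same units as the inputs]."""
    return float(np.mean(model - obs))


def mean_relative_bias(model, obs):
    """Mean of the point-wise relative difference, in percent (assumed definition)."""
    return float(100.0 * np.mean((model - obs) / obs))


def r_squared(model, obs):
    """Square of the Pearson correlation coefficient."""
    return float(np.corrcoef(model, obs)[0, 1] ** 2)


obs = np.array([3.0, 4.0, 5.0, 6.0])  # synthetic monthly means [m s^-1]
model = obs + 0.5                     # constant offset: biased but perfectly correlated

bias = mean_bias(model, obs)
rel_bias = mean_relative_bias(model, obs)
r2 = r_squared(model, obs)
```

The synthetic case shows why both kinds of metric are reported: a constant offset leaves R² at 1 while the mean bias is non-zero.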

Figure 2.8: Time series of monthly means of near-surface wind speeds (units: m s^-1) computed over the period 1995-2014 and over the European domain illustrated in Figure 2.7. The five solid lines show: (a) the E-OBS dataset (EOBS, green line), (b) the original ERA5 reanalysis (ERA5_ORIG, red), (c) the bias-adjusted ERA5 reanalysis (ERA5_BA, blue), (d) the difference between ERA5_ORIG and EOBS (grey), and (e) the difference between ERA5_BA and EOBS (pink).

Figure 2.9: As Figure 2.8, but for two regions located in: (a) Germany (latitudes [50°N-53°N] and longitudes [8°E-12°E], purple box on Figure 2.7; left plot) and (b) Finland (latitudes [64°N-68°N] and longitudes [26°E-30°E]; blue box on Figure 2.7; right plot). The year 2010 is highlighted with a yellow stripe and has been chosen to illustrate the time series of daily means.

Figure 2.10: Time series of daily means of near-surface wind speeds (units: m s^-1) for the year 2010 computed over two regions located in: (a) Germany (latitudes [50°N-53°N] and longitudes [8°E-12°E], purple box on Figure 2.7; top plot) and (b) Finland (latitudes [64°N-68°N] and longitudes [26°E-30°E]; blue box on Figure 2.7; bottom plot). The five solid lines show: (a) the E-OBS dataset (EOBS, green line), (b) the original ERA5 reanalysis (ERA5_ORIG, red), (c) the bias-adjusted ERA5 reanalysis (ERA5_BA, blue), (d) the difference between ERA5_ORIG and EOBS (grey), and (e) the difference between ERA5_BA and EOBS (pink).

¹ https://globalwindatlas.info/en/
² ERA5: data documentation, Known issues, point 8.
³ https://surfobs.climate.copernicus.eu/dataaccess/access_eobs.php
⁴ In simple linear regression, the square of the correlation coefficient, R², goes from 0 to 1, with higher values indicating a better fit of the model to the observations.

2.4. Population-weighted Temperature

Population-weighted temperature (TAW) is an important climate indicator included in the PECDv4.2 database. It is particularly relevant for energy conversion and demand modelling, as it provides a temperature metric that reflects the conditions most likely experienced by the population. Rather than averaging temperature uniformly across a region, TAW gives greater weight to areas with higher population density, offering a more realistic estimate of population exposure to temperature variations.

In the PECD framework, TAW is calculated exclusively at the SZON (onshore bidding zones) aggregation level (see Table 2.1 for a full list of spatial aggregation levels and their acronyms). This approach allows for a consistent integration of TAW into energy-related applications, such as forecasting demand peaks during heatwaves or cold spells, assessing vulnerability, or planning adaptive infrastructure and policy interventions.

2.4.1. Population mask

To calculate TAW, a high-resolution population mask is required. For PECDv4.2, gridded population data at 0.25° spatial resolution were sourced from the NASA Socioeconomic Data and Applications Center (SEDAC)⁵, based on the latest available dataset (year 2020). The data were obtained in ASCII format and include the number of inhabitants per cell across the domain.

The population raster was clipped to the PECD domain and converted to NetCDF format (Figure 2.11), using QGIS-GRASS GIS (Geographic Information System, Open-Source Geospatial Foundation Project⁶). Sea and ocean areas were assigned missing values in accordance with the ESRI ASCII specification. The resulting NetCDF population mask is used throughout the modelling chain and is available for download via the CDS (please refer to Table 4.2 and Table 4.3 for more details).

Figure 2.11: Population distribution across the PECD domain based on NASA SEDAC data (2020), mapped at 0.25° resolution. Values represent the number of inhabitants per grid cell.

2.4.2. Computation of Population-weighted temperature

TAW [°C] is computed by applying the population mask to the gridded TA, both at 0.25° resolution. The calculation is carried out independently for each onshore bidding zone (referring to the aggregation level SZON, as detailed in Table 2.1), using the following equation:

\[ TAW_z = \frac{\sum_{i=1}^n T_i P_i}{\sum_{i=1}^n P_i} \]

where \( T_i \) is the air temperature and \( P_i \) is the population in the i-th grid cell of zone \( z \), and \( n \) is the number of grid cells in the zone. This results in a weighted average temperature for each zone, reflecting human exposure rather than geographic extent alone.
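The population-weighted average described above can be sketched for one zone as follows. This is a minimal illustration on a toy 2 × 2 grid; the function name and the Boolean zone mask are assumptions, and population cells with missing values (sea and ocean areas) are treated as zero weight.

```python
import numpy as np


def population_weighted_temperature(temp, population, zone_mask):
    """Population-weighted mean temperature over the cells where zone_mask is True."""
    weights = np.where(zone_mask, np.nan_to_num(population), 0.0)
    return float(np.sum(temp * weights) / np.sum(weights))


temp = np.array([[10.0, 20.0], [30.0, 0.0]])            # 2 m temperature [°C]
population = np.array([[100.0, 300.0], [np.nan, 50.0]])  # inhabitants per cell, NaN over sea
zone = np.array([[True, True], [True, False]])           # cells belonging to the zone

taw = population_weighted_temperature(temp, population, zone)
```

In this toy zone the plain average of the three cells is 20 °C, while the population-weighted value is 17.5 °C, because the unpopulated 30 °C cell carries no weight.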

Figure 2.12 shows the difference between the mean TAW and the mean TA over the period 1991-2020 across the SZON regions.

Figure 2.12: Difference between the mean TAW and TA over the climatology 1991-2020 for SZON regions.

2.5. Spatial aggregation

Spatial aggregation is the procedure used to compute regionally averaged indicators from gridded climate and energy data. It enables the transformation of high-resolution outputs into meaningful statistics for specific administrative or market-related regions, such as countries, provinces, or bidding zones. This process is systematically applied to all gridded indicators in PECD to produce corresponding aggregated versions.

Please note that on CDS, the sub-region selection is only available for gridded datasets. When downloading aggregated time series from CDS, the sub-regional extraction is not supported.

2.5.1. Required spatial aggregation level for PECDv4.2

The PECD database supports multiple levels of spatial aggregation, depending on the needs of climate and energy modelling. Table 2.1 below summarises these levels, along with their codes and source definitions.

For country-level and sub-country-level aggregations, a combination of NUTS (Nomenclature of Territorial Units for Statistics)⁷ and ADMIN (Natural Earth Global Administrative Boundaries)⁸ shapefiles was used. For market-relevant zones such as bidding zones⁹ and pan-European regions, official shapefiles were provided by ENTSO-E. Figure 2.13 shows some of the shapefiles used to create the masks.

In PECDv4.2, a new spatial reference level has been introduced — CITY — available exclusively for the 2 m temperature (TA) variable. It consists of a pre-defined selection of cities for which temperature time series are provided. These values are not the result of a spatial aggregation, but are directly extracted from the grid cells closest to each city's coordinates. The values are provided in CSV format, enabling simplified access and analysis of urban temperature trends. The list of cities is fixed, cannot be customised by the user, and was defined by ENTSO-E following a specific request from MedTSOs.
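The extraction of a CITY value from the grid can be sketched as a nearest-grid-cell lookup. The sketch below assumes the regular 0.25° PECD grid and selects the nearest latitude and longitude independently; the function name is hypothetical, and the coordinates (approximately those of Rome) are purely illustrative and not taken from the ENTSO-E city list.

```python
import numpy as np


def nearest_cell(lats, lons, city_lat, city_lon):
    """Indices of the grid cell whose centre is closest to the given coordinates."""
    i = int(np.abs(lats - city_lat).argmin())
    j = int(np.abs(lons - city_lon).argmin())
    return i, j


# Regular 0.25° grid covering the PECD domain (18°N-75°N, 31°W-45°E).
lats = np.arange(18.0, 75.0 + 0.25, 0.25)
lons = np.arange(-31.0, 45.0 + 0.25, 0.25)

i, j = nearest_cell(lats, lons, 41.9, 12.5)
```

The temperature time series for the city would then be the gridded TA values at indices `(i, j)`.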

Table 2.1: Overview of the spatial aggregation levels available in PECD. The level introduced in PECDv4.2 in comparison with previous versions is shown in bold.
Each code corresponds to a specific aggregation level, its description, and the source of the geographical definition.

Code	Description of the aggregation level	Source
ORIG	Not aggregated	Gridded data
BIAS	Not aggregated	Gridded data bias-adjusted (see Section 2.3)
NUT0	Country	NUTS0+ADMIN0
NUT2	Sub Country/Provinces	NUTS2+ADMIN1
SZON	Onshore Bidding Zones	Shapefile provided by ENTSO-E*
SZOF	Offshore Bidding Zones	Shapefile provided by ENTSO-E*
PEON	Pan-European Onshore Zones	Shapefile provided by ENTSO-E*
PEOF**	Pan-European Offshore Zones	Shapefile provided by ENTSO-E*
CITY	Not aggregated - List of selected cities (only for TA)	List provided by ENTSO-E

*These shapefiles are not publicly available, but the corresponding NetCDF masks are provided in the CDS under the widget "Weights and masks". Please see Table 4.2 and Table 4.3 for more details.

**In PECDv4.2, the PEOF zones were updated from previous versions by considering a new version of the shapefile (2024/09/19).

Figure 2.13: Example of original polygon geometries used to derive float masks for spatial aggregation.

https://sedac.ciesin.columbia.edu/
http://qgis.osgeo.org
⁷ https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics
⁸ https://www.naturalearthdata.com
⁹ Bidding Zones: Bidding zones are geographical areas within a power market where electricity is traded without internal transmission constraints, ensuring uniform pricing across the zone. These zones facilitate efficient market operations and price formation by balancing supply and demand, providing clear investment signals, and supporting cross-border electricity trade. Defined by transmission system operators, bidding zones help optimise the use of the transmission network and maintain market transparency.

2.5.2. Generation of Region Masks for Spatial Aggregation

To perform spatial aggregation, floating-point NetCDF masks were generated from the shapefiles listed in Table 2.1. One mask was created for each aggregation level, resulting in six region masks: NUT0, NUT2, PEON, PEOF, SZON, and SZOF.

The mask generation involves the following steps:

1. Shapefile simplification: All shapefiles were simplified using an angular distance tolerance of 0.1° to reduce computational complexity.
2. Raster initialisation: A base raster matching the PECD domain and the ERA5 resolution (0.25° × 0.25°) was loaded.
3. Polygon iteration:
  - For each polygon in the shapefile:
    - Identify all raster grid cells intersecting the polygon.
    - For each intersecting cell:
      - Calculate the fraction of the cell’s area covered by the polygon (region of interest).
      - Assign this fractional value to the corresponding grid cell.
4. Output: The resulting fractional coverage values are saved in a NetCDF mask file for each region type.
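The per-cell coverage computation in the steps above can be sketched as follows. This is a minimal standard-library illustration, not the tool used to build the PECD masks: it clips a polygon to each grid cell with the Sutherland-Hodgman algorithm and takes the ratio of areas in plain longitude-latitude space. The function names (`polygon_area`, `clip_to_cell`, `fractional_mask`) and the planar-area simplification are assumptions made for this example.

```python
def polygon_area(points):
    """Absolute area of a polygon given as a list of (x, y) vertices (shoelace formula)."""
    shifted = points[1:] + points[:1]
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(points, shifted))) / 2.0


def clip_to_cell(polygon, xmin, ymin, xmax, ymax):
    """Clip a polygon to an axis-aligned grid cell (Sutherland-Hodgman)."""

    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def x_cross(x):
        # Intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cross(y):
        # Intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(polygon)
    for inside, intersect in [
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ]:
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def fractional_mask(polygon, lon_edges, lat_edges):
    """Return mask[j][i], the fraction of cell (lat j, lon i) covered by the polygon."""
    mask = []
    for ymin, ymax in zip(lat_edges, lat_edges[1:]):
        row = []
        for xmin, xmax in zip(lon_edges, lon_edges[1:]):
            clipped = clip_to_cell(polygon, xmin, ymin, xmax, ymax)
            cell_area = (xmax - xmin) * (ymax - ymin)
            row.append(polygon_area(clipped) / cell_area)
        mask.append(row)
    return mask


# A rectangle covering one full 0.25° cell and half of the next one
region = [(0.0, 0.0), (0.375, 0.0), (0.375, 0.25), (0.0, 0.25)]
print(fractional_mask(region, [0.0, 0.25, 0.5], [0.0, 0.25]))  # [[1.0, 0.5]]
```

Cells fully inside the region receive 1.0, cells outside receive 0.0, and border or coastline cells receive intermediate values.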

These masks allow for accurate area-weighted aggregation, especially near borders and coastlines. An example for Italy (NUT0 level) is shown in Figure 2.14.

All regional masks are available for download from the CDS under the widget “Weights and masks”. Additional details about filenames and structure can be found in Table 4.2 and Table 4.3.

Figure 2.14: Example of a float mask, for the Italian NUT0 administrative region, showing the fractions of land around the border and coastlines.

2.5.3. Spatial Aggregation Procedure

The spatial aggregation of climate and energy indicators is implemented via a Python-based tool, following this workflow:
1. Input loading:
  - Load the NetCDF file containing the variable(s) to be aggregated.
  - Load the corresponding region mask (NetCDF format).
2. Grid iteration:
  - Iterate over the spatial coordinates defined in the region mask.
  - For each region:
    - Apply the region mask to the gridded data, applying a cosine latitude weighting to account for spatial distortion (see Table 4.2 and Table 4.3 for more details).
    - Compute the weighted average over the masked area.
3. Result formatting:
  - Store the aggregated values for each region in a column of a Pandas DataFrame.
  - Add the associated time axis from the NetCDF file to the DataFrame.
  - Export the final DataFrame as a CSV file.
  - Metadata is attached to the CSV file according to the conventions outlined in the Appendix.
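The weighted average in step 2 can be illustrated with a short sketch. It uses plain Python lists rather than NetCDF files and a Pandas DataFrame so that it stays self-contained; the function name `regional_mean` and the way the cosine-latitude weight is multiplied with the mask fraction are assumptions for this example, not a description of the actual tool.

```python
import math


def regional_mean(field, mask, lats):
    """Weighted regional mean of a 2-D field for one time step.

    field[j][i] : gridded value at latitude index j, longitude index i
    mask[j][i]  : fractional coverage of the region (0 to 1)
    lats[j]     : latitude of row j in degrees
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, mask_row, lat in zip(field, mask, lats):
        # Cells shrink towards the poles, so each row is weighted by cos(latitude)
        lat_weight = math.cos(math.radians(lat))
        for value, fraction in zip(row, mask_row):
            weight = fraction * lat_weight
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return float("nan")  # the region does not overlap the grid
    return weighted_sum / weight_total


# Two rows (equator and 60°N), region covering every cell completely
field = [[10.0, 10.0], [40.0, 40.0]]
mask = [[1.0, 1.0], [1.0, 1.0]]
print(regional_mean(field, mask, [0.0, 60.0]))  # close to 20.0
```

Repeating this calculation for every time step and every region yields one time series per region, which corresponds to one column of the exported CSV file.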

2.6. Climate indicators

This section describes the climate indicators provided in PECDv4.2 for the historical stream. These indicators are derived from the ERA5 reanalysis and are used as inputs for energy modelling and climate analysis across the Pan-European domain.

In PECDv4.2, the coverage period of the historical stream has been extended: while in PECDv4.1 the data spanned from 1980 to 2021, the new version now provides data from 1950 to near-present, offering a longer and more robust historical baseline.

The indicators are available as both gridded products (NetCDF format, spatial resolution of 0.25° × 0.25°) and spatially aggregated time series (CSV format), depending on the level of aggregation.

Table 2.2 summarises the available climate indicators, including their temporal coverage, data source, domain and spatial resolution, temporal resolution, aggregation levels (see Table 2.1), and units. Notably, PECDv4.2 introduces a new reference level — CITY — which provides temperature time-series for a predefined list of cities (see Section 2.5).

Table 2.2: Climate indicators provided in the PECDv4.2 for the historical stream.
Gridded data (ORIG and BIAS levels) are provided in NetCDF format. All other aggregation levels are delivered in CSV format. Changes that were implemented in PECDv4.2 are highlighted in bold (extended time period and the CITY level).

Variable	Period	Source	Domain / Spatial Resolution	Temporal Resolution	Spatial Aggregation	Units
2m temperature (TA)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF, CITY	K (gridded) °C (aggregated)
Population-weighted temperature (TAW)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	SZON	°C
Total precipitation (TP)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m
Surface solar radiation downwards (GHI)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	W m^-2
10m wind speed (WS10)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1
100m wind speed (WS100)	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1

2.7. Energy data

In collaboration with ENTSO-E, extensive efforts have been made in PECDv4.2 to collect and integrate the widest possible range of energy-related data, which serve both for validating energy models and for training the hydro statistical model. The following sources have been used:

1) Wind power plant database: Position and technical characteristics of wind turbines across the PECD domain have been sourced from The Wind Power¹⁰ database. This includes information such as turbine capacity, hub height, and commissioning year.

2) Turbine-level power curves: Power curves are generated using the generic model available in PyWake¹¹ which builds a power curve based on turbine diameter and rated power.

3) ENTSO-E generation time series: Publicly available hourly generation data for wind, solar and hydro technologies have been retrieved from the ENTSO-E Transparency Platform¹², and used to validate and train the statistical models.

4) Inflow data from PECDv3.1: Historical hydrological inflow time series from version PECDv3.1 have been used to complement or substitute official input data where gaps exist.

5) TSO-provided inflow data: Several Transmission System Operators (TSOs) have provided confidential data under non-disclosure agreements (NDAs). These data include high-resolution generation and storage series, and are not detailed in this documentation.

2.8. Exclusion areas

To ensure that wind and solar energy potential is realistically assessed, several exclusion layers have been applied in PECDv4.2. These masks identify areas unsuitable for energy production due to geographical, environmental, or legal constraints.

The wind power and solar photovoltaic energy indicators have been computed and aggregated using the following exclusion criteria:

Protected areas (e.g. nature reserves, conservation zones)
Polar regions
Urban areas (with high land cover density)
Oceans and inland water bodies
High-slope areas (≥ 60% slope)
High-elevation areas (≥ 2000 m a.s.l.)
Distance to shore constraints for offshore wind development

For offshore wind assessments, a distance buffer of 100 km from the coast was applied, except in the North Sea, where a 200 nautical mile buffer was used according to ENTSO-E guidance. These criteria allow for the modelling of technically feasible and regulatory-compliant deployment zones.

Combinations of exclusion layers were also considered in the modelling of wind and solar energy to better represent cumulative restrictions. The exclusion area masks were created accordingly and applied systematically in the energy modelling process.

All exclusion layers have been processed into NetCDF format and are available for download in the Climate Data Store (CDS) under the widget "Weights and masks". These include the "Wind power regions mask" and "Solar PV mask" used in PECDv4.2. For details on file naming conventions and characteristics, refer to Table 4.2 and Table 4.3.

Table 2.3 below provides a full overview of each exclusion criterion, including data sources and variable identifiers.

Table 2.3: Overview of exclusion criteria applied in PECDv4.2. Each row describes a specific constraint used to define areas unsuitable for wind or solar energy modelling. For each criterion, the table includes a brief description, the source of the underlying dataset, and the corresponding variable name used in the PECD files.

Criteria	Description	Source	Variable Name
Protected areas	Identifies legally protected regions, such as national parks or nature reserves. Values are binary: 1 indicates a restricted pixel.	World database on protected areas from the United Nations Environment Programme	prot_a
Polar areas	Identifies polar and subpolar regions based on global land cover classification. Values are binary: 1 indicates a restricted pixel.	Land Cover Classification System from the United Nations Food and Agriculture Organization	polar_a
Urban areas	Flags areas with urban coverage ≥ 45%. Values are binary: 1 indicates a restricted grid cell.	Land Cover Classification System from the United Nations Food and Agriculture Organization	urban_a
Water and continental waters area	Classifies water bodies using a three-value system: 0 = land, 1 = ocean, 2 = inland waters. Used to exclude non-land areas.	ERA5 Land-Sea Mask (ECMWF)	watr_a
High slope area	Identifies steep terrain where slope ≥ 60%. Values are binary: 1 indicates a restricted grid cell.	ETOPO1 Global Relief Model from National Oceanic and Atmospheric Administration (NOAA)	halo_a
High elevation areas	Identifies regions of high altitude. Values are binary: 1 indicates a restricted pixel.	ETOPO1 Global Relief Model from National Oceanic and Atmospheric Administration (NOAA)	hele_a
Distance to shore areas	Identifies areas beyond a given distance from the coastline (used primarily for offshore applications). Values are binary: 1 indicates a restricted pixel.	ERA5 Land-Sea Mask (ECMWF)	dist_s
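As an illustration of how several exclusion layers could be combined into a single mask, the sketch below marks a grid cell as usable for onshore modelling only if it is land and no binary layer restricts it. The variable names follow Table 2.3, but the choice of layers, the function name `onshore_allowed` and the use of nested lists instead of NetCDF arrays are assumptions for this example; the combinations actually used in PECDv4.2 are those encoded in the masks distributed on the CDS.

```python
def onshore_allowed(layers):
    """Combine exclusion layers into a boolean mask (True = cell can be used).

    layers maps a variable name to a 2-D list on a common grid.
    watr_a uses 0 = land, 1 = ocean, 2 = inland waters; the other
    layers are binary with 1 = restricted.
    """
    binary_names = ["prot_a", "polar_a", "urban_a", "halo_a", "hele_a"]
    water = layers["watr_a"]
    allowed = []
    for j in range(len(water)):
        row = []
        for i in range(len(water[j])):
            is_land = water[j][i] == 0
            restricted = any(layers[name][j][i] == 1 for name in binary_names)
            row.append(is_land and not restricted)
        allowed.append(row)
    return allowed


# One row of three cells: free land, ocean, protected land
example = {
    "watr_a": [[0, 1, 0]],
    "prot_a": [[0, 0, 1]],
    "polar_a": [[0, 0, 0]],
    "urban_a": [[0, 0, 0]],
    "halo_a": [[0, 0, 0]],
    "hele_a": [[0, 0, 0]],
}
print(onshore_allowed(example))  # [[True, False, False]]
```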

2.9. Energy Conversion Models

This section outlines the methodologies and implementation details of the energy conversion models used in PECDv4.2, which replaces the previous PECDv4.1 version. These models convert meteorological inputs into power generation time series for four technologies: wind power, solar photovoltaic (SPV), concentrated solar power (CSP), and hydropower. The first three are physical models, while the hydropower module is based on a statistical machine learning approach.

For each energy model, we describe the input data sources, the modelling framework, and the calibration and validation methods used.

2.9.1. Wind Power Conversion Model

The wind power conversion model simulates generation at the wind power plant (WPP) level and aggregates results to the regional level. The conversion process differs between existing and future wind power installations, reflecting the evolution of wind technologies over time. Existing installations are modelled based on location, capacity, and technology data from WindPowerNet¹³ (considering the wind power plants that were in operation as of December 2020). Several wind technologies are simulated as potential future installations to model a range of specific power and hub height options.

In PECDv4.2, the model has undergone several updates compared to PECDv4.1, including the use of higher-resolution wind climatology data, a more flexible turbine modelling framework, and improved treatment of unavailability. These improvements enhance the realism and accuracy of the simulated wind generation time series.

2.9.1.1. Climate Data Handling

Wind speed data is sourced from the ERA5 reanalysis and CMIP6 climate model outputs, provided at 0.25° x 0.25° horizontal resolution and, when available, at two vertical levels: 10 m and 100 m above ground.

Spatial interpolation follows the method described in Murcia et al. (2022):

Horizontal interpolation is performed using cubic spline interpolation.
Vertical interpolation and extrapolation use a power-law approach, which is equivalent to a piecewise linear interpolation in log-log space between the two closest available heights.
Surface roughness is not explicitly modelled during interpolation. However, it is implicitly accounted for through the meteorological models, which internally consider roughness in the computation of wind speeds at different heights.

Interpolation is carried out at each time step and for each wind power plant (WPP) individually.
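The power-law vertical step can be sketched for the two-level case (10 m and 100 m) as follows. The function name `power_law_wind_speed` and the fallback used for calm conditions are assumptions for this example; the method itself is described in Murcia et al. (2022).

```python
import math


def power_law_wind_speed(u10, u100, height):
    """Estimate wind speed at `height` (m) from the 10 m and 100 m wind speeds.

    The shear exponent alpha is the slope of the straight line through the
    two levels in log-log space; the same line is used for extrapolation.
    """
    if u10 <= 0.0 or u100 <= 0.0:
        # The exponent is undefined for zero wind; return the 100 m value (assumption)
        return u100
    alpha = math.log(u100 / u10) / math.log(100.0 / 10.0)
    return u100 * (height / 100.0) ** alpha


# Example: 5 m/s at 10 m and 8 m/s at 100 m, extrapolated to a 150 m hub height
print(power_law_wind_speed(5.0, 8.0, 150.0))
```

By construction, the function returns the input values at 10 m and 100 m.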

2.9.1.1.1. Wind bias adjustment in PECDv4.2

A key methodological improvement in PECDv4.2 is the use of the Global Wind Atlas version 2 (GWA2) for wind speed bias adjustment, replacing the COSMO-REA6 dataset used in PECDv4.1.

Wind speed time series (from both ERA5 and climate projections) are interpolated to each WPP location and then scaled to match the mean wind speed from GWA2.
GWA2 data, originally at 250 m resolution, is coarsened to 2.5 km to better align with the spatial footprint of WPP installations. This corresponds to ~6.3 km², equivalent to an estimated 43.8 MW at 7 MW/km² installation density.
This scaling is applied separately at both 10 m and 100 m heights, depending on data availability.
The adjustment is performed after interpolation, ensuring that the high-resolution mean wind climatology from GWA2 is preserved in the time series used for energy conversion.

This two-step process—interpolating ERA5/projection data and adjusting with GWA2 climatology—produces more realistic site-specific wind speed time series and improves alignment with observed power generation, in line with the findings of Murcia et al. (2022).
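The scaling step can be sketched as a single multiplicative factor applied to the interpolated time series. The function name `scale_to_target_mean` is hypothetical, and the sketch assumes that the factor is the ratio between the GWA2 mean and the long-term mean of the series, which is one straightforward way to match the means.

```python
def scale_to_target_mean(wind_speeds, target_mean):
    """Rescale a wind speed series so that its mean equals `target_mean`.

    wind_speeds : interpolated wind speeds at the plant location (m/s)
    target_mean : mean wind speed from the high-resolution climatology (m/s)
    """
    series_mean = sum(wind_speeds) / len(wind_speeds)
    if series_mean == 0.0:
        raise ValueError("cannot scale a series with zero mean")
    factor = target_mean / series_mean
    # A single factor keeps the temporal pattern and changes only the level
    return [factor * ws for ws in wind_speeds]


print(scale_to_target_mean([4.0, 6.0, 8.0], 7.5))  # [5.0, 7.5, 10.0]
```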

2.9.1.2. Conversion to Wind Power Generation

2.9.1.2.1. Existing installations

A power curve is estimated for each wind power plant (WPP) using a surrogate model, as detailed in Simutis et al. (2024). The model first constructs a turbine-level power curve from plant-level characteristics and then accounts for intra-farm wake effects to generate a plant-level curve for use in simulations.

This method enables the derivation of a specific power curve for each WPP. Comparisons with turbine-level data from the WindPowerNet database show good agreement, although the generic model excludes the storm shutdown regime (Figure 2.15), which is handled separately.

Figure 2.16 illustrates the surrogate modelling process and its supported parameter space, which covers current European installations and allows for a wide range of future configurations. Air density is fixed at 1.225 kg/m³. Turbulence intensity is set at 10% for onshore and 5% for offshore simulations.

The wind power conversion model code is available at: https://gitlab.windenergy.dtu.dk/corres/corres

Figure 2.15: Comparison of the generic turbine-level power curves (surrogate) to two example power curves from the WindPowerNet (https://www.thewindpower.net) power curve database (actual). Note that the generic power curve model does not include the storm shutdown part (wind speed > 25 m/s), as this is handled separately in the simulation through a dedicated storm shutdown model.

Figure 2.16: Overview of the methodology for estimating a plant-level power curve for each WPP, and finally simulating the power generation time series (here for the historical period). Figure is taken from Simutis et al., 2024.

2.9.1.2.2. Future Installations

For future onshore wind installations, turbines with specific powers ranging from 198 to 335 W/m² are used, as indicated in Swisher et al. (2022). For offshore wind, turbines with specific powers of 316 and 370 W/m² are simulated. An overview of the simulated future wind technologies is given in Table 2.4 and Table 2.5, which also list the corresponding options found in the widget "Technological specification" in the download form. Each wind technology option is labelled with a number representing a specific combination of hub height (HH) and specific power (SP). For example, "21 (SP316 HH155)" refers to offshore wind power with a specific power of 316 W/m² and a hub height of 155 m. These labels allow users to easily select the desired wind turbine specification from the dataset.

Plant-level power curves for these future installations are estimated following the same methodology described for existing installations. Specific power and hub height are the main parameters influencing the resulting generation time series, while rotor diameter and rated power have a limited effect since results are expressed as standardised capacity factors (values between 0 and 1).

The power curve model, as presented in the previous section, is made available in the GitLab repository mentioned above. This allows users to generate plant-level power curves for any combination of specific power, hub height, and plant size, provided they fall within the supported range shown in Figure 2.16.

Table 2.4: Future technology of onshore wind turbines.

Specific Power [W/m²]	Rotor Diameter [m]	Hub Height [m]	Rated Power [MW]	Correspondent codes in the download form on CDS
199	152	100, 150, 200	5	31 (SP199 HH100) 32 (SP199 HH150) 33 (SP199 HH200)
277	129	100, 150, 200	5	34 (SP277 HH100) 35 (SP277 HH150) 36 (SP277 HH200)
335	117	100, 150, 200	5	37 (SP335 HH100) 38 (SP335 HH150) 39 (SP335 HH200)

Table 2.5: Future technology of offshore wind turbines.

Specific Power [W/m²]	Rotor Diameter [m]	Hub Height [m]	Rated Power [MW]	Correspondent codes in the download form on CDS
316	269	155	18	21 (SP316 HH155)
370	249	155	18	22 (SP370 HH155)

2.9.1.3. Storm Shutdown

Storm shutdown behaviour is modelled as described in Murcia et al. (2021), applying a direct (non-controlled) shutdown for all existing wind power plants (WPPs), using data from the WindPowerNet WPP installation database for the shutdown wind speeds. For future wind technologies, a 25 m/s cut-off is assumed for onshore wind installations, and the HWS (High Wind Speed) Deep type from Murcia et al. (2021) is used for future offshore wind installations (as in the PECD 2021 update). The shutdown procedure is modelled with hysteresis: after a shutdown, a restart occurs only once the wind speed has dropped to a sufficiently low value (see Figure 2.17). The storm shutdown is a dynamic model that captures three aspects:

Individual wind turbine shutdown and restart as each turbine experiences wind speed fluctuations that can exceed 25 m/s (10-minute mean cut-off wind speed), depending on the duration of exceeding the limits, as illustrated in Figure 2.17.
Plant shutdown does not occur in the same manner as individual turbines; not all turbines in a plant shut down simultaneously, as each turbine experiences slightly different wind speeds at a given time.
The restart operation happens only at a somewhat lower wind speed than shutdown to prevent cycling between shutdown and restart when the wind speed hovers around the shutdown wind speed (e.g., 25 m/s). More details are provided in Murcia et al. (2021).
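The hysteresis idea for a single turbine can be illustrated with a small state machine. This is a simplified sketch, not the plant-level model of Murcia et al. (2021): the function name `storm_shutdown_states` is hypothetical, and the 22 m/s restart threshold is an illustrative value chosen for the example rather than a value taken from the PECD configuration.

```python
def storm_shutdown_states(wind_speeds, shutdown_ws=25.0, restart_ws=22.0):
    """Return a list of booleans (True = turbine running) for a wind speed series.

    The turbine stops when the wind speed reaches `shutdown_ws` and restarts
    only after it has dropped to `restart_ws` or below.
    """
    running = True
    states = []
    for ws in wind_speeds:
        if running and ws >= shutdown_ws:
            running = False
        elif not running and ws <= restart_ws:
            running = True
        states.append(running)
    return states


# Wind speed rises above 25 m/s, then slowly decreases
print(storm_shutdown_states([20.0, 26.0, 24.0, 23.0, 21.0, 20.0]))
# [True, False, False, False, True, True]
```

The turbine stays off at 24 m/s and 23 m/s even though these values are below the shutdown limit, which prevents repeated cycling when the wind speed hovers around 25 m/s.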

Figure 2.17: Single-turbine storm shutdown for two storm shutdown technologies. The different shutdown limits (up to 1 s) have been considered in detailed simulations, but a simplified plant-level behaviour (Murcia et al., 2021) is used for the simulations in this service. Figure taken from Murcia et al. (2021).

https://www.thewindpower.net

2.9.1.4. Simulated locations and wind technologies

The simulated locations and wind technologies depend on the type of run. An overview of the runs is given in Table 2.6.

Table 2.6: Wind run types.

Run type	ERA5 simulated years	Climate projection simulated years	WPP locations	WPP technology	Losses
Validation (for validation only, not delivered)	2015 - 2022	Not simulated	Changed every year to match changing WPP installations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (changed every year), applied in the generic power curve model	Wakes as part of the generic power curve. Other losses (incl. unavailability) are applied as a simple multiplication for onshore, but as a stochastic process for offshore wind.
Existing	1950 - near present	2015 - 2100	All years with 2020 WPP locations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (always 2020 fleet), applied in the generic power curve model	Wakes as part of the generic power curve. Other losses (incl. unavailability) are applied as a simple multiplication for onshore, but as a stochastic process for offshore wind.
Future wind technologies	1950 - near present	2015 - 2100	The best 10-50 % locations (ReGrB) of the unmasked points within each PECD region (in terms of mean wind speed in the bias-adjusted ERA5 data, based on ERA5 grid). A separate run considering only the best 10 % locations (ReGrA) is also provided.	Onshore wind: 3 hub heights and 3 turbine types, so in total 9 wind technologies. A plant of 50 MW with ten 5 MW turbines modelled for each technology. Offshore wind: 1 hub height and 2 turbine types, so in total 2 wind technologies. A plant of 500 MW with twenty-eight 18 MW turbines modelled for each technology.	Wakes as part of the generic power curve. Other losses (incl. unavailability) are applied as a simple multiplication for onshore, but as a stochastic process for offshore wind.

Some notes on Table 2.6:

All wake modelling considers only intra-farm wake effects (no interaction between separate wind plants).
Literature suggests a range of 5 % to 10 % for the other losses (Mortensen, 2018). The existing installations cover historical installations over tens of years with older technology, whereas the future installations are new installations (no wear-and-tear considered) with modern technology; it was thus considered fair to place them at the opposite sides of the loss range (5 % for new technologies and on average 10 % for existing installations).
A mask is used to find the potential points for the Future wind technologies runs. This mask ("wind power regions mask") is available for download in the CDS. Please refer to Table 4.2 and Table 4.3 for more info.
Locations of existing wind power plants are not considered in the assessment of the 10-50 % best locations for each region. This is done because the decommissioning of old turbines is expected to free up more space for new installations in the future.
The assumed locations of wind power plant installations within a region significantly impact the expected capacity factor on the aggregate level (Swisher et al., 2022). PECDv4.1 accounted for a single ‘resource grade’ (ReGr), which considers the 10–50% best locations; therefore, no selection option appeared on the CDS download page. In contrast, PECDv4.2 offers the possibility to select between two separate simulations covering the 10% best locations (ReGrA) and the 10–50% best locations (ReGrB). In the future, simulations accounting also for the 50% worst locations—or, in principle, any other distribution split between 0 and 100%—could be provided in later versions of PECD in consultation with ENTSO-E. However, this would multiply the amount of future wind technology time series.

In addition to the plant-level power curves, information on the existing wind power installations is required to simulate generation from the existing fleet. Data from WindPowerNet are used, with the missing technical parameters (turbine type and hub height) estimated based on the machine learning approach from Koivisto et al. (2021). Wind power plants without location or installed capacity information are removed. An overview of the installed capacities (2020 fleet) and key WPP technical parameters for onshore and offshore installations is shown in Figures 2.18 to 2.21.

Figure 2.18: Onshore wind installation capacities in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.19: Onshore regional weighted mean hub heights (left) and specific powers (right) in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.20: Offshore wind installation capacities in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.21: Offshore regional weighted mean hub heights (left) and specific powers (right) in the different PECD regions in the Existing run (i.e., 2020 installations).

For future wind installations, the starting point is the ERA5 grid points. Based on the exclusion layers presented in Section 2.8 (specifically, the "wind power regions mask"; please refer to Table 4.2 and Table 4.3 for more info), masking is then applied to these points to select potential future WPP locations. The potential points are shown in Figure 2.22. After selecting the 10-50% best points (referred to as ReGrB, based on 100 m mean wind speeds), the resulting final future installation simulation points can be seen for onshore wind in Figure 2.23 and for offshore wind in Figure 2.24. The selection of 10-50% best points is the average ‘resource grade’ selection following from the work done in Swisher et al. (2022), whereas the best 10 % of locations (ReGrA) represents the best wind sites.
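The selection of resource grades within a region can be sketched as a ranking of the unmasked points by mean wind speed. The function name `select_resource_grades` and the rounding used at the 10 % and 50 % boundaries are assumptions for this example; very small regions, where a point is added manually, are not handled.

```python
def select_resource_grades(mean_speeds):
    """Split candidate points of one region into ReGrA and ReGrB.

    mean_speeds : mean 100 m wind speed of each unmasked point (m/s)
    Returns two lists of point indices:
      ReGrA = best 10 % of points, ReGrB = points ranked between 10 % and 50 %.
    """
    ranked = sorted(range(len(mean_speeds)), key=lambda i: mean_speeds[i], reverse=True)
    n_points = len(ranked)
    n_top10 = max(1, round(0.10 * n_points))
    n_top50 = max(n_top10, round(0.50 * n_points))
    return ranked[:n_top10], ranked[n_top10:n_top50]


# Ten candidate points with mean wind speeds from 1 to 10 m/s
speeds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
regra, regrb = select_resource_grades(speeds)
print(regra)  # [9]
print(regrb)  # [8, 7, 6, 5]
```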

Figure 2.22: All potential future installation locations (light blue dots) are shown for onshore and offshore regions. Some points were removed due to the applied mask with exclusion layers (the mask is binary: a point either is considered or not). If a small region has no points (e.g., Malta), 1 simulation point is added manually.

Figure 2.23: Onshore locations for the future technology runs for the average resource grade (10-50 % best locations, ReGrB) (left) and the best 10 % of locations (ReGrA) (right). Colouring shows the mean wind speed (m/s) of each location. The locations of existing wind power plants and the repowering of existing plants are not considered.

Figure 2.24: Offshore locations for the future technology runs for the average resource grade (10-50 % best locations, ReGrB) (left) and the best 10 % of locations (ReGrA) (right). Colouring shows the mean wind speed (m/s) of each location. The locations of existing wind power plants and the repowering of existing plants are not considered.

2.9.1.5. Aggregation to the regional level

After simulating the hourly generation for each WPP, the results are aggregated at the regional level. For existing installations, regional aggregation is weighted by the installed capacity of each WPP. For future technologies, the same weight is used for each location. From a processing point of view, temporary NetCDF files are used, but the final regional results are saved as CSV files. In addition to power generation, similar weighted regional wind speed averages are saved.
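The regional aggregation can be sketched as a weighted mean over plants at each hour. The function name `regional_capacity_factor` is hypothetical; installed capacities are passed as weights for existing installations, and equal weights are used when none are given, as for future technologies.

```python
def regional_capacity_factor(plant_cf, weights=None):
    """Aggregate per-plant hourly capacity factors to one regional series.

    plant_cf : list of hourly series, one per plant, all of the same length
    weights  : installed capacity of each plant (MW), or None for equal weights
    """
    if weights is None:
        weights = [1.0] * len(plant_cf)
    total_weight = sum(weights)
    n_hours = len(plant_cf[0])
    return [
        sum(w * cf[t] for w, cf in zip(weights, plant_cf)) / total_weight
        for t in range(n_hours)
    ]


plants = [[0.2, 0.4], [0.6, 0.8]]
print(regional_capacity_factor(plants, weights=[30.0, 10.0]))  # existing fleet, capacity-weighted
print(regional_capacity_factor(plants))                        # future technologies, equal weights
```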

2.9.1.6. Input data modifications based on measured data and TSO feedback

There is always uncertainty related to the technical wind power plant input data. While the data from www.thewindpower.net are extensive, significant shares of key parameters (most importantly hub height and turbine specific power) are missing for most countries, so it was considered reasonable to modify the inputs to some extent. For example, hub height data were missing for around 60 % of the wind power plants in Portugal, 80 % in Spain and 70 % in Italy; for specific power, the respective missing shares were 10 %, 20 %, and 20 %. The modifications are listed below. They were presented to and agreed with ENTSO-E and the respective TSOs, and they lead to a better fit to measured data. They are applied only to the Validation and Existing runs.

Adjustments for German onshore wind (existing installations):

The specific power (SP) of turbines was increased by 10 % relative to the reported values for the existing German fleet (based on Section 2.9.1.4 methodology)

Adjustments for Spanish onshore wind (existing installations):

The SP of turbines was reduced by 6 % relative to the reported values for the existing Spanish fleet (based on Section 2.9.1.4 methodology)
The hub heights (HH) were increased by 13 % relative to the reported values for the existing Spanish fleet (based on Section 2.9.1.4 methodology)

Adjustments for French onshore wind (existing installations):

The SP of turbines was increased by 2 % relative to the reported values for the existing French fleet (based on Section 2.9.1.4 methodology)
The HH were decreased by 5 % relative to the reported values for the existing French fleet (based on Section 2.9.1.4 methodology)

Adjustments for Italian onshore wind (existing installations):

The SP of turbines was reduced by 4 % relative to the reported values for the existing Italian fleet (based on Section 2.9.1.4 methodology)
The HH were increased by 9 % relative to the reported values for the existing Italian fleet (based on Section 2.9.1.4 methodology)

Adjustments for Portuguese onshore wind (existing installations):

The SP of turbines was reduced by 15 % relative to the reported values for the existing Portuguese fleet (based on Section 2.9.1.4 methodology)
The HH were increased by 30 % relative to the reported values for the existing Portuguese fleet (based on Section 2.9.1.4 methodology)
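The country-specific adjustments listed above amount to multiplicative factors on specific power and hub height. The sketch below collects them in a lookup table; the dictionary layout and the function name `adjust_plant` are assumptions for this example, while the percentages are those stated in the list.

```python
# Multiplicative factors for existing onshore installations (1.0 = unchanged)
ONSHORE_ADJUSTMENTS = {
    "Germany": {"sp": 1.10, "hh": 1.00},
    "Spain": {"sp": 0.94, "hh": 1.13},
    "France": {"sp": 1.02, "hh": 0.95},
    "Italy": {"sp": 0.96, "hh": 1.09},
    "Portugal": {"sp": 0.85, "hh": 1.30},
}


def adjust_plant(country, specific_power, hub_height):
    """Return the adjusted (specific power in W/m², hub height in m) of a plant."""
    factors = ONSHORE_ADJUSTMENTS.get(country, {"sp": 1.0, "hh": 1.0})
    return specific_power * factors["sp"], hub_height * factors["hh"]


# A Portuguese plant reported with 300 W/m² and an 80 m hub height
print(adjust_plant("Portugal", 300.0, 80.0))
```

Countries that are not listed keep their reported values.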

Losses other than wake losses ("other losses") are assumed to be 10 % on average for the existing installations but, given the uncertainty, it was considered reasonable to vary them between 5 % and 15 % across countries. The modified other losses for onshore wind runs are: 15 % for Germany, Finland, France, Greece, Italy and Norway; and 5 % for Denmark, Hungary, Portugal and Sweden. For offshore wind they are: 7 % for Belgium and 5 % for Denmark. For all other countries, the other losses are 10 %. These loss assumptions were presented to and agreed with ENTSO-E and the respective TSOs, and they lead to a better fit to measured data. They are applied only to the Validation and Existing runs.

Data on wind installation density (MW/km²) were not found for the different European countries. Installation density is expected to vary based on land availability (Murcia et al., 2022), with countries with a lot of existing wind installations and relatively high population density (e.g., Germany) showing higher density of installations due to land availability constraints. The density is assumed to vary from 15 MW/km² in Germany (by far the most existing wind installations per km² in Europe), to 10 MW/km² in countries with significant wind installations and generally high population density (Belgium, France, Ireland, Luxembourg, Netherlands and the UK) and to 4 or 7 MW/km² for the other countries. The 4 MW/km² installation density is assumed for countries with more limited wind installations or lower population density and thus significant available land (e.g., Bulgaria, Estonia and Finland), with 7 MW/km² assumed for the rest (e.g., Austria, Denmark and Italy). These installation densities are assumed for both existing and future installations.

2.9.2. Photovoltaic Solar Power Conversion Model

To estimate photovoltaic (PV) capacity factors at the regional level, a flexible yet robust modelling workflow has been implemented (Saint-Drenan et al., 2018). The method starts by modelling PV output at the individual system level, taking into account the specific location and the orientation and tilt of the module’s plane-of-array (POA). This system-level output is then scaled up to regional values by averaging over a representative set of possible POA configurations, weighted according to metadata from real-world PV installations.

2.9.2.1. Temporal downscaling

The PV modelling workflow applies a two-step temporal downscaling process to solar global horizontal irradiance (GHI) and 2-metre temperature (TA), ensuring compatibility with the high temporal resolution required for accurate PV capacity factor calculations.

For climate projections, the original resolution of GHI and TA is 3-hourly, which is coarser than required. To improve this, GHI is first converted into a clearness index (Kt) by dividing it by the corresponding top-of-atmosphere (TOA) irradiance, which removes the influence of solar geometry. The resulting Kt series, now independent of sun position, is interpolated linearly to hourly resolution and then converted back to GHI by multiplying by the hourly TOA values. Air temperature is interpolated directly to hourly resolution using a cubic spline.
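The clearness-index step can be illustrated with a minimal sketch. It is not the PECD implementation: the function name, the assumption that the hourly series starts at the first 3-hourly timestamp, and the choice of setting Kt to zero at night (TOA = 0) are all illustrative simplifications.

```python
def upsample_ghi_via_kt(ghi_3h, toa_3h, toa_1h):
    """Interpolate 3-hourly GHI to hourly values through the clearness index Kt.

    Hypothetical helper: all series are plain lists in W/m2, and toa_1h[0]
    is assumed to share its timestamp with ghi_3h[0].
    """
    # Kt = GHI / TOA; set to 0 at night to avoid dividing by zero (assumption)
    kt_3h = [g / t if t > 0 else 0.0 for g, t in zip(ghi_3h, toa_3h)]

    ghi_1h = []
    for hour, toa in enumerate(toa_1h):
        i, step = divmod(hour, 3)  # index of the 3-hourly sample at or before this hour
        if i >= len(kt_3h) - 1:
            kt = kt_3h[-1]  # hold the last value at the end of the series
        else:
            # linear interpolation of Kt between the two bracketing samples
            kt = kt_3h[i] + (kt_3h[i + 1] - kt_3h[i]) * step / 3
        # convert back to GHI with the hourly TOA irradiance
        ghi_1h.append(kt * toa)
    return ghi_1h
```

Interpolating Kt rather than GHI keeps the diurnal shape imposed by the hourly TOA values, which a direct linear interpolation of GHI would flatten.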

A second downscaling step is applied to both historical and projected datasets: the hourly GHI and TA values are further interpolated to a 15-minute resolution. PV capacity factors are calculated at this finer time step and then averaged back to hourly values. This process allows for better representation of diurnal solar variability and reduces artefacts that might result from simpler interpolation methods.

2.9.2.2. Inferring plane-of-array irradiance: decomposition and transposition

To estimate the irradiance on a photovoltaic module’s POA, it is necessary to convert GHI, which is measured on a horizontal surface, into irradiance on a tilted surface. This requires separating GHI into its two components—direct and diffuse solar radiation—using a decomposition model.

The diffuse horizontal irradiance (DHI) is first estimated using the Skartveit-Olseth model (Skartveit et al., 1987), which estimates the diffuse fraction (i.e., the ratio between the diffuse and global radiation). This model accounts for three distinct sky conditions—overcast, partially cloudy, and clear—and uses the clearness index (Kt) and solar elevation angle to characterise atmospheric transmissivity and air mass, respectively.

Once the diffuse and direct components are determined, each is transposed to the tilted surface separately. The ground-reflected component is also included. The total global tilted irradiance (GTI) is computed as:

\[ GTI = R_b \times BHI + R_d \times DHI + R_r \times GHI \]

where BHI and DHI are the beam and diffuse horizontal irradiances, and R_b, R_d, and R_r are the transposition functions for beam, diffuse, and reflected irradiance, respectively.

The direct (beam) irradiance is transposed using a simple trigonometric relation:

\[ R_b = \frac{\cos(AOI)}{\cos(SZA)} \]

with AOI corresponding to the angle of incidence and SZA to the solar zenith angle. Both were calculated using the SG2 algorithm (Blanc et al., 2012).

Diffuse irradiance is transposed using the Klucher model (Klucher, 1979), specifically its implementation in the pvlib Python package. This anisotropic approach considers both the sky brightening near the sun (i.e., circumsolar irradiance) as well as the increase in irradiance near the horizon by hybridizing the Liu-Jordan (Liu & Jordan, 1961) and Temps-Coulson (Temps & Coulson, 1977) models, since the first is more suitable for overcast conditions and the second for partially cloudy and clear-sky:

\[ R_d = \Bigl(\frac{1+\cos\beta}{2}\Bigr) \times \bigl(1+f_k \times \cos^2(AOI) \times \sin^3(SZA)\bigr) \times \Bigl(1+f_k \times \sin^3\frac{\beta}{2}\Bigr) \]

where β is the surface tilt angle and the anisotropy index f_k corresponds to:

\[ f_k = 1- \Bigl(\frac{DHI}{GHI}\Bigr)^2 \]

Lastly, reflected irradiance is assumed to be isotropic, and its transposition function R_r is modelled with a constant ground albedo ρ of 0.2, which is a commonly used value in the literature (Gueymard et al., 2019):

\[ R_r = \rho \times \Bigl(\frac{1-\cos\beta}{2}\Bigr) \]

This decomposition and transposition process ensures a more realistic estimation of solar radiation on tilted surfaces, which is essential for modelling PV system performance.
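As an illustration of how the three transposition terms combine, the sketch below evaluates GTI for a single time step. It is a simplified stand-in, not the PECD code: the function name and arguments are hypothetical, the diffuse term uses the Klucher (1979) form as implemented in pvlib (with sin³ of the zenith angle and f_k = 1 − (DHI/GHI)²), the reflected term uses the isotropic view factor (1 − cos β)/2, and BHI is taken as GHI − DHI.

```python
import math

def global_tilted_irradiance(ghi, dhi, tilt_deg, aoi_deg, sza_deg, albedo=0.2):
    """GTI = R_b*BHI + R_d*DHI + R_r*GHI for one time step (all inputs in W/m2 / degrees)."""
    beta = math.radians(tilt_deg)
    sza = math.radians(sza_deg)
    cos_aoi = max(math.cos(math.radians(aoi_deg)), 0.0)  # no beam from behind the module
    cos_sza = math.cos(sza)

    bhi = ghi - dhi  # beam horizontal irradiance

    # Beam: simple trigonometric ratio; set to zero when the sun is at or below the horizon
    r_b = cos_aoi / cos_sza if cos_sza > 0 else 0.0

    # Diffuse: Klucher anisotropy index and transposition function
    f_k = 1.0 - (dhi / ghi) ** 2 if ghi > 0 else 0.0
    r_d = (0.5 * (1 + math.cos(beta))
           * (1 + f_k * cos_aoi ** 2 * math.sin(sza) ** 3)
           * (1 + f_k * math.sin(beta / 2) ** 3))

    # Ground-reflected: isotropic, constant albedo
    r_r = albedo * 0.5 * (1 - math.cos(beta))

    return r_b * bhi + r_d * dhi + r_r * ghi
```

Under fully overcast conditions (DHI = GHI) the anisotropy index vanishes and the diffuse term reduces to the isotropic Liu-Jordan factor, which is the limiting behaviour described above.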

2.9.2.3. PV modelling: optical losses, conversion efficiency, temperature and inverter losses

Before modelling the PV energy conversion process, optical losses due to reflection on the module’s surface must be accounted for. These are modelled using the Martin-Ruiz model (Martin & Ruiz, 2001, 2013), which relates surface reflectance to the angle of incidence and the properties of the PV module’s glazing.

The following expressions represent the reflection losses for each irradiance component:

\[ F_{\text{beam}} = \frac{\exp\left(-\cos \alpha / a_r\right) - \exp\left(-1 / a_r\right)}{1 - \exp\left(-1 / a_r\right)} \]

\[ F_{\text{dif}} = \exp\left(-\frac{1}{a_r} \left( c_1 \left( \sin \beta + \frac{\pi - \beta - \sin \beta}{1 + \cos \beta} \right) + c_2 \left( \sin \beta + \frac{\pi - \beta - \sin \beta}{1 + \cos \beta} \right)^2 \right)\right) \]

\[ F_{\text{ref}} = \exp\left(-\frac{1}{a_r} \left( c_1 \left( \sin \beta + \frac{\beta - \sin \beta}{1 - \cos \beta} \right) + c_2 \left( \sin \beta + \frac{\beta - \sin \beta}{1 - \cos \beta} \right)^2 \right)\right) \]

where \( a_r = 0.17 \) is the angular loss coefficient, \( c_1 = 4/(3\pi) \) and \( c_2 = -0.069 \) are empirical parameters for monocrystalline silicon modules, \( \alpha \) is the angle of incidence and \( \beta \) is the tilt angle (in radians).

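These reflection factors are applied to the beam, diffuse and reflected tilted irradiances (BTI = R_b × BHI, DTI = R_d × DHI and RTI = R_r × GHI, i.e., the three terms of GTI) to compute the effective global tilted irradiance GTI_eff: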

\[ GTI_{\text{eff}} = BTI \times (1 - F_{\text{beam}}) + DTI \times (1 - F_{\text{dif}}) + RTI \times (1 - F_{\text{ref}}) \]

Knowing the effective incoming irradiance, PV module efficiency is calculated as proposed by Beyer et al. (2004). Efficiency is first inferred for a 25 °C module temperature, using an expanded version of the model proposed by Williams et al. (2003), by fitting the following parametric equation against actual PV data:

\[ CF_{\text{PV}, 25^\circ\text{C}} = a_1 \times GTI_{\text{eff}} + a_2 \times GTI_{\text{eff}}^2 + a_3 \times GTI_{\text{eff}}^3 + a_4 \times GTI_{\text{eff}}^4 + a_5 \times GTI_{\text{eff}} \times \log(GTI_{\text{eff}}) \]

The coefficients a_i were derived from real PV generation data in Germany and France:

a₁ = 1.4306
a₂ = -1.0084
a₃ = 1.0121
a₄ = -0.4401
a₅ = 0.1979

This formulation captures the reduced efficiency of PV modules under low irradiance conditions. Note: from this point onward, GTI_eff is expressed in kW.m^-2.
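The parametric equation can be evaluated directly with the coefficients listed above. The sketch below is illustrative only: the function name is hypothetical, the input is taken to be the effective irradiance in kW.m^-2 as stated in the note, and `log` is assumed to be the natural logarithm (the choice that yields a lower CF-to-irradiance ratio at low irradiance, in line with the behaviour described).

```python
import math

# Coefficients a1..a5 as listed in the text
A1, A2, A3, A4, A5 = 1.4306, -1.0084, 1.0121, -0.4401, 0.1979

def cf_pv_25c(gti_eff_kw):
    """Capacity factor at 25 degC module temperature for an effective irradiance in kW/m2."""
    g = gti_eff_kw
    if g <= 0:
        return 0.0  # no generation without irradiance; also avoids log(0)
    return (A1 * g + A2 * g ** 2 + A3 * g ** 3 + A4 * g ** 4
            + A5 * g * math.log(g))  # natural logarithm (assumption)
```

For example, `cf_pv_25c(1.0)` is simply the sum a1 + a2 + a3 + a4, since the logarithmic term vanishes at 1 kW.m^-2.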

Next, module temperature T_PV is estimated using the approach proposed by Ross (1976), which models it as a function of ambient air temperature T_a and effective irradiance, scaled by the Ross coefficient k:

\[ T_{PV} = T_a + k \times GTI_{\text{eff}} \]

Thermal losses are then applied using a linear temperature coefficient of −0.45%/°C:

\[ CF_{PV}^* = CF_{PV,25^\circ\text{C}} \times \left( 1 - 0.0045 \times (T_{PV} - 25) \right) \]
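A minimal sketch of the two thermal steps follows. The function names are hypothetical, and the Ross coefficient is left as an argument because its value depends on the typology; the value used in the usage note is purely illustrative and is not a PECD parameter.

```python
def module_temperature(t_air_c, gti_eff_kw, ross_k):
    """Ross model: T_PV = T_a + k * GTI_eff (k in degC per kW/m2, supplied by the caller)."""
    return t_air_c + ross_k * gti_eff_kw

def apply_thermal_losses(cf_25c, t_pv_c, temp_coeff=-0.0045):
    """Linear derating of the 25 degC capacity factor with a -0.45 %/degC coefficient."""
    return cf_25c * (1 + temp_coeff * (t_pv_c - 25.0))
```

With an illustrative k of 30 °C per kW.m^-2, an ambient temperature of 30 °C and 1 kW.m^-2 of effective irradiance give a module temperature of 60 °C, and therefore a 15.75% thermal loss.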

Finally, inverter losses are modelled using the approach from Schmidt et al. (1994), which accounts for inverter self-consumption and electrical losses (voltage and ohmic). Note that, even though inverter losses are considered, the capacity factor time series remains normalised against the module’s peak (DC) capacity. The inverter model, with coefficients b_1, b_2 and b_3, reads:

\[ CF_{PV} = CF_{PV}^* \times \left( b_1 + b_2 \times CF_{PV}^* + b_3 \times {CF_{PV}^*}^2 \right) \]

This model does not include the effects of module degradation, inverter clipping, or variability in module characteristics (e.g., efficiency, temperature coefficient). However, these simplifications are acceptable given the lack of detailed data on the PV systems installed across the PECD region. Moreover, uncertainties introduced at the plant level are significantly mitigated by the spatial averaging applied during regional aggregation.

2.9.2.4. Upscaling to Regional PV Aggregation

2.9.2.4.1. Modelling PV Typologies

Photovoltaics is a flexible technology that can be installed in many different contexts, which then define the characteristics of a PV installation. Compared to PECDv4.1, which did not consider different PV technologies, PECDv4.2 comprises four different implementations, here designated as typologies. Sharing the same physical modelling framework, each typology is characterised by specific module tilt, azimuth, and thermal losses.

2.9.2.4.2. Residential rooftops

By exploring various PV databases, it was possible to infer a first correlation between the latitude of a given location and the mean tilt of installations smaller than 9 kW (here assumed as a proxy for residential PV), which was parametrised as a set of linear functions (Figure 2.25). As it was not possible to collect data for latitudes below 42° and above 54°, the tilt is assumed to remain constant outside this range, at the values reached at its lower and upper bounds.

Figure 2.25: Module tilt angle for PV installations below 9 kW as a function of latitude.
The red line shows the median tilt angle derived from the dataset, while the black line indicates the linear approximations adopted in PECDv4.2.

Additionally, the modules are assumed to be South-oriented. While a future version of the PECD may account for a more detailed azimuth description, the data shows that it tends to be distributed around the South in a near-symmetrical manner with a moderate spread. This indicates that a small impact should be expected from this simplification.

Because rooftop PV modules are commonly mounted directly on the roof and share its tilt, which minimises their aesthetic impact and infrastructure costs, they lack convective cooling on their back side. This is parametrised as a Ross coefficient of 0.34 when calculating PV module temperature (Skoplaki et al., 2009), which for a sunny, hot day (ambient temperature of 30 °C and an incident solar radiation of 1000 W.m^-2) leads to 1.8% higher thermal losses than in PECDv4.1.

2.9.2.4.3. Industrial Rooftops

Unlike residential rooftops, industrial rooftops are often flat, providing greater flexibility in engineering design. However, their typically high electricity demand favours installations with very low tilt angles, since it reduces the shading between rows of modules and enables a higher energy density (kWh/m²) at the expense of a lower capacity factor (kWh/kWp).

An exploratory analysis of collected empirical data, along with discussions with PV companies, suggests that these installations are typically mounted with a 10° tilt and exhibit a more varied azimuth than residential rooftops. Consequently, this typology assumes that 50% of installations are oriented southward, while 25% face east and 25% west. These installations are assumed to have lower convective cooling and, thus, the same thermal modelling as for residential rooftop installations.

2.9.2.4.4. Utility-scale Fixed

Module tilt and orientation data for utility-scale PV were inferred from the metadata of hundreds of installations located in France and Germany, for which we can access quality data. Although the PECD spatial coverage is considerably larger, this kind of information is deemed highly sensitive by the industry, making it uncommon to find as open data.

To circumvent the limited geographical coverage of the collected data, the tilt was normalised by the local theoretical optimum tilt (which maximises the incident irradiation), so that the resulting distribution is more generalizable in space. To calculate the optimum tilt for the PECD region, the PV generation is estimated for different tilts, always south-oriented, using ERA5 data from 2015 to 2020; the tilt resulting in the highest overall irradiation is selected (Figure 2.26).

Figure 2.26: Optimal tilt angle, which maximises annual yield, over the PECD domain calculated considering a 5-year period of ERA5 data.

Based on Figure 2.27, which compares the optimal tilt estimated from ERA5 data with the tilt from actual installations, the utility-scale fixed PV in PECDv4.2 assumes, in each grid cell, a tilt equal to 75% of its theoretical optimum. This discrepancy is most likely due to a common engineering practice: by reducing the shadowing between rows of modules, a higher energy density – i.e., a higher kWh generated per unit of area – and lower land requirements can be achieved. The tilt ratio and orientation are visualised as a 2D histogram, which shows that both parameters can be well described by two normal distributions (Figure 2.28).

Figure 2.27: Relationship between module effective and optimal tilt angles. It is possible to see that, for the considered sample of hundreds of installations in Germany and France, a considerable amount of data shows a lower tilt than what would be deemed optimal from an annual yield perspective.

Figure 2.28: Empirical distribution of the PV installations' tilt and azimuth angles, with the first being relative to the local optimal tilt angle, as well as inferred normal distributions.

2.9.2.4.5. Utility-scale Tracking

Among the various tracking configurations, PECDv4.2 includes only horizontal single-axis tracking (HSAT), as it is currently the most widely adopted. In this setup, PV modules are mounted horizontally on a single axis aligned North-South and rotate from East to West as the day progresses. This configuration increases capacity factors, particularly in summer and when the Sun is higher in the sky. The tracking modelling, implemented using the pvlib Python library, also accounts for back-tracking, a widely used strategy that adjusts module positioning when the Sun is low to minimise shading between rows, even if it deviates from the optimal angle for energy capture. Back-tracking is computed assuming a ground coverage ratio (GCR) of 0.35, i.e., the ratio between the PV array area and the corresponding land use or, equivalently, the inverse of the axis spacing expressed in module widths.

While the movement of the modules may impact their thermal performance throughout the day, for PECDv4.2 this typology was assumed to share the same assumptions as the Utility-scale fixed case.

2.9.2.4.6. Application of Exclusion Areas and Spatial Aggregation

Once the PV capacity factor product is generated for the PECD-constrained ERA5 grid, regional estimates for bidding and study zones are calculated by means of a spatial average. However, it is important to note that particular (restricted) areas were masked in both the grid-like and regional-based products to produce more accurate results. Specifically, sea and ocean areas (thus, offshore PV), polar and protected areas, as well as locations with high elevation (above 2000 m a.s.l.) or slope (higher than 10%) were excluded from the computation (please refer to Table 4.2 and Table 4.3 for more details). While high elevation may be unsuitable as an exclusion criterion at a global scale (notably for Chile), we found that for the PECD area, this does not pose issues in terms of final PV estimates. Figure 2.29 shows the composite exclusion mask considered for the computation of the solar photovoltaic technologies. The information to identify such regions was obtained from a range of sources: the ERA5 Land-sea mask, the Copernicus Land cover classification gridded maps, the World Database on Protected Areas (WDPA) and Other Effective Area-based Conservation Measures (OECM), and the ETOPO1 bathymetric and topographic digital elevation model.
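The masking and averaging logic can be sketched as follows. This is a simplified illustration rather than the PECD workflow: the dictionary keys are hypothetical, each grid cell is assumed to carry pre-computed flags, and an unweighted mean is used (whether the operational average is area-weighted is not stated here).

```python
import math

def is_excluded(cell, max_elevation_m=2000.0, max_slope_pct=10.0):
    """True if a grid cell falls in any of the exclusion categories listed above."""
    return (cell["is_sea"]
            or cell["is_polar"]
            or cell["is_protected"]
            or cell["elevation_m"] > max_elevation_m
            or cell["slope_pct"] > max_slope_pct)

def regional_capacity_factor(cells):
    """Unweighted mean capacity factor over the non-excluded cells of a region."""
    values = [cell["cf"] for cell in cells if not is_excluded(cell)]
    if not values:
        return math.nan  # region entirely masked
    return sum(values) / len(values)
```

Returning NaN for a fully masked region makes such cases explicit instead of silently producing a zero capacity factor.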

Figure 2.29: Composite exclusion mask considered for the solar photovoltaic technologies.

2.9.2.5. Making use of typology-level data

ENTSO-E’s adequacy studies are based on the integration of the capacity factor data from the PECD with the structural data of the European power system – such as installed capacities by technology – submitted by TSOs to the Pan-European Market Modelling Database (PEMMDB). Thus, to align with the increased granularity of PECD version 4.2, TSOs began, in 2024, reporting installed PV capacity per typology. While the first data collection indicated a generally positive adoption of the new framework, it may still present challenges for TSOs, as it requires more detailed technological roadmaps. At the same time, it offers an opportunity to not only simulate pre-defined energy systems, but also to test and compare alternative technological scenarios. From a broader perspective on the potential end users of this data, complementing these typology-level timeseries with cost assumptions could support more detailed and realistic energy optimization studies.

Model validation, in turn, is also complex. TSOs do not necessarily possess typology-level installed capacity or generation data. Even if PV is considered only as a whole (i.e., using the total PV generation and installed capacity of a given territory as reference), validation requires an assumption about the relative capacity share of the different typologies in order to compute a weighted average of the PECDv4.2 capacity factor timeseries. However, national PV records often lack the detail needed to disaggregate generation or installed capacity by typology. Some TSOs provide data differentiated by the power grid level at which PV plants are connected. In a few cases, both capacity and generation are further disaggregated by installation size (e.g., up to 9 kW, 9–25 kW, etc.). Even more rarely, data are broken down by the actual PV typology (but not necessarily matching the configurations defined in PECDv4.2). In many instances, classification relies exclusively on installation size. For example, the U.S. Solar Photovoltaic System and Energy Storage Cost Benchmark (Q1 2020, NREL) classifies photovoltaic (PV) system typologies as follows:

Residential: residential rooftop systems using monocrystalline silicon modules, with system size typically ranging from 4 kW to 7 kW.
Commercial: commercial rooftop installations with ballasted racking and fixed-tilt ground-mounted systems, also using monocrystalline silicon modules. System size ranges from 100 kW to 2 MW.
Utility-scale: ground-mounted systems using monocrystalline silicon modules, featuring fixed-tilt or one-axis tracking configurations. These systems typically range from 5 MW to 100 MW in capacity.

While this method is widely applicable and straightforward to implement where descriptive data are limited, it is also rigid and prone to misclassification. For instance, small ground-mounted systems may be grouped with industrial rooftop installations, and utility-scale systems with and without tracking may be classified together, despite notable operational differences.

This challenge has been clearly identified with the release of PECDv4.2 and has been under investigation. Future efforts will focus on a more extensive data collection and validation, both at the typology- and aggregated-level.

2.9.2.6. Improvements over Previous Methodology

PECDv4.2 introduces a significant shift in the modelling of regional-level PV timeseries. While previous versions relied on a single model, the current version acknowledges the diversity of PV implementations. It establishes a base modelling framework that incorporates specific parameterisations tailored to different installation types, ensuring a more accurate representation of PV technology variations.

In particular, it accounts for variations in tilt and azimuth angles, which affect both the daily and seasonal generation profile, as well as optical losses from reflection. Additionally, it considers ventilation conditions, which influence module temperature and, consequently, thermal losses.

2.9.3. Concentrated Solar Power Conversion Model

As specified in the work plan, the concentrated solar power (CSP) model developed by DTU and used in the previous version of PECD is also employed in this release. A brief description of the model is provided below.

The CSP model consists of three main components: a solar field, a power block, and a thermal energy storage system. Key parameters include the solar multiple (SM)—defined as the ratio between the solar field capacity and the turbine capacity—the installed capacity, the efficiency of the thermal block, and the size of the thermal storage. Storage capacity is expressed in hours of rated power operation. The heat transfer fluid is modelled as a first-order dynamic system with a characteristic time constant, introducing a delay between changes in DNI (direct normal irradiance) and power output, even in the absence of storage.

The operational strategy of the modelled CSP system follows two main rules:

If the solar field generates more energy than required to operate at rated power, the surplus is stored.
If the solar field generates less, the storage discharges energy to maintain rated power (see Figure 2.30).

This strategy does not require knowledge of market prices. The relationship between the solar multiple and the thermal energy storage size remains consistent with the previous PECD version (see Table 2.7). The model has been recalibrated using updated climate data.
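The two dispatch rules can be illustrated with a minimal storage balance. The sketch is deliberately simplified and is not the DTU model: the function name is hypothetical, power is expressed in units of rated power, storage in hours of rated-power operation, and the heat-transfer-fluid dynamics and all conversion losses are ignored.

```python
def dispatch_csp(field_power, storage_hours, dt_hours=1.0):
    """Apply the store-surplus / discharge-to-rated-power rules to a solar-field series."""
    stored = 0.0  # energy in storage, in hours of rated-power operation
    dispatched = []
    for p in field_power:
        if p >= 1.0:
            # Rule 1: run at rated power and store the surplus (excess beyond capacity is lost)
            stored = min(storage_hours, stored + (p - 1.0) * dt_hours)
            dispatched.append(1.0)
        else:
            # Rule 2: discharge storage to get as close to rated power as possible
            discharge = min(1.0 - p, stored / dt_hours)
            stored -= discharge * dt_hours
            dispatched.append(p + discharge)
    return dispatched
```

With two hours of solar-field output at 1.5 times rated power followed by darkness, a plant with storage keeps producing at rated power for one extra hour, whereas the no-storage configuration drops to zero immediately.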

For the simulations, the top 50% of locations within each PECD region, ranked by average DNI, are selected as candidate CSP installation sites. Two configurations are modelled:

CSP plants without thermal storage
CSP plants with 7 hours of thermal energy storage

In the "Technological specification" section of the download form, each CSP configuration is represented by a code that indicates whether the plant includes energy storage and whether the energy output refers to pre- or post-dispatch values. The options are:

40 (Pre-dispatch, no storage): potential generation before dispatch, without storage
41 (Dispatched, no storage): dispatched energy from a plant without storage
42 (Pre-dispatch, 7 hours of storage): potential generation before dispatch, with 7 hours of storage
43 (Dispatched, 7 hours of storage): dispatched energy from a plant with 7 hours of storage

Figure 2.30: Overview of CSP behaviour when thermal storage is available. The power from the solar field (dashed green line) is higher than the installed capacity (1.0) and is thus stored. The orange line shows dispatch from the CSP plant.

Table 2.7: Solar multiple (SM) as a function of thermal energy storage (TES).

TES (hours)	SM
0	1.5
3	1.75
6	2.0
9	2.5
12	2.9
18	3.0

2.9.4. Hydro Power Conversion Model

For the historical stream, the goal for the Hydropower (HP) model is to reproduce the hydropower energy indicators starting from climate data, reconstructing their time series for the historical period (1979-2022).

The target spatial resolution, originally set at the country level, has been refined to the Bidding Zone level (SZON), providing more detailed data for Italy, Norway, and Sweden. Specifically, the southern Norwegian region (NOS0) has been divided into three separate PECD regions (NOS1, NOS2, NOS3) as requested by ENTSO-E. The target temporal resolution is weekly.

The starting point of the work is the publicly available generation data (in MW) from the ENTSO-E Transparency Platform (TP), with which the model has been trained and validated to produce the results up to December 2021. The data include hydropower generation timeseries (at a resolution of 15 min, 30 min, or 1 hour depending on the country), Installed Capacity time series (annual), and Stored Energy (SE) time series for reservoirs (also referred to as ‘Filling Rates’) and pumped storage (at weekly resolution). Since these data are not sufficient to yield a complete dataset for simulations, two additional sources have been employed: (1) data provided directly by TSOs and (2) inflow data from the previous PECDv3.1 (see Table 2.11 for more details). The three sources were ranked by data reliability in agreement with ENTSO-E: TSO data are considered the most reliable and are given the highest priority. These data include generation and pumping timeseries at hourly resolution and NUT0 or PECD granularity. Some TSOs provided timeseries of stored energy for their countries at weekly resolution for reservoir and open-loop pumped storage technologies, which were used to estimate inflows for such technologies (see countries citing ‘TSO’ as a source under inflow columns HRI and HOL, Table 2.11). Additionally, some countries provided monthly timeseries of Installed Capacity (IC), which were useful to account for significant changes in generation due to new installations throughout the historical time series (this information was used for countries citing ‘rescaled using monthly IC’, Table 2.11).

Where TSO timeseries are not complete, TP data are used, with some exceptions (see section Estimating Inflows). Finally, PECDv3.1 data have been employed where TSO and TP data are not sufficient. In particular, they help to complete the open-loop pumped storage inflow data, since only a few TSOs are able to share stored energy timeseries for this technology.

The climate historical data are taken from the ERA5 Reanalysis model for the validation runs and the reconstruction of historical time series. The data is hence aggregated at NUT0 level and at PEON level, to address the target granularity.

Given the size of the domain and the absence of publicly available data for the individual HP plants (such as plant heads, installed technology and artificial regulation), the idea of employing a hydrological model was discarded. Instead, a statistical approach was adopted for its adaptability to the available inputs and the reduced computational costs, accounting for the climate impact by means of temperature and precipitation input. The application of a hydrological model, without accounting for artificial regulation, would imply the misrepresentation of climatic forcing on HP generation correlated with energy demand.

The following sections describe the statistical model, the pre-processing of input data, the validation procedure, and the use of the model to reconstruct historical data and estimate future projections. Finally, the last section describes the adopted methodology to estimate the inflows starting from the available data.

2.9.4.1. The Statistical Model

The statistical model adopted here is the Random Forest Regression model (Pedregosa et al., 2011; hereafter, the RF model), a machine learning model based on ensemble learning, which has already proved to work well at such a resolution and over such a broad domain in a previous study by Ho et al. (2020). In a preliminary comparison, at the first stages of the project, the model also showed comparable performance over France for both HRE (Reservoirs) and HRO (Run-of-river) technologies with respect to a Neural Network fed by discharge data (a model employed in the current PECD).

The Random Forest takes as input the generation (or inflow) data, namely the target variable, and some climate datasets covering the same time period, the predictors, and trains a large number of decision trees to predict the target variable starting from the predictors. In the end, it averages the answers from all the trees to obtain the model prediction. The number of trees in the ‘forest’, and their characteristics, can be adjusted by tuning several parameters.

2.9.4.2. Energy data pre-processing

Regarding the pre-processing of energy data, the hydropower generation, Installed Capacity and Stored Energy time series are extracted from a larger database for each PECD country and re-organized in multiple CSV files. TSO and PECDv3.1 data are organized into analogous CSV files. Where needed, the generation data is resampled to 1h. A weekly aggregation follows, consisting of a sum of the hourly values for those weeks where at least 80% of data are available. In such weeks, the gaps in hourly values are filled by a simple interpolation. If more than 20% of the values in a week are missing, the whole week is set to NaN. Specific checks are also made for the first values of the timeseries, as they are often unphysical, in which case they are adjusted based on adjacent values or set to NaN.
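The weekly aggregation rule can be sketched as below. This is an illustration under stated assumptions, not the operational code: the function name is hypothetical, missing hours are represented by `None`, and the "simple interpolation" is assumed to be linear, with the nearest valid value used at the edges of the week.

```python
import math

def weekly_sum(hourly, min_coverage=0.8):
    """Sum one week of hourly values, or return NaN if coverage is below the threshold."""
    valid_idx = [i for i, v in enumerate(hourly) if v is not None]
    if len(valid_idx) < min_coverage * len(hourly):
        return math.nan  # more than 20% missing: discard the whole week

    filled = list(hourly)
    for i, v in enumerate(hourly):
        if v is not None:
            continue
        prev = max((j for j in valid_idx if j < i), default=None)
        nxt = min((j for j in valid_idx if j > i), default=None)
        if prev is None:
            filled[i] = hourly[nxt]    # gap at the start: copy first valid value
        elif nxt is None:
            filled[i] = hourly[prev]   # gap at the end: copy last valid value
        else:
            weight = (i - prev) / (nxt - prev)
            filled[i] = hourly[prev] + weight * (hourly[nxt] - hourly[prev])
    return sum(filled)
```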

Stored Energy data are available at weekly resolution. Here too, the first values of the time series are sometimes unphysical and are therefore manually corrected. The presence of other unphysical jumps in the signal is also checked: in this case, the values are first set to NaN and then interpolated to fill the gaps, if these are small.

Finally, while the generation can be directly employed as a target variable of the RF model, the inflows must first be estimated starting from the available data (see section Estimating Inflows) and then modelled.

2.9.4.3. Climate data preprocessing

For the purposes of this application, the most informative variables that can be found in all climate datasets are 2-m temperature (TA [K]) and total precipitation (TP [m])¹⁴, which are commonly fed to hydrological models to compute river discharge. In particular, the two variables are useful if averaged (for TA) and cumulated (for TP) over multiple weeks preceding the time of the estimation of the generation or inflow. It is important, for instance, to consider the time lag between a precipitation event over a given area and the corresponding discharge water reaching the hydropower plants downstream. Therefore, precipitation is cumulated over up to 30 weeks, while temperature is averaged over up to 15 weeks. According to the example of Table 2.8, if the model is used to estimate the HP generation produced for the week of 2015-01-05, it will take as predictors the TA and TP for that same week, as well as the TA averaged over the previous 2, 3, 4, …, 15 weeks, and the TP cumulated over the previous 2, 3, 4, …, 30 weeks.

Table 2.8: Schematic of the predictors (columns) used as input by the Random Forest model for the simulation of hydropower generation or inflow for the first weeks of January 2015 (generic dates). TA and TP stand respectively for 2-m temperature and total precipitation, while W followed by a number indicates the number of past weeks over which the variable has been averaged (for TA[K]) or aggregated (for TP[m]).

Date	TA_W1	TP_W1	TA_W2	TP_W2	TA_W3	TP_W3	…
2015-01-05	276	0.007	276	0.025	276	0.027	…
2015-01-12	278	0.009	277	0.016	276.7	0.034	…
…	…	…	…	…	…	…

The climate data used for both the training/validation, over the period when observations are available, and the subsequent reconstruction of the historical time series, extending it to 1950, comes from the ERA5 Reanalysis model.

The datasets are aggregated at weekly resolution (summing precipitation and averaging temperature) and then the lags up to 30 weeks are calculated, meaning that values are cumulated (summed/averaged) over multiple weeks to yield several more datasets, which will be used as predictors for the RF model. At the end of this pre-processing step, one CSV file per country and climate dataset is produced.
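The construction of the lagged predictors can be sketched as follows. The function name is hypothetical, and the windows are assumed to end at (and include) the current week, which is the reading suggested by the values in Table 2.8 (for instance, TA_W2 on 2015-01-12 is the mean of 278 and 276).

```python
def lagged_predictors(ta_weekly, tp_weekly, max_ta_weeks=15, max_tp_weeks=30):
    """Build TA_Wk (mean over k weeks) and TP_Wk (sum over k weeks) for every week."""
    rows = []
    for t in range(len(ta_weekly)):
        row = {}
        for k in range(1, max_ta_weeks + 1):
            if t - k + 1 >= 0:  # skip windows reaching before the start of the series
                window = ta_weekly[t - k + 1 : t + 1]
                row[f"TA_W{k}"] = sum(window) / k
        for k in range(1, max_tp_weeks + 1):
            if t - k + 1 >= 0:
                row[f"TP_W{k}"] = sum(tp_weekly[t - k + 1 : t + 1])
        rows.append(row)
    return rows
```

Weeks near the start of the series lack the longer windows, so in practice the climate record needs to begin at least 30 weeks before the first week to be estimated.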

Snow depth can also be informative for mountainous regions, but it was shown to add little to no value to the main predictors, temperature and precipitation (see Ho et al., 2020); furthermore, not all climate projection datasets include snow depth among the modelled variables. The same limitation applies to river discharge, which could otherwise have been employed as a predictor.

2.9.4.4. Model validation: Leave-One-Year-Out Validation

The model is validated separately for each SZON region and indicator, over the period of energy data availability (within 2015-2022 in case of TP data, 2010-2022 in case of TSO data, 2010-2017 in case of PECDv3.1 data). The validation procedure followed is the Leave-One-Year-Out (LOYO), which trains the model over all N available years except one (the test year) and evaluates the model performance over this test year. This is repeated N times, each time holding out a different year as the test year, until the complete estimated time series can be assembled (see Figure 2.31).

Figure 2.31: Example of inflow to reservoirs time series estimated (or predicted) through LOYO procedure with a random forest regression model (red), against observation (grey).
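The LOYO loop can be sketched as below. This is an illustrative outline only: the `fit` and `predict` callables stand in for the Random Forest training and prediction steps, and the mean-value model at the end is a placeholder introduced for the example, not part of the PECD workflow.

```python
def leave_one_year_out(samples, fit, predict):
    """Assemble out-of-sample estimates with Leave-One-Year-Out.

    samples: list of (year, features, target) tuples
    fit:     callable(train_samples) -> model
    predict: callable(model, features) -> estimate
    Returns a list of (year, target, estimate), one per sample.
    """
    years = sorted({year for year, _, _ in samples})
    results = []
    for test_year in years:
        train = [s for s in samples if s[0] != test_year]
        test = [s for s in samples if s[0] == test_year]
        model = fit(train)  # the test year is never seen during training
        for year, features, target in test:
            results.append((year, target, predict(model, features)))
    return results


# Placeholder model (an assumption for the example): predicts the training mean.
def fit_mean(train):
    return sum(target for _, _, target in train) / len(train)


def predict_mean(model, features):
    return model
```

Each sample is predicted exactly once, by a model that has not seen its year, so the assembled series can be scored against the observations as a whole.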

Once the estimated time series is available, several metrics are calculated to quantify the goodness of the fit to observations. Among these, the Nash-Sutcliffe Efficiency (NSE) metric, widely used in hydrology, is adopted as one of the main reference metrics:

\[ NSE = 1 - \dfrac{\sum_{i=1}^{n}(x^{i}_{m} - x^{i}_{o})^{2}}{\sum_{i=1}^{n}(x^{i}_{o} - \overline{x}_{o})^{2}} \]

where \( x^{i}_{m} \) is the modelled value at timestep i, \( x^{i}_{o} \) the observed value at timestep i, \( \overline{x}_{o} \) the mean of the observed values, and n the total number of timesteps.

For instance, for the modelled time series in Figure 2.31 the NSE value is 0.59 (as also reported in the upper left corner of the figure). The metric is calculated as one minus the ratio between the sum of squared errors of the modelled time series and the sum of squared deviations of the observations from their mean (that is, the error variance divided by the variance of the observed time series). If there is no difference between the modelled (m) values and the observed (o) ones at each timestep (i), then the NSE is 1 (perfect fit), which is the maximum value that can be reached. If, on the other hand, the two time series differ significantly, the NSE can become negative (down to -Inf). An NSE of 0 indicates that the model has the same predictive skill as the mean of the observed time series in terms of the sum of squared errors.
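As a minimal sketch, the NSE can be computed in a few lines of plain Python. The function name `nse` and the list-based inputs are choices made for this example.

```python
def nse(modelled, observed):
    """Nash-Sutcliffe Efficiency of a modelled series against observations."""
    if len(modelled) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((m - o) ** 2 for m, o in zip(modelled, observed))
    obs_spread = sum((o - mean_obs) ** 2 for o in observed)
    if obs_spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - squared_error / obs_spread
```

A perfect match returns 1, while a model that always predicts the observed mean returns 0, in line with the interpretation given above.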

2.9.4.5. RF Model Parameters

As mentioned, the Random Forest can be built by specifying several parameters. The main parameters indicated in Table 2.9 have been tuned country by country and indicator by indicator. This has been done by sampling a hyperparameter space with the Latin Hypercube Sampling algorithm to find the set able to optimize a selected metric. The hyperparameter space has been defined by assigning a range of values to each of the main RF parameters. To efficiently sample this multidimensional domain, a Latin Hypercube Sampling of 1000 samples has been performed and each sampled set of parameters has been tested via LOYO procedure to yield the score of the chosen metric. Finally, the set of parameters yielding the best score was retained and used for that specific country and indicator.

Table 2.9: Random Forest (RF) parameters involved in the optimization procedure, with a short description and range of possible values sampled by the Latin Hypercube Sampling algorithm.

RF parameter	Short description	Range
n_estimators	number of trees in the forest.	100-500
max_features	maximum number of features (predictors) considered for splitting a tree node.	0.1-1 (1 meaning all available features)
max_depth	maximum number of levels in each decision tree.	1-100
min_samples_split	minimum number of data points placed in a node before the node is split.	2-30
min_samples_leaf	minimum number of data points allowed in a leaf node (terminal node of a tree).	2-30
bootstrap	method for sampling data points (with or without replacement).	True/False
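The sampling of the hyperparameter space can be illustrated with a small, self-contained Latin Hypercube sampler. The ranges are those of Table 2.9; everything else (function names, the seed, the handling of `bootstrap` as a binary split of the unit interval, the rounding of integer parameters) is an assumption made for this sketch and not necessarily how the PECD optimisation was implemented.

```python
import random

# Ranges from Table 2.9; bootstrap is handled separately as True/False.
PARAM_RANGES = {
    "n_estimators": (100, 500, int),
    "max_features": (0.1, 1.0, float),
    "max_depth": (1, 100, int),
    "min_samples_split": (2, 30, int),
    "min_samples_leaf": (2, 30, int),
}


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # one random point inside each of the n_samples equal-width strata
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the strata across dimensions
        columns.append(strata)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def sample_rf_parameters(n_samples=1000, seed=0):
    """Map Latin Hypercube points onto the Random Forest parameter ranges."""
    rng = random.Random(seed)
    names = list(PARAM_RANGES) + ["bootstrap"]
    candidates = []
    for point in latin_hypercube(n_samples, len(names), rng):
        params = {}
        for name, u in zip(names, point):
            if name == "bootstrap":
                params[name] = u < 0.5
                continue
            low, high, kind = PARAM_RANGES[name]
            if kind is int:
                params[name] = min(high, low + int(u * (high - low + 1)))
            else:
                params[name] = low + u * (high - low)
        candidates.append(params)
    return candidates
```

Each candidate set would then be scored through the LOYO procedure, and the set with the best score kept, for example with `max(candidates, key=score)` where `score` is a user-supplied (hypothetical) function returning the chosen metric.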

The parameter optimization has been tested with two different metrics: the Nash-Sutcliffe Efficiency (NSE), and a combined metric. The latter includes the normalized NSE (NNSE), which indicates a general goodness of fit to the observations, and the Normalized Mean Absolute Error (NMAE) of Annual Maxima, which quantifies the ability of the model to reproduce high extremes of generation¹⁵. However, this combined metric requires longer computational times and, in a few cases, yields unphysical results. Therefore, the proposed results are obtained with RF parameters optimized using NSE.

The NSE can be normalized so that its values will span from 0 (instead of -Inf) to 1:
\( NNSE = \dfrac{1}{2 - NSE} \)
The annual maxima (AM) of generation for both estimates and observations are extracted and compared. The goal is to minimize the error between the two, measured by the Normalized Mean Absolute Error (NMAE), which spans from 0 to 1; equivalently, 1 - NMAE(AM) is maximized. The combined metric is the product of NNSE and 1 - NMAE(AM), which always lies between 0 and 1, with 1 being the perfect score.
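The combined metric can be sketched as follows. The NMAE of the annual maxima is taken here as an input already scaled to the interval from 0 to 1, because its exact normalization is not detailed in this section; the function names are illustrative.

```python
def nnse(nse_value):
    """Normalized NSE: maps (-Inf, 1] onto (0, 1]."""
    return 1.0 / (2.0 - nse_value)


def combined_score(nse_value, nmae_annual_maxima):
    """Product of NNSE and 1 - NMAE(AM); 1 is the perfect score.

    nmae_annual_maxima is assumed to be already normalized to [0, 1].
    """
    if not 0.0 <= nmae_annual_maxima <= 1.0:
        raise ValueError("NMAE of annual maxima must lie between 0 and 1")
    return nnse(nse_value) * (1.0 - nmae_annual_maxima)
```

For example, an NSE of 0 gives an NNSE of 0.5, so even with a perfect reproduction of the annual maxima the combined score cannot exceed 0.5 in that case.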

2.9.4.6. Model Validation Results

To summarize the validation results, Figure 2.32 maps the NSE scores obtained for each country for the generation and inflow to reservoirs, the inflows to run-of-river, the inflows to pondage, and the inflows to open-loop pumped storage. Generally, over the PECD domain, the results are satisfactory, with fairly high NSE values for most countries. This is especially the case for the inflow to reservoirs indicator (panel b), which incorporates information on reservoir filling rates (for the countries that provide it) and is hence able to reduce the human influence on the generation signal. Without this information, the generation signal can be harder to reproduce with a model based on temperature and precipitation alone (see panel a). High scores are also obtained for inflows to run-of-river and pondage (panels c and d), where the signal has a more distinct seasonality and is less influenced by human intervention. The scores are generally lower for inflows to open-loop pumped storage (panel e), which are largely based on PECDv3.1 data.

Low scores are mainly due to the few years of data available for training (e.g. 3 years), or to irregularities in the generation time series which reduce its seasonality. These can be caused by artificial regulations or faulty data records. In some cases, low scores were obtained because of a loss of seasonality in the time series caused by significant changes in the Installed Capacity (IC) of that country / bidding zone. New installations can cause abrupt changes or gradual shifts in the mean observed signal. Since the model is based solely on climate data, it cannot predict this behaviour.

One solution that was attempted is to model Capacity Factors (CF) directly, normalizing the provided generation data by the annual series of country-aggregated IC. This improves the results for some countries, but generally worsens them for countries where the IC does not change significantly with time. This suggests that generation may not reflect the actual IC at a given time. Changes to the IC can occur at the beginning of the reporting year or at any time during the year, and are therefore likely to introduce step changes in the IC.

However, ENTSO-E launched a data collection to retrieve monthly IC time series from the TSOs, and some were able to provide them. Therefore, where new installations visibly affected the TSO generation time series, these were normalized with the corresponding monthly IC data, the model was trained on the normalized time series, and the output was then multiplied back by the same IC series to recover a generation/inflow time series. This procedure was applied to the time series of Albania, Switzerland, Hungary, Poland, and Portugal. It must be taken into account when comparing projection energy values to historical ones, since in these cases the anomalies are due not only to changes in climate variables but also to the known changes in IC.

It is also important to note that this procedure assumes that the TSO generation and TSO monthly installed capacity series provided for these countries are compatible. Therefore, any inconsistency found between model outputs and expected historical values may come from discrepancies between the generation and installed capacity input data.
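The normalize, train, and rescale sequence can be illustrated with two small helpers. This sketch assumes that the monthly IC values have already been mapped onto the same (weekly) time steps as the generation series; that mapping, and the function names, are assumptions of the example.

```python
def normalise_by_capacity(generation, installed_capacity):
    """Divide each generation value by the installed capacity valid at that step."""
    if len(generation) != len(installed_capacity):
        raise ValueError("series must have the same length")
    return [g / ic for g, ic in zip(generation, installed_capacity)]


def rescale_by_capacity(normalised, installed_capacity):
    """Multiply model output back by the same installed capacity series."""
    if len(normalised) != len(installed_capacity):
        raise ValueError("series must have the same length")
    return [x * ic for x, ic in zip(normalised, installed_capacity)]
```

The model is trained on the output of `normalise_by_capacity`, and its predictions are passed through `rescale_by_capacity`, so the step changes in IC are removed from the training target and reintroduced afterwards.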

Other time series displayed irregularities arguably attributable to changes in IC, but no monthly IC series were provided for them. In such cases, the RF model was trained on a recent restricted time window (at least 4 years) of close-to-constant IC, which is hence assumed constant in time.

Figure 2.32: Maps of the LOYO validation results obtained in terms of NSE over the period of available data, which depends on the source (TSO: 2010-2022, TP: 2015-2022, PECDv3.1: 2010-2017). Each panel refers to a different inflow (or generation) indicator, as reported in the panels’ titles.

2.9.4.7. Modelling Historical stream

Once the model is validated, it is trained (again for each country and indicator) on all available years of generation data using the tailored sets of parameters found during the optimization procedure. The same parameters are then used to extend the HP indicator back to 1950, to have long reconstructed time series, using the ERA5 temperature and precipitation data. Figure 2.33 shows an example of a historical time series of inflow to reservoirs as estimated by the RF model for France (in blue). It also shows the ‘observed’ inflow series in grey, estimated with TP data (see section Estimating Inflows).

Figure 2.33: RF-reconstructed time series of inflow to reservoirs (HRI) for France (FR). The estimated series is shown in blue, while the observations (2015-2022) are in grey, starting from the dashed line.

2.9.4.8. Estimating Inflows

The RF model produces generation time series; however, artificial regulation can significantly affect these series and their seasonality, jeopardizing the capability of temperature and precipitation to reliably reproduce the signal. This issue concerns technologies involving a reservoir, especially Reservoir and Open-Loop pumped storage systems, while the effect can in general be neglected for run-of-river plants and pondage plants, which are run-of-river plants making use of a limited storage capacity amounting to no more than 24 hours.

In general, the hydrological balance of a reservoir or pumped storage facility over a given amount of time can be written as:

\[ \Delta S = IN - OUT \]

where \( \Delta S \) is the change in the stored water volume of the reservoir, \( IN \) the inflow to the reservoir, and \( OUT \) the outflow, over time \( \Delta t \). The terms of the hydrological balance represent the net water flux and storage of the reservoir/pumping system. To express an energy balance, the water terms must be multiplied by the head of the pumps and of the hydropower plant, according to the equation expressing the relation between water discharge and power [MW]:

\[ P = \eta \cdot \gamma \cdot Q \cdot \Delta H \]

where \( \eta \) is the efficiency of the HP (conversely, the inverse efficiency of the pump for a pumped storage facility), \( \gamma \) is the specific weight of water [N/m³], Q [m³/s] the volumetric discharge and \( \Delta H \) [m] the head jump. The energy term, in turn, is expressed as the power spent or generated during the time period \( \Delta t \):

\[ E = P \cdot \Delta t \]
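A short numerical sketch of the two relations above follows. With \( \gamma \) in N/m³, Q in m³/s and \( \Delta H \) in m the product is in watts, so the example divides by 10⁶ to obtain MW. The value used for the specific weight of water (9810 N/m³) and the function names are assumptions of the example.

```python
WATER_SPECIFIC_WEIGHT = 9810.0  # N/m^3, roughly 1000 kg/m^3 * 9.81 m/s^2 (assumed)


def hydro_power_mw(efficiency, discharge_m3s, head_m,
                   specific_weight=WATER_SPECIFIC_WEIGHT):
    """P = eta * gamma * Q * dH, converted from W to MW."""
    return efficiency * specific_weight * discharge_m3s * head_m / 1e6


def energy_mwh(power_mw, hours):
    """E = P * dt, with dt expressed in hours."""
    return power_mw * hours
```

For instance, a discharge of 100 m³/s over a 100 m head at unit efficiency corresponds to about 98.1 MW, or about 16 481 MWh over a 168-hour week.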

The hydrological balance can then be expressed in terms of energy balance and used to compute the inflow and the other flux terms as energetic quantities. The inflow can be partitioned into a Natural component ( \( IN_{nat} \) ) and the energy pumped from the lower reservoir (in the case of a pumped storage system), which can be described as the consumed energy for pumping ( \( E_{pump} \) ) times an efficiency term ( \( \eta_{p} \) ). The outflow can be expressed as the actual production ( \( E_{out} \) ) divided by an efficiency term for production ( \( \eta_{o} \) ).

\[ \Delta S = IN_{nat} + \eta_{p} \cdot E_{pump} - \dfrac{E_{out}}{\eta_{o}} \]

When Stored Energy (S), actual generation output ( \( E_{out} \) ) dispatched to the grid and (if needed) consumed grid energy for pumping ( \( E_{pump} \) ) time series are available, the potential natural inflow to the system can be estimated assuming an efficiency for pumping and production.

An estimate of the efficiency can be found considering a closed-loop pumping system, which is not continuously connected to a river system: its natural inflow component can be considered null. In addition, for sufficiently large time intervals \( \Delta t \) , we can assume the storage term to be negligible compared to the other terms and hence write:

\[ \eta_{p} \cdot E_{pump} = \dfrac{E_{out}}{\eta_{o}} \]

This allows the round-trip efficiency of the system to be estimated:

\[ \eta = \eta_{p} \cdot \eta_{o} = \dfrac{E_{out}}{E_{pump}} \]

Figure 2.34: An approximated sketch of a closed-loop system.

Figure 2.35: Cumulated energy production (blue), pumping energy consumption (red), and estimated natural inflow (green) for a French Closed-Loop unit.

This round-trip efficiency usually depends on the design of the plant. For older designs it may be lower than 60%, while for recent ones it can be up to 90%. The efficiency suggested by ENTSO-E is 0.75, which is therefore assumed as the reference value over Europe. As seen in Figure 2.35 for a French Closed-Loop unit, the balance holds as the production and pumping terms are cumulated over time and the natural inflow remains null.
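For a closed-loop unit with production and pumping series available, the round-trip efficiency can be estimated from the cumulated totals, as in this minimal sketch (the function name and list inputs are illustrative):

```python
def round_trip_efficiency(e_out, e_pump):
    """Estimate eta = sum(E_out) / sum(E_pump) over a long period.

    Valid under the closed-loop assumptions: null natural inflow and a
    storage change that is negligible over the period considered.
    """
    total_out = sum(e_out)
    total_pump = sum(e_pump)
    if total_pump <= 0:
        raise ValueError("cumulated pumping energy must be positive")
    return total_out / total_pump
```

Cumulating over the whole period, rather than averaging per-step ratios, is what makes the storage term negligible.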

2.9.4.9. Inflow to Open-loop Pumping

The situation for open-loop facilities, which is sketched in Figure 2.36, is different since the natural inflow component is not null, and therefore constitutes a third unknown, together with the two efficiencies. The assumption made here is that the pumping and production efficiencies are equal ( \( \eta_{o} = \eta_{p} \) ) and comparable to those of closed-loop plants ( \( \eta_{p} \cdot \eta_{p} = 0.75 \) ), which gives:

\[ \eta_{p} \simeq 0.866 \] \[ IN_{nat} = \Delta S + \dfrac{E_{out}}{\eta_{p}} - \eta_{p} \cdot E_{pump} \]

Figure 2.36: An approximated sketch of an Open-loop system.

2.9.4.10. Inflow to Reservoirs

As for reservoirs, the pumping component is null, so the equation reduces to:

\[ IN_{nat} = \Delta S + \dfrac{E_{out}}{\eta_{p}} \]

In principle, there can also be another inflow component, which is released at the outlet of the dam. However, as this term is not known it is disregarded when computing the energy balance. Nonetheless, the approximated formula can cause the estimated inflow to reach negative values in times of water scarcity. Following the recommendation by ENTSO-E, the negative values are set to zero in the final modelled timeseries.
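The inflow formulas for open-loop facilities and reservoirs can be combined in one small function, sketched below. All quantities are energies in the same unit (e.g. MWh); the function name and the `clip_negative` switch are choices made for this example, with the clipping reflecting the ENTSO-E recommendation mentioned above for reservoirs.

```python
import math

ROUND_TRIP_EFFICIENCY = 0.75  # reference value suggested by ENTSO-E
ETA = math.sqrt(ROUND_TRIP_EFFICIENCY)  # eta_p = eta_o, about 0.866


def natural_inflow(delta_storage, e_out, e_pump=0.0, eta=ETA, clip_negative=False):
    """IN_nat = dS + E_out / eta - eta * E_pump.

    e_pump defaults to 0, which gives the reservoir case.
    clip_negative sets negative estimates to zero.
    """
    inflow = delta_storage + e_out / eta - eta * e_pump
    if clip_negative and inflow < 0.0:
        return 0.0
    return inflow
```

For a reservoir in a dry week, a call such as `natural_inflow(-500.0, 100.0, clip_negative=True)` returns 0 instead of a negative inflow.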

It must be noted that TP data has been cautiously used to compute inflow to reservoirs, since the stored energy data on the platform refers both to reservoirs and pumped storage technologies. Hence, the inflow results from the TP have been retained only in a few cases, generally where the reported installed capacity for reservoirs is much greater than the one for pumped storage.

2.9.4.11. Inflow to Run-of-rivers and Pondage

For Run-of-river systems, the storage term is considered null, and considering that the storage capacity of a pondage is less than 24 hours, the same is assumed for run-of-river with pondage at weekly resolution, hence reducing the equation to:

\[ IN_{nat} = \dfrac{E_{out}}{\eta_{p}} \]

When possible, the two technologies are kept separate. For instance, this is possible for the bidding zones whose TSO provided distinct generation time series. Data from the TP, on the other hand, are used to model run-of-river technology only if no pondage was declared for that bidding zone by the TSO and no pondage was available in the PECDv3.1 dataset. This is to ensure that run-of-river alone is being addressed, given that the TP generation data include both technologies (addressed as ‘Run-of-river and pondage’). If only run-of-river data were provided by the TSO for a given bidding zone, the run-of-river inflow was calculated starting from these data, while the pondage inflow was calculated starting from the PECDv3.1 data. Comments on these particular cases are left in the Summary Table (Table 2.11).

Finally, the same production efficiency is assumed for all technologies ( \( \eta_{p} \simeq 0.866 \) ). However, to align with the models used by ENTSO-E to ingest the energy data, the final inflow model outputs are multiplied back by the same efficiency coefficient to obtain an inflow at the electrical grid level. Although the balance equations should yield close-to-reality estimates, it must be noted that, without access to actual inflow observations, it is not possible to fully validate the above methodology.

2.9.4.12. Use of PECDv3.1 inflow estimates

Where TSO and TP data were not sufficient to complete the inflow for a specific bidding zone and a specific technology, the PECDv3.1 inflow data were used directly as the target variable for the training of the RF model, as indicated in Figure 2.37. This approach was especially used to model inflows to open-loop pumped storage, as only a few stored energy time series were provided by the TSOs. Therefore, there are cases in which the generation is modelled starting from available TSO data, while the corresponding inflow (because stored energy data are unavailable) is modelled starting from PECDv3.1 data, which sometimes leads to inconsistencies between the two datasets. The main ones are reported in the Summary Table (Table 2.11).

Figure 2.37: Sketch of the two different approaches to model inflows: approach 1 makes use of TSO and TP data, approach 2 makes use of PECDv3.1 data.

2.9.4.13. Post-hoc corrections following TSOs’ feedback

For the produced inflow datasets of some specific technologies and regions, a multiplicative correction factor was applied to the model outputs in agreement with the TSO of interest, after validation against a reference dataset. These correction factors were hence required due to the poor quality of the public data initially used for the model training and are to be regarded as temporary adjustments ahead of a more stable solution. See Table 2.10 for an overview of the explicit multiplicative values, and the regions to which these were applied for the PECDv4.2 delivery of data.

Table 2.10: Multiplicative Correction Factors applied to inflows model output.

Region	Technology	Correction Factor	Source
AT00	HRI – inflows to reservoirs	2404/5507	Comparison of mean maximum generation with an internal APG data source with strict sharing limitations.
	HRR – inflows to run of river	23082/17760
	HPI – inflows to pondage	5607/4506
CH00	HOL – inflows to open-loop pumped storage	0.825	Comparison of mean annual cumulated inflows with a reference monthly dataset derived from Swiss Federal Office of Energy (SFOE) data.
CH00	HRR – inflows to run of river	1.39	Comparison of mean annual cumulated inflows with a reference monthly dataset (SFOE). Mind: this factor was applied directly to the model input TSO data in accordance with the Swiss TSO.
TR00	HRR – inflows to run of river (and relative IC series)	2.502	Comparison of mean annual cumulated inflow with an internal series of annual cumulated generation for period 2019-2023 including all country plants.
TR00	HRI – inflows to reservoirs (and relative IC series)	1.850	Comparison of mean annual cumulated inflow with an internal series of annual cumulated generation for period 2019-2023 including all country plants.
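Applying the factors of Table 2.10 amounts to a lookup and a multiplication, as sketched below. The dictionary layout and function name are illustrative. The two rows of the table with an empty Region cell are assumed here to belong to AT00, and the CH00 run-of-river factor (1.39) is left out because it was applied to the model input rather than to the output.

```python
# Factors from Table 2.10, keyed by (region, technology).
CORRECTION_FACTORS = {
    ("AT00", "HRI"): 2404 / 5507,
    ("AT00", "HRR"): 23082 / 17760,  # region assumed from the merged table cell
    ("AT00", "HPI"): 5607 / 4506,    # region assumed from the merged table cell
    ("CH00", "HOL"): 0.825,
    ("TR00", "HRR"): 2.502,
    ("TR00", "HRI"): 1.850,
}


def apply_correction(region, technology, series):
    """Scale a model output series; series without a listed factor are unchanged."""
    factor = CORRECTION_FACTORS.get((region, technology), 1.0)
    return [value * factor for value in series]
```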

2.9.4.14. Summary Table

Table 2.11 includes all addressed bidding zones and technologies (except for generation from run-of-river and pondage, which would be a repetition of the respective reported inflow columns) and can be used to check the availability of data, source of data used for the modelling, and comments on the results, mainly addressing inconsistencies found or considerations made for the source/modelling choices. As mentioned, the TSO generation data have always been given priority when available, followed by TP data and PECDv3.1 estimates. Given the different data sources and methodology used, the results can significantly differ from the ones of the previous PECD, therefore we strongly recommend checking with TSOs about the reliability of mean generation/inflow historical values.

Table 2.11: Summary table of used data sources and comments/considerations on the model outputs results.

	Reservoirs Generation	Inflow to Reservoirs	Run-of-river Inflow	Inflow to Open Loop PS	Pondage Inflow
Bidding zone / Tech.	HRG	HRI	HRR	HOL	HPI
AL00	TSO rescaled using monthly IC	TSO rescaled using monthly IC	TSO rescaled using monthly IC
AT00	TSO	TP – the mean using PECDv3.1 data is too low with respect to TSO data, hence using TP data although SE is surely affected by HPS (Hydro Pumped Storage)	TSO	PECDv3.1	TSO
BA00	TSO	PECDv3.1	PECDv3.1- TSO run-of-river data not provided – might be already accounted for in TSO pondage data	PECDv3.1	TSO
BE00			TSO
BG00	TSO	TSO	TSO	TSO
CH00	TSO	TSO – rescaled using monthly IC	TSO - rescaled using monthly IC – multiplication factor of 1.39 applied to generation input data in accordance with CH00 TSO	TSO - rescaled using monthly IC
CZ00	TSO	PECDv3.1	TP (since there’s no pondage) – can reproduce mean signal, can’t well reproduce the peaks – suspected anthropic factors influencing the production after 2019	PECDv3.1
DE00	TSO	PECDv3.1 – mean too low with respect to TSO generation, should be ca three times higher	TSO	PECDv3.1
ES00	TSO	TSO	TSO	TSO
FI00	TSO	TSO	TP (no TSO pondage data, no PECDv3.1 pondage data)
FR00	TP	TP – HPS (pumped storage) IC about 60% of HRE (reservoirs) IC in past 8 years (from TP data) + time series very close to PECDv3.1 inflow	TP (no TSO data for FR, no pondage in PECDv3.1 data)	GPU (Generation Per Unit) - (no PECDv3.1 data for FR) - low reliability: no HOL storage energy available (approximated inflow assuming negligible storage from one week to the other) + few production and pumping data (3 years)
GR00	TSO	TSO	TSO – model training on last 4 years (missing monthly IC data to rescale) – significant difference with PECDv3.1 inflow	TSO	PECDv3.1 – even though no pondage data from TSO nor TP
HR00	TSO – very close to TP generation	TP – HPS IC about 20% of HRE IC in the past 9 years (TP data)	TSO – could contain pondage	PECDv3.1	PECDv3.1 – even though no pondage data from TSO.
HU00			TSO rescaled using monthly IC
IE00			TSO
ITCA	TSO	PECDv3.1 – reasonable values with respect to TSO generation	TSO
ITCN	TSO	PECDv3.1 – inflow sometimes lower than TSO generation	TSO
ITCS	TSO	PECDv3.1 – inflow very close to TSO generation	TSO	PECDv3.1
ITN1	TSO	PECDv3.1 – inflow very close to TSO generation	TSO	PECDv3.1
ITS1	TSO	PECDv3.1 – inflow close to generation (would expect it a bit higher)
ITSA	TSO	PECDv3.1 – high with respect to TSO generation	TSO
ITSI	TSO	PECDv3.1 – low peaks with respect to TSO generation	TSO	PECDv3.1
LT00			TSO – generation values exceptionally high for the year 2015 (something wrong in the data) -> left out of training
LV00					TSO
LU00			TSO
ME00	TSO – close to TP generation data, higher peaks	TP – no HPS IC	PECDv3.1
MK00	TSO	TSO
NL00			PECDv3.1
NOM1	TSO	TP – small HPS production compared to HRE	TSO	PECDv3.1
NON1	TSO	TP - no HPS	TSO
NOS1	TSO	TP – no HPS	TSO	-
NOS2	TSO	TP – trying splitting PECDv3.1 NOS0 data obtained similar result + small HPS production	TSO	PECDv3.1 (splitting PECDv3.1 NOS0 data according to mean TSO generation data for NOS2)
NOS3	TSO	TP - trying splitting PECDv3.1 NOS0 data obtained similar result + small HPS production	TSO	PECDv3.1 (splitting PECDv3.1 NOS0 data according to mean TSO generation data for NOS3)
PL00	TSO	PECDv3.1 – mean inflow value is 3-4 times higher than TSO generation (also TP-calculated mean is 3-4 times higher)	TSO - rescaled using monthly IC	PECDv3.1 – inflow seems to be too low considering TSO generation and pumping series: ca 200 MWh of inflow against 1200 MWh of generation (mean weekly values)
PT00	TSO	TSO	TSO – values seem low, TP and PECDv3.1 data ca 10 times higher than TSO data of run-of-river and HPO together	TSO - rescaled using monthly IC	TSO
RO00	TSO	PECDv3.1	TSO	PECDv3.1
RS00	TSO	PECDv3.1 – TP data significantly impacted by HPS	TSO
SE01	TSO	PECDv3.1
SE02	TSO	PECDv3.1
SE03	TSO	PECDv3.1
SE04	TSO	PECDv3.1
SI00	TSO	-	TSO – could contain pondage		PECDv3.1 – no pondage generation data from TSO: keeping PECDv3.1 trained estimates. Pondage could be included in run-of-river TSO data? In this case PECDv3.1 estimates are off.
SK00	TSO	PECDv3.1 – although mean is considerably higher than TSO generation	TSO	PECDv3.1	TSO
TR00
UK00			TP – (no TSO data for GB, no pondage in PECDv3.1 data)

2.10. Energy indicators

Energy indicators included in the PECDv4.2 dataset for the historical stream are described in Table 2.12. This table provides information for each variable, including the typology, the time period covered, the source of the input data, the domain and spatial resolution, the temporal resolution, the spatial aggregation (as specified in Table 2.1), and, where applicable, the different technologies used to compute the final time series.

For onshore wind power capacity factors, ten time series are computed: one for existing technologies and nine for future technologies, based on different combinations of Specific Power and Hub Height. Similarly, for offshore wind power capacity factors, three time series are computed: one for existing technologies and two for future technologies, also based on different combinations of Specific Power and Hub Height. For Concentrated Solar Power capacity factors, four time series are computed, considering combinations of preDispatch and Dispatch technologies with 0 hours and 7.5 hours of storage.

Table 2.12: Energy indicators provided in the PECDv4.2 for the historical stream. Files provided at ORIG spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format. Changes that were implemented in PECDv4.2 compared to the previous PECDv4.1 are highlighted in bold (specifically: extended time period, new energy scenarios for WPP locations, and new technologies for solar generation).

Variable	Type	Time period	Source	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Technology	Units
Wind power onshore (WON)	Capacity factor	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEON	Onshore Existing technologies, Onshore SP199_HH100, Onshore SP199_HH150, Onshore SP199_HH200, Onshore SP277_HH100, Onshore SP277_HH150, Onshore SP277_HH200, Onshore SP335_HH100, Onshore SP335_HH150, Onshore SP335_HH200. Except for the Existing technologies, all future technologies are evaluated for two WPP locations: ReGrA (best 10% locations) and ReGrB (best 10-50% locations).	MW/MW
Wind power offshore (WOF)	Capacity factor	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEOF	Offshore Existing technologies, Offshore SP316_HH155, Offshore SP370_HH155. Except for the Existing technologies, all future technologies are evaluated for two WPP locations: ReGrA (best 10% locations) and ReGrB (best 10-50% locations).	MW/MW
Solar generation (SPV)	Capacity factor	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, PEON	Industrial rooftop, Residential rooftop, Utility-scale fixed, Utility-scale 1-axis tracking	MW/MWp
Concentrated solar generation (CSP)	Capacity factor	1950 - near present	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEON	storage_0_hours_preDispatch, storage_0_hours_storageDispatched, storage_7p5_hours_preDispatch, storage_7p5_hours_storageDispatched	MW/MW
Hydropower reservoirs generation energy (HRG)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO**	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower reservoirs inflow energy (HRI)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river generation energy (HRO)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river inflow energy (HRR)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage generation energy (HPO)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage inflow energy (HPI)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower open-loop pumped storage inflow energy (HOL)	Energy	1950 - near present	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh

*Energy data from ENTSO-E Transparency Platform

**Energy data from Transmission System Operators specific for each country

***Inflow data from ENTSO-E PECDv3.1

2.11. Known issues

There are no known issues.

3. Projection stream

3.1. Projection models

3.1.1. Choice of models

The projection dataset in PECDv4.2 has been designed to provide robust climate and energy indicators for the entire PECD domain, extending up to the year 2100. As a first step in building this dataset, a careful selection of climate projections was carried out to identify the most appropriate subset for energy-sector applications.

The selection process was based on models from the Coupled Model Intercomparison Project Phase 6 (CMIP6), coordinated by the World Climate Research Programme (WCRP) under the World Meteorological Organization (WMO). CMIP6 is the latest phase of an international initiative to standardise and compare climate simulations, and its outputs have been extensively assessed in the IPCC Sixth Assessment Report (AR6)¹⁶ (Eyring et al., 2016).

In PECDv4.1, CMIP6 data were already preferred over the EURO-CORDEX¹⁷ ensemble due to the wider availability of long-term projections. The PECDv4.1 ensemble, however, was limited to only three climate models and covered a shorter time horizon (up to 2065). In PECDv4.2, the ensemble includes six CMIP6 models and three additional emission scenarios, and the projection period extends to 2100.

However, a key limitation of CMIP6 models is their relatively coarse spatial resolution, typically around 100 km. To meet the spatial resolution target of 0.25° (matching ERA5), a statistical downscaling and bias adjustment procedure was applied.

Although over 40 CMIP6 models are publicly available, their archived variables, spatial resolution, and temporal frequency vary significantly across platforms such as the Earth System Grid Federation (ESGF) and the CDS. As such, a sub-selection process was required to identify models that:

Provide all required variables for PECD at appropriate spatial and temporal resolutions;
Offer a meaningful spread in future climate trajectories relevant to the energy sector.

Rather than relying on complex performance-based metrics to evaluate how well each model reproduces historical climate conditions, the selection was primarily guided by Equilibrium Climate Sensitivity (ECS) values. This criterion, also used in IPCC AR6, allows for selecting a representative ensemble that spans the range of projected climate sensitivity, including models with higher sensitivity to capture "low-likelihood, high-impact" futures. The selection also aimed to reduce redundancy by minimising model overlap (i.e., models developed with similar components or structures). The results are presented in Table 3.1.

Table 3.1: Models are colour-coded based on exclusion criteria: dark red indicates models that do not provide all required scenarios; orange highlights models with Equilibrium Climate Sensitivity (ECS) values outside the range assessed in the IPCC AR6; yellow marks models that share components with others in the ensemble. The models that were retained for PECDv4.2 are highlighted in bold.

	Model	ECS (°C)	Pre-industrial	Historical	SSP1-1.9	SSP1-2.6	SSP2-4.5	SSP3-7.0	SSP5-8.5
1	ACCESS-CM2	4.72	500	3		1	1	1	3
2	ACCESS-ESM1-5	3.87	900	10		3	3	3	10
3	AWI-CM-1-1-MR	3.16	500	5		1	1	5	1
4	BCC-CSM2-MR	3.04	600	3		1	1	1	1
	BCC-ESM1	3.26		3
5	CAMS-CSM1-0	2.29	500	2	x	2	2	2	2
6	CanESM5	5.62	1000	40	x	40	40	40	40
7	CanESM5-CanOE		501	3		3	3	3	3
8	CESM2	5.16	1200	11		3	3	3	3
9	CESM2-FV2	5.14	500	3		XXXX	XXXX	XXXX	XXXX
10	CESM2-WACCM	4.75	499	3		1	3	1	3
11	CESM2-WACCM-FV2	4.79	500	3		XXXX	XXXX	XXXX	XXXX
12	CMCC-CM2-SR5	3.52	500	1		1	1	1	1
	CMCC-ESM2		500	1		1	1	1	1
13	CNRM-CM6-1	4.83	500	30		6	6	6	6
	CNRM-CM6-1-HR	4.28	XXXX	1		1	1	1	1
14	CNRM-ESM2-1	4.76	500	9	x	5	5	5	5
15	E3SM-1-0	5.32	500	5		XXXX	XXXX	XXXX	XXXX
	E3SM-1-1-ECA		XXXX	1		XXXX	XXXX	XXXX	XXXX
	E3SM-1-1		XXXX	1		XXXX	XXXX	XXXX	1
	EC-Earth3	4.10	XXXX	23	x	7	22	7	7
16	EC-Earth3-Veg	4.31	500	5	x	4	5	4	4
	EC-Earth3-Veg-LR		XXXX	1	x	XXXX	XXXX	XXXX	1
17	FGOALS-f3-L	3.00	561	3		1	1	1	1
18	FGOALS-g3	2.88	700	6	x	1	1	1	4
	FIO-ESM-2-0		XXXX	3		3	3	XXXX	3
19	GFDL-CM4	3.89	500	1		XXXX	1	XXXX	1
20	GFDL-ESM4	2.60	500	3	x	1	3	1	1
21	GISS-E2-1-G	2.72	851	39	x	2	15	2	7
	GISS-E2-1-G-CC		XXXX	1		XXXX	XXXX	XXXX	XXXX
	GISS-E2-1-H	3.11	XXXX	1	x	XXXX	XXXX	XXXX	XXXX
22	HadGEM3-GC31-LL	5.55	500	4		1	1	XXXX	3
23	HadGEM3-GC31-MM	5.42	500	2		1	XXXX	XXXX	3
24	INM-CM4-8	1.83	531	1		1	1	1	1
25	INM-CM5-0	1.92	1201	10		1	1	5	1
26	IPSL-CM6A-LR	4.56	1200	32	x	6	11	11	6
	KACE-1-0-G	4.48	XXXX	3		2	3	3	3
27	MCM-UA-1-0	3.65	500	2		1	1	1	1
28	MIROC-ES2L	2.68	500	10	x	2	1	1	10
29	MIROC6	2.61	800	10	x	3	3	3	3
30	MPI-ESM-1-2-HAM	2.96	780	2		XXXX	XXXX	XXXX	XXXX
31	MPI-ESM1-2-HR	2.98	500	10		2	2	10	2
32	MPI-ESM1-2-LR	3.00	1000	10		10	10	10	10
33	MRI-ESM2-0	3.15	701	5	x	1	1	5	2
34	NESM3	4.72	500	5		2	2	XXXX	2
35	NorCPM1	3.05	500	30		XXXX	XXXX	XXXX	XXXX
	NorESM2-LM	2.54	XXXX	3		1	3	1	1
36	NorESM2-MM	2.50	500	1		1	1	1	1
37	SAM0-UNICON	3.72	700	1		XXXX	XXXX	XXXX	XXXX
38	TaiESM1	4.31	500	1		1	1	1	1
39	UKESM1-0-LL	5.34	1100	17	x	5	5	5	5

This list was further refined using two additional criteria: a temporal resolution of at least 3-hourly and a nominal horizontal resolution of 100 km or finer. These criteria ensure that the data are sufficiently detailed for further processing and analysis. Note that MPI-ESM1-2-HR was selected despite being initially excluded: it has a higher spatial resolution than MPI-ESM1-2-LR, the previously preferred model from the same "family". The high-resolution (HR) version was excluded at first only because it was not yet available; it became available in time for the present analysis and was therefore included.

Concerning the Shared Socio-economic Pathways (SSPs; IPCC, 2021)¹⁸, the analysis involved four scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5. These four scenarios explore possible climate futures based on greenhouse gas emission levels: low (SSP1-2.6), present-day (SSP2-4.5), medium (SSP3-7.0), and high (SSP5-8.5) emissions. This selection captures a broad spectrum of future climate uncertainty. For simplicity, these scenarios are referred to in this documentation as SSP126, SSP245, SSP370, and SSP585.

The final selection of models and their characteristics is reported in Table 3.2.

Table 3.2: CMIP6 models used in the projections stream and their corresponding characteristics and nodes for downloading.
The models and scenarios indicated in bold are the ones that have been introduced in PECDv4.2, while the other ones were already present in PECDv4.1.

Model	Originator	Model code	Variant label	Calendar	Scenarios	node URL
CMCC-CM2-SR5	CMCC (Centro Euro-Mediterraneo sui Cambiamenti Climatici)	CMR5	r1i1p1f1	365_day	historical, SSP126, SSP245, SSP370, SSP585	https://esgf-data.dkrz.de/esg-search
EC-Earth3	ECEC (European community Earth System Model)	ECE3	r1i1p1f1	proleptic_gregorian	historical, SSP126, SSP245, SSP370, SSP585	https://esg-dn1.nsc.liu.se/esg-search
MPI-ESM1-2-HR	MPI- (Max Planck Institute)	MEHR	r1i1p1f1	proleptic_gregorian	historical, SSP126, SSP245, SSP370, SSP585	https://esgf-data.dkrz.de/esg-search
BCC-CSM2-MR	Beijing Climate Center (BCC)	BCCS	r1i1p1f1	365_day	historical, SSP126, SSP245, SSP370, SSP585	https://esgf-data.dkrz.de/esg-search
AWI-CM-1-1-MR	Alfred Wegener Institute (AWI)	AWCM	r1i1p1f1	proleptic_gregorian	historical, SSP126, SSP245, SSP370, SSP585	https://esgf-data.dkrz.de/esg-search
MRI-ESM2-0	Meteorological Research Institute (MRI)	MRM2	r1i1p1f1	proleptic_gregorian	historical, SSP126, SSP245, SSP370, SSP585	https://esgf-data.dkrz.de/esg-search

Note that the historical simulation period is chosen to ensure overlap between ERA5 and the CMIP6 models, enabling the computation of bias adjustment.

3.2. Data retrieval

CMIP6 variables for each model are downloaded from the ESGF nodes using a Python script built on a dedicated Python API. The script takes a single argument, a configuration file containing the desired tags for the download, and is used for both historical and projection data. Table 3.2 lists the node from which each model has been downloaded. The selected CMIP6 climate models are also available in the C3S catalogue; however, the high temporal resolution (namely, 3-hourly) needed to produce the PECD database was not available there. For this reason, the CMIP6 model output has been collected via the ESGF nodes.

3.3. Spatial interpolation

Starting from a common 100 km nominal spatial resolution and global domain, each model has its own grid, necessitating spatial interpolation to the PECD domain at 0.25° x 0.25°. This interpolation uses the bilinear method as implemented in the CDO¹⁹ remapbil command line tool.

The interpolation process involves the following command:

cdo remapbil,<grid.txt> <filein.nc> <fileout.nc>

where grid.txt is a text file obtained by running the following command on a template file with the desired grid:

cdo griddes <template.nc> > <grid.txt>

Here filein.nc is the original projection file, and fileout.nc is the spatially interpolated output.

A Python script iterates over the files and, using the os library, calls the CDO command line for each file. Another Python script in the pre-processing pipeline checks the output files for missing (Not A Number, NaN) and anomalous values, and reformats them according to ERA5 conventions.
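The file loop described above can be sketched as follows. This is only an illustration, not the project's script: the function names, the directory layout and the `dry_run` switch are hypothetical, and `subprocess` is used here in place of the `os` library mentioned in the text. Only the `cdo remapbil,<grid> <in> <out>` invocation comes from the documentation.

```python
import subprocess
from pathlib import Path


def build_remap_command(grid_file, file_in, file_out):
    """Return the CDO bilinear-remapping command as an argument list."""
    return ["cdo", f"remapbil,{grid_file}", str(file_in), str(file_out)]


def remap_directory(input_dir, output_dir, grid_file, dry_run=False):
    """Remap every NetCDF file in input_dir and return the commands issued.

    With dry_run=True the commands are only built, not executed
    (hypothetical switch, handy when CDO is not installed).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for file_in in sorted(Path(input_dir).glob("*.nc")):
        file_out = output_dir / file_in.name
        cmd = build_remap_command(grid_file, file_in, file_out)
        if not dry_run:
            # check=True raises CalledProcessError if CDO exits with an error.
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

Passing the command as a list avoids shell quoting issues with file names; the checks for NaN and anomalous values mentioned above would run as a separate step on the output files.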

¹⁹ https://code.mpimet.mpg.de/projects/cdo

3.4. Temporal aggregation and interpolation

As stated in Section 3.1, one of the selection criteria for projection models is the finest available temporal resolution (3 hours). However, it is necessary to apply temporal interpolation to achieve the required hourly resolution for the PECDv4.2 database. Table 3.3 shows the method used to temporally interpolate each variable.

Table 3.3: Temporal interpolation methodologies.

Variable	Interpolation method
Temperature at 2m (TA)	(1) Cubic spline with moving window (window width 3 days)
Precipitation (TP)	(3) Cumulating over the days
Surface solar radiation downwards (GHI)	(2) Method ad hoc for taking into account the position of the sun
Wind speed at 10 m (WS10)	(1) Cubic spline with moving window (window width 3 days), applied separately to the 10 m horizontal wind components

The cubic spline interpolation is implemented in a Python script that uses the xarray library. The set of files is opened as a multi-file dataset with xarray.open_mfdataset, and an iterator runs along the "time" coordinate of the 3-hourly data in daily steps starting from 00:00. In each step, a window with a width of 3 days is created, and the data within the window are interpolated to hourly resolution for each grid point by combining the xarray methods resample("1h") and interpolate("cubic"). The interpolated data for the central day (from 00:00 to 23:00) are then stored in a new dataset and saved as a NetCDF file.
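A minimal sketch of this moving-window procedure is given below. It assumes a 3-hourly `xarray.DataArray` with a "time" coordinate starting at 00:00 and requires SciPy for the cubic interpolation; the function name `spline_to_hourly` and the handling of the first and last days are assumptions made for illustration, not the project's implementation.

```python
import pandas as pd
import xarray as xr


def spline_to_hourly(da, window_days=3):
    """Interpolate a 3-hourly DataArray to hourly values, one central day at a time."""
    half = pd.Timedelta(days=window_days // 2)
    one_day = pd.Timedelta(days=1)
    first = pd.Timestamp(da.time.values[0]).normalize()
    last = pd.Timestamp(da.time.values[-1]).normalize()
    pieces = []
    # Stop one day before the end: hour 23:00 needs the next day's 00:00 value.
    for day in pd.date_range(first, last - one_day, freq="D"):
        # Window around the central day; it is simply truncated at the
        # start of the series (assumption for this sketch).
        window = da.sel(time=slice(day - half, day + half + one_day))
        hourly = window.resample(time="1h").interpolate("cubic")
        # Keep only the central day, from 00:00 to 23:00.
        pieces.append(hourly.sel(time=slice(day, day + pd.Timedelta(hours=23))))
    return xr.concat(pieces, dim="time")
```

Because the last day of the input cannot be completed up to 23:00 without the following 00:00 value, the sketch skips it; this is the same constraint that motivates borrowing a day from the adjacent simulation at the historical/scenario boundary.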

To follow the ERA5 conventions and have the projection files start at 00:00, the last day of the historical scenario is also used, because the SSP scenario files start at 03:00. Figure 3.1 contains a validation of this method for the TA variable at a generic point of the PECD domain.

Figure 3.1: Validation of the temporal interpolation procedure based on the cubic spline algorithm for TA.

For surface solar radiation downwards, the first step converts the three-hourly data to a clearness index (Kt), which removes the dependence on the Sun's apparent position. This is done by dividing the GHI by the equivalent irradiation at the top of the atmosphere (TOA), which is calculated using an algorithm from the Solar Geometry 2 (SG2) library.²⁰ The SG2 library can be installed via "pip" in any Python environment. The detrended Kt time series is then downscaled to hourly resolution using linear interpolation. The data are subsequently converted back to GHI by multiplying by an hourly-averaged TOA value. Figure 3.2 shows a validation plot for this procedure, computed at a generic point within the PECD domain.
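The three steps (normalise by TOA, interpolate Kt linearly, rescale by hourly TOA) can be illustrated with NumPy as below. This is a sketch under stated assumptions: the TOA series are taken as given inputs rather than computed with SG2, the function and argument names are hypothetical, and setting Kt to zero where the 3-hourly TOA is zero is a simplification that the documentation does not specify.

```python
import numpy as np


def ghi_to_hourly(ghi_3h, toa_3h, toa_1h, t_3h, t_1h):
    """Downscale 3-hourly GHI to hourly values via the clearness index Kt.

    t_3h and t_1h are time axes in hours; toa_3h and toa_1h are TOA
    irradiation values on those axes (supplied by the caller).
    """
    ghi_3h = np.asarray(ghi_3h, dtype=float)
    toa_3h = np.asarray(toa_3h, dtype=float)
    toa_1h = np.asarray(toa_1h, dtype=float)
    # Step 1: clearness index; left at 0 where TOA is 0 (night, assumption).
    kt_3h = np.zeros_like(ghi_3h)
    np.divide(ghi_3h, toa_3h, out=kt_3h, where=toa_3h > 0)
    # Step 2: linear interpolation of Kt to the hourly time axis.
    kt_1h = np.interp(t_1h, t_3h, kt_3h)
    # Step 3: convert back to GHI with the hourly TOA values.
    return kt_1h * toa_1h
```

Working on Kt rather than on GHI directly keeps the diurnal cycle in the hourly TOA term, so the interpolation only has to follow the slower variation of atmospheric transmissivity.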

Figure 3.2: Validation of the temporal interpolation procedure based on the SG2 algorithm for GHI. TOA stands for top-of-the-atmosphere solar radiation, and SRDS stands for incoming solar radiation and is equivalent to GHI.

The required variable for precipitation is total precipitation (TP), which has been derived from the precipitation flux (in kg m⁻² s⁻¹), the original data format for CMIP6 projection models. Since energy models require daily cumulative data, the downloaded precipitation flux data was first resampled to daily averages using the xarray.resample().mean() method. This daily average was then multiplied by 86.4 to convert the data into daily precipitation in meters.
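The factor 86.4 follows from the units: 1 kg m⁻² of water corresponds to a 1 mm layer, a day has 86 400 s, and 1 m equals 1000 mm, so 86 400 / 1000 = 86.4. The short sketch below reproduces this conversion for one day of 3-hourly fluxes; the function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 86_400  # s in one day
MM_PER_M = 1_000          # 1 kg m-2 of water is a 1 mm layer


def daily_precip_m(flux_3h):
    """Daily total precipitation (m) from one day of 3-hourly fluxes (kg m-2 s-1)."""
    mean_flux = sum(flux_3h) / len(flux_3h)        # daily mean flux, i.e. mm per second
    return mean_flux * SECONDS_PER_DAY / MM_PER_M  # same as multiplying by 86.4


# Example: a constant flux of 1e-4 kg m-2 s-1 over the eight 3-hourly steps of a day.
example_total = daily_precip_m([1e-4] * 8)
```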

²⁰ https://github.com/gschwind/sg2

3.5. Bias-adjustment procedure

For the projection stream, two bias-adjustment methodologies have been implemented for the CMIP6 projection datasets. These methodologies are:

Cumulative Distribution Function Transform (CDFt): As described by Michelangeli et al. (2009), this method assumes the existence of a transformation that can 'translate' a time series of a CMIP6 variable (the predictor) into the CDF representing the reference climate variable (the predictand) at a given point.
Delta Adjustment: As detailed by Navarro-Racines et al. (2020), this method applies a simple constant correction based on the averages of the predictor and the predictand.

For the CDFt method, the calibration period extends from 1995 to 2014 (20 years). The calibration time series are retrieved from the ERA5 dataset (the reference) and from the historical scenario of the CMIP6 projection model dataset (the source variable to be corrected).

Delta Adjustment: This computationally inexpensive method is used for variables that do not exhibit a strong climate-change-related trend, such as solar radiation.

CDFt Method: This method is used for variables with a strong climate-change-related trend, such as temperature. To correctly account for the trend, a 20-year time series is considered for the calculation of the CDFs, with only the central 10-year window taken as the adjusted data. The 20-year timeframe is then moved forward, yielding a new 10-year central window that partially overlaps the window of the previous step. Despite wind speed and precipitation (WS10 and TP) not exhibiting a strong climate change trend, their correction is also based on the CDFt method. This is because the mean factors in the Delta method could potentially lead to negative (and therefore unphysical) values. For these variables, given the lack of a strong climatic trend, the CDFt considers a ‘static’ 20-year time series.
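As an illustration of the Delta Adjustment idea (a constant correction derived from the mean of the predictor and of the predictand), a minimal sketch follows. The documentation does not state whether the correction is applied as an offset or as a factor, so both variants are shown and the `kind` switch, like all names here, is an assumption.

```python
from statistics import fmean


def delta_adjust(projection, model_hist, reference, kind="additive"):
    """Apply a constant mean-based correction to a projected series.

    model_hist and reference are the calibration-period series of the
    model (predictor) and of the reference dataset (predictand).
    """
    ref_mean = fmean(reference)
    mod_mean = fmean(model_hist)
    if kind == "additive":
        offset = ref_mean - mod_mean
        return [x + offset for x in projection]
    if kind == "multiplicative":
        factor = ref_mean / mod_mean
        return [x * factor for x in projection]
    raise ValueError("kind must be 'additive' or 'multiplicative'")
```

With the additive variant, a negative offset applied to small values can push them below zero, which is the kind of unphysical result that motivates using CDFt for wind speed and precipitation.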

Figures 3.3, 3.4 and 3.5 illustrate the logic blocks of the bias-adjustment procedure applied to the 2 m temperature (TA), to the 10 m wind speed (WS10) and total daily precipitation (TP), and to the surface solar radiation (GHI), respectively.

Figure 3.3: Details of the bias-adjustment logic block for the projection 2m temperature (TA) using the Cumulative Distribution Function transform (CDFt) method with the moving-windows technique.

Figure 3.4: Details of the bias-adjustment logic block for the projection 10 m wind speed (WS10) and total daily precipitation (TP) using the Cumulative Distribution Function transform (CDFt) method without the moving-windows technique.

Figure 3.5: Details of the bias-adjustment logic block for the projection global horizontal irradiance (GHI) using the Delta Adjustment method.

3.6. Climate indicators

Table 3.4 lists the climate indicators for the projection stream. The final domain and spatial resolution, as well as the final temporal resolution, are obtained through the pre-processing described in Section 3.3 and Section 3.4, respectively. The bias adjustment has been applied using the procedures detailed in Section 3.5. Wind speed at 100 m above the ground is not available for the CMIP6 projection models. To maintain consistency between the historical (ERA5) and projection datasets, it is therefore calculated from the near-surface (10 m) wind speed of the CMIP6 projection models together with the Alpha Coefficient (or power law) derived from the ERA5 reanalysis (see Section 2.2 for more details). For the projection stream, the computation of TAW and the spatial aggregation follow the same methodologies described for the historical stream (see Sections 2.4 and 2.5, respectively). Note that TAW and WS100 are not bias-adjusted directly, because they are derived from variables that are already bias-adjusted (TA and WS10, respectively); all other variables are bias-adjusted.
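The power-law extrapolation from 10 m to 100 m can be written as WS100 = WS10 × (100 / 10)^alpha. The sketch below applies this standard form; the function name and default heights are illustrative, and the exact way alpha is selected for each time step is described in Section 2.2 rather than reproduced here.

```python
def extrapolate_wind_speed(ws10, alpha, z_ref=10.0, z_hub=100.0):
    """Extrapolate wind speed from z_ref to z_hub with the power law."""
    return ws10 * (z_hub / z_ref) ** alpha


# Example with a hypothetical alpha of 1/7 and a 10 m wind speed of 5 m/s.
ws100_example = extrapolate_wind_speed(5.0, 1.0 / 7.0)
```

An alpha of zero leaves the wind speed unchanged, and larger alpha values produce a stronger increase with height.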

Table 3.4: Climate indicators provided in the PECDv4.2 for the projection stream. Files provided at the BIAS spatial aggregation level (specifically, bias-adjusted data; see Table 2.1 for further info) are gridded (NetCDF format), while all the other levels of aggregation are provided in a CSV format. Changes that were implemented in PECDv4.2 are highlighted in bold (extended time period, additional climate projection models and climate scenarios, new spatial aggregation over selected cities).

Variable	Period	Source	Models	Scenario	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Units
2m temperature (TA)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF, CITY	K (gridded) °C (aggregated)
Population-weighted temperature (TAW)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	hourly	SZON	°C
Total precipitation (TP)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	daily	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m
Surface solar radiation downwards (GHI)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	W m^-2
10m wind speed (WS10)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1
100m wind speed (WS100)	2015-2100	CMIP6 projections	AWCM, BCCS, CMR5, ECE3, MEHR, MRM2	SSP126, SSP245, SSP370, SSP585	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1

3.7. Energy data

The same data illustrated in Section 2.7 are also used for the projection stream.

3.8. Energy Conversion models

3.8.1. Wind Power Conversion Model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.1.

The simulated locations and wind technologies depend on the type of run. An overview of the runs is given in Table 3.5.

Table 3.5: Wind run types for the projection stream. Changes that were implemented in PECDv4.2 are highlighted in bold (specifically, the extended time period).

Run type	Climate projection simulated years	WPP locations	WPP technology	Losses
Existing	2015-2100	All years with 2020 WPP locations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (always 2020 fleet), applied in the generic power curve model	Wakes as part of the generic power curve, plus 10 % for other losses (incl. unavailability), applied as a simple multiplication by 0.9
Future wind technologies	2015-2100	The best 10-50 % locations of the unmasked points within each PECD region (in terms of mean wind speed in the bias-adjusted ERA5 data, based on ERA5 grid).	Onshore wind: 3 hub heights and 3 turbine types, so in total 9 wind technologies. A plant of 50 MW with ten 5 MW turbines modelled for each technology. Offshore wind: 1 hub height and 2 turbine types, so in total 2 wind technologies. A plant of 500 MW with 28 turbines of 18 MW each modelled for each technology.	Wakes as part of power curves, plus 5 % for other losses (incl. unavailability), applied as a simple multiplication by 0.95

3.8.2. Photovoltaic Solar Power Conversion Model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.2.

3.8.3. Concentrated Solar Power Conversion Model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.3.

3.8.4. Hydro Power Conversion Model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.4.

3.9. Energy indicators

For the projection stream, the same energy indicators described for the historical stream (see Section 2.10) were computed starting from the climate indicators listed in Table 3.4.

Table 3.6 summarizes the energy indicators and provides detailed information for each variable, including the type, the covered time period, the source of the input data, the domain and spatial resolution, the temporal resolution, the spatial aggregation (according to Table 2.1), and, where applicable, the different technologies used to compute the final time series.

Table 3.6: Energy indicators provided in the PECDv4.2 for the projection stream. Files provided at ORIG spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format. Changes that were implemented in PECDv4.2 compared to the previous PECDv4.1 are highlighted in bold (specifically: extended time period, new energy scenarios for WPP locations, and new technologies for solar generation).

Variable	Type	Period	Source	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Technology	Units
Wind power onshore (WON)	Capacity factor	2015 - 2100	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEON	Onshore Existing technologies, Onshore SP199_HH100, Onshore SP199_HH150, Onshore SP199_HH200, Onshore SP277_HH100, Onshore SP277_HH150, Onshore SP277_HH200, Onshore SP335_HH100, Onshore SP335_HH150, Onshore SP335_HH200 Except for the Existing technologies, all future technologies are evaluated for two WPP locations: ReGrA (best 10% locations), and ReGrB (best 10-50% locations)	MW/MW
Wind power offshore (WOF)	Capacity factor	2015 - 2100	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEOF	Offshore Existing technologies, Offshore SP316_HH155, Offshore SP370_HH155 Except for the Existing technologies, all future technologies are evaluated for two WPP locations: ReGrA (best 10% locations), and ReGrB (best 10-50% locations)	MW/MW
Solar generation (SPV)	Capacity factor	2015 - 2100	CMIP6 projection	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, PEON	Industrial rooftop, Residential rooftop, Utility-scale fixed, Utility-scale 1-axis tracking	MW/MWp
Concentrated solar generation (CSP)	Capacity factor	2015 - 2100	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEON	storage_0_hours_preDispatch, storage_0_hours_storageDispatched, storage_7p5_hours_preDispatch, storage_7p5_hours_storageDispatched	MW/MW
Hydropower reservoirs generation energy (HRG)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO**	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower reservoirs inflow energy (HRI)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river generation energy (HRO)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river inflow energy (HRR)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage generation energy (HPO)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage inflow energy (HPI)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower open-loop pumped storage inflow energy (HOL)	Energy	2015 - 2100	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh

*Energy data from ENTSO-E Transparency Platform

**Energy data from Transmission System Operators specific for each country

***Inflow data from ENTSO-E PECDv3.1

4. Appendix

4.1. Filenames convention and characteristics

This section explains the filename convention of the PECD datasets. Table 4.1 details the structure and possible fields of the filenames. The last column indicates the corresponding section of the CDS catalogue where users can customise their choice. "Not applicable" means that the user cannot modify this field, and the data are downloaded with fixed characteristics. Table 4.2 details the structure and filenames of the ancillary files that have been used for PECDv4.2 and that are available in the CDS under the widget "Weights and masks".

Table 4.1: Filename convention used in the PECDv4.2.

Position in the filename	Possible substrings for each position in the filename	Description	Option in the CDS download form
0	H (Historical), P (Future projection)	Temporal period covered	Temporal period
1	ERA5 (ERA5 reanalysis), CMI6 (CMIP6 Projection)	Data source	Origin (Reanalysis or Climate models)
2	ECMW (ECMWF), CMCC (Centro Euro-Mediterraneo sui Cambiamenti Climatici), ECEC (European Community Earth System Model), MPI- (Max Planck Institute), AWI- (Alfred Wegener Institute), BCC- (Beijing Climate Center), MRI- (Meteorological Research Institute)	Climate producing center	Origin (Reanalysis or Climate models)
3	T639 (ERA5 reanalysis), CMR5 (CMCC-CM2-SR5 r1i1p1f1), ECE3 (EC-Earth3 r1i1p1f1), MEHR (MPI-ESM1-2-HR r1i1p1f1), AWCM (AWI-CM-1-1-MR r1i1p1f1), BCCS (BCC-CSM2-MR r1i1p1f1), MRM2 (MRI-ESM2-0 r1i1p1f1)	Climate model	Origin (Reanalysis or Climate models)
4	WS- (10m wind speed and 100m wind speed), TA- (2m temperature), TAW (Population-weighted temperature), GHI (Surface solar radiation downwards), TP- (Total precipitation)	Climate variable	Variable (Climate)
4	HOL (Hydropower open-loop pumped storage inflow), HRG (Hydropower reservoirs generation), HRI (Hydropower reservoirs inflow), HRO (Hydropower run-of-river generation), HRR (Hydropower run-of-river inflow), HPO (Hydropower run-of-river with pondage generation), HPI (Hydropower run-of-river with pondage inflow), CSP (Concentrated solar generation capacity factor), SPV (Solar Photovoltaic generation capacity factor), WON (Wind power onshore capacity factor), WOF (Wind power offshore capacity factor)	Energy variable	Variable (Energy)
5	0000m, 0002m, 0010m, 0100m	Level (meters above sea level)	Not applicable
6	Pecd (ENTSO-E PECD domain)	Region	Not applicable
7	025d (0.25°), NUT0 (NUTS 0), NUT2 (NUTS 2), PEOF (Pan-European Offshore Zones), PEON (Pan-European Onshore Zones), SZOF (Offshore Bidding Zones), SZON (Onshore Bidding Zones), CITY (list of selected cities, only for TA)	Spatial resolution	Spatial resolution (Gridded or Regional aggregated timeseries)
8	SYYYYMMDDhhmm (starting year, month, day, hour, minute)	Start date	Year Month
9	EYYYYMMDDhhmm (ending year, month, day, hour, minute)	End date	Year Month
10	ACC (accumulated), INS (Instantaneous), CFR (Capacity factor), NRG (Energy)	Type of data	Not applicable
11	MAP (gridded data), TIM (time series)	Data structure/typology	Not applicable
12	01h (1 hour), 01d (1 day), 07d (7 days)	Temporal resolution	Not applicable
13	NA-, COM (complete file, yearly), ITE (interim file, monthly)	Information on the completeness of the file	Not applicable
14	noc (no correction), cdf (Cumulative distribution function), mbc (mean bias correction)	Bias adjustment method	Not applicable
15	NA-, org (original data), avg (mean)	Statistics	Not applicable
16	NA, 20 (Offshore wind turbine: Existing technologies), 21 (Offshore wind turbine: SP316 HH155), 22 (Offshore wind turbine: SP370 HH155), 30 (Onshore wind turbine: Existing technologies), 31 (Onshore wind turbine: SP199 HH100), 32 (Onshore wind turbine: SP199 HH150), 33 (Onshore wind turbine: SP199 HH200), 34 (Onshore wind turbine: SP277 HH100), 35 (Onshore wind turbine: SP277 HH150), 36 (Onshore wind turbine: SP277 HH200), 37 (Onshore wind turbine: SP335 HH100), 38 (Onshore wind turbine: SP335 HH150), 39 (Onshore wind turbine: SP335 HH200), 40 (Concentrated solar power: Pre-dispatch, no storage), 41 (Concentrated solar power: Dispatched, no storage), 42 (Concentrated solar power: Pre-dispatch, 7-hours of storage), 43 (Concentrated solar power: Dispatched, 7-hours of storage), 60 (Solar photovoltaic: Industrial rooftop), 61 (Solar photovoltaic: Residential rooftop), 62 (Solar photovoltaic: Utility-scale fixed), 63 (Solar photovoltaic: Utility-scale 1-axis tracking)	Technological specification	Technological specification (Offshore wind turbine, Onshore wind turbine, Concentrated solar power, Solar photovoltaic)
17	NA---, SP126 (SSP1-2.6), SP245 (SSP2-4.5), SP370 (SSP3-7.0), SP585 (SSP5-8.5)	IPCC emission scenario	Emission scenario
18	NA---, ReGrA (Resource grade A best 10% locations), ReGrB (Resource grade B best 10-50% locations)	Best locations for future wind installations	Energy scenario
19	NA---, StRnF (Statistical model/Random Forests), PhM01 (Physical Model/method1), PhM03 (Physical Model/method3), PhM04 (Physical Model/method4)	Transfer function	Not applicable
20	PECD4.2	Version of PECD database	PECD version (PECDv4.1 or PECDv4.2)
21	fv1	File version	Not applicable
22	.nc (NetCDF) .csv (comma-separated values)	File formats	Not applicable

Example of filename: H_ERA5_ECMW_T639_TP-_0000m_Pecd_025d_S198501010000_E198512312300_ACC_MAP_01h_COM_noc_org_NA_NA---_NA---_NA---_PECD4.2_fv1.nc

This NetCDF file (.nc) contains historical data (H) from the ERA5 reanalysis (ERA5 and T639) originated by ECMWF (ECMW); the variable is total precipitation (TP-) at 0 m height (0000m), and the coverage is the PECD domain (Pecd) with a 0.25° spatial resolution (025d). Data span from 01/01/1985 at 00:00 UTC (S198501010000) to 31/12/1985 at 23:00 UTC (E198512312300). The data are accumulated (ACC), gridded (MAP), with an hourly temporal resolution (01h). The file is a complete (COM) annual file. Data are not bias-corrected (noc), and they are original (org). The technological specification, emission scenario, energy scenario and transfer function are not applicable (NA_NA---_NA---_NA---). The PECD version is 4.2 (PECD4.2) while the file version is fv1.
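Since the fields are separated by underscores, a filename can be split programmatically. The sketch below is one possible way to do this, following the positions of Table 4.1; the field names in `FIELDS` and the function name are illustrative choices, not part of the PECD convention.

```python
from datetime import datetime

# One illustrative name per position 0-21 of Table 4.1 (position 22 is the extension).
FIELDS = [
    "period", "source", "centre", "model", "variable", "level", "region",
    "spatial_resolution", "start", "end", "data_type", "structure",
    "temporal_resolution", "completeness", "bias_adjustment", "statistics",
    "technology", "emission_scenario", "energy_scenario", "transfer_function",
    "pecd_version", "file_version",
]


def parse_pecd_filename(filename):
    """Split a PECD filename into a dictionary of its fields."""
    stem, _, extension = filename.rpartition(".")
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, found {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["extension"] = "." + extension
    # Start and end carry an 'S' or 'E' prefix followed by YYYYMMDDhhmm.
    record["start"] = datetime.strptime(record["start"][1:], "%Y%m%d%H%M")
    record["end"] = datetime.strptime(record["end"][1:], "%Y%m%d%H%M")
    return record


example = parse_pecd_filename(
    "H_ERA5_ECMW_T639_TP-_0000m_Pecd_025d_S198501010000_E198512312300_"
    "ACC_MAP_01h_COM_noc_org_NA_NA---_NA---_NA---_PECD4.2_fv1.nc"
)
```

The padding characters (such as the trailing "-" in "TP-" or "NA---") are kept as they appear, so that each field retains its fixed width.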

Table 4.2: Filename convention for ancillary data used in the PECDv4.2 and that are available in the CDS under the widget "Variable" ("Weights and masks").

Position in the filename	Possible substrings for each position in the filename	Description	Option in the CDS download form
0	ANCI (Ancillary)	Category	Not applicable
1	CITY-coords (City coordinates), LAT-mask (Latitude weights), NUT0-mask (NUT0 regions mask), NUT2-mask (NUT2 regions mask), PEOF-mask (PEOF regions mask), PEON-mask (PEON regions mask), POP-mask (Population density mask), ALP-coef (Power law coefficients), PVM-mask (Solar PV mask), SZOF-mask (SZOF regions mask), SZON-mask (SZON regions mask), WPM-mask (Wind power regions mask), WS10E5-mean (Climatology of ERA5 10 m wind speed), WS10G2-mean (Climatology of GWA2 10 m wind speed), WS100E5-mean (Climatology of ERA5 100 m wind speed), WS100G2-mean (Climatology of GWA2 100 m wind speed)	Masks and weights used for the calculation of PECD	Variable (Weights and masks)
2	PECD4.2	Version of PECD database	Not applicable
3	fv1	File version	Not applicable
4	.nc (NetCDF) .csv (CSV file)	File formats	Not applicable

Example of a filename for the ancillary data: ANCI_LAT-mask_PECD4.2_fv1.nc

This NetCDF file (.nc) contains ancillary data (ANCI) used to adjust the gridded data with the proper latitudinal weights (LAT-mask) during the spatial aggregation procedure. The PECD version is 4.2 (PECD4.2) and the file version is fv1.

Table 4.3: Description of the ancillary NetCDF data and their characteristics. These files are available for download in the CDS under the widget "Variable" ("Weights and masks").

Filename	Variable	Grid	Description	Corresponding name in the widget "Weights and masks"
ANCI_CITY-coords_PECD4.2_fv1.csv	-	-	List of cities and their corresponding coordinates. See Section 2.5.1 for more details.	City coordinates
ANCI_LAT-mask_PECD4.2_fv1.nc	lat_weights(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the cosine of the latitude for the corresponding grid cell. See Section 2.5.3 for more details.	Latitude weights
ANCI_SZON-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in SZON), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	SZON regions mask
ANCI_SZOF-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in SZOF), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	SZOF regions mask
ANCI_PEON-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in PEON), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	PEON regions mask
ANCI_PEOF-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in PEOF), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	PEOF regions mask
ANCI_NUT0-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in NUT0), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	NUTS 0 regions mask
ANCI_NUT2-mask_PECD4.2_fv1.nc	mask(region, latitude, longitude)	PECD domain (latitude, longitude) level (region)	For each level (region in NUT2), every grid cell contains a floating point value between 0 and 1. A value of 0 indicates that the grid cell is outside the region, while a value of 1 means the cell is fully within the region. In other cases, the value represents the fraction of the grid cell’s area that lies within the region. See Section 2.5.2 for more details.	NUTS 2 regions mask
ANCI_WPM-mask_PECD4.2_fv1.nc	m_rest(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains a boolean value: 1 indicates that the cell is unsuitable for potential future wind power installations, while 0 indicates that the cell could potentially be used as a site for such installations. See Section 2.8 for more details.	Wind power regions mask
ANCI_PVM-mask_PECD4.2_fv1.nc	PVmask(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains a boolean value: 1 indicates that the cell is unsuitable for potential future solar photovoltaic power installations, while 0 indicates that the cell could potentially be used as a site for such installations. See Section 2.8 for more details.	Solar PV mask
ANCI_ALP-coef_PECD4.2_fv1.nc	alpha(time, latitude, longitude)	PECD domain (latitude, longitude) levels (time)	For each level (time), every grid cell contains the power law's alpha coefficient. Each grid cell contains 12 × 24 = 288 alpha coefficients in total, one for each combination of month of the year and hour of the day. See Section 2.2.1* for more details.	Power law coefficients
ANCI_POP-mask_PECD4.2_fv1.nc	population_mask(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the number of people living in that area. See Section 2.4.1 for more details.	Population density mask
ANCI_WS10G2-mean_PECD4.2_fv1.nc	ws10(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the mean value of the 10 m wind speed from GWA2 computed over the period 2006-2018. This file is used in the bias adjustment procedure described in Section 2.3.3.	Climatology of GWA2 10 m wind speed
ANCI_WS10E5-mean_PECD4.2_fv1.nc	ws10(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the mean value of the 10 m wind speed from ERA5 computed over the period 2006-2018. This file is used in the bias adjustment procedure described in Section 2.3.3.	Climatology of ERA5 10 m wind speed
ANCI_WS100G2-mean_PECD4.2_fv1.nc	ws100(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the mean value of the 100 m wind speed from GWA2 computed over the period 2006-2018. This file is used in the bias adjustment procedure described in Section 2.3.3.	Climatology of GWA2 100 m wind speed
ANCI_WS100E5-mean_PECD4.2_fv1.nc	ws100(latitude, longitude)	PECD domain (latitude, longitude)	Each grid cell contains the mean value of the 100 m wind speed from ERA5 computed over the period 2006-2018. This file is used in the bias adjustment procedure described in Section 2.3.3.	Climatology of ERA5 100 m wind speed
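
As an illustration of how the fractional region masks and the 0/1 exclusion masks listed above might be combined, the following is a minimal sketch and not the aggregation procedure of Section 2.5.2. It assumes a regular latitude-longitude grid on which cos(latitude) approximates the relative cell area. The function name `regional_mean`, its arguments and the small arrays are hypothetical, and the arrays would in practice be read from the NetCDF files.

```python
import numpy as np


def regional_mean(field, mask, lat, exclude=None):
    """Weighted mean of a gridded field over one region.

    field:   2-D array (latitude, longitude) of the variable
    mask:    2-D array of cell fractions in [0, 1] for one region
    lat:     1-D array of cell-centre latitudes in degrees
    exclude: optional 2-D array of 0/1 flags, 1 = unsuitable cell
             (same convention as m_rest and PVmask)
    """
    # Assumption: cos(latitude) approximates the relative cell area.
    area = np.cos(np.deg2rad(lat))[:, np.newaxis]
    weights = mask * area
    if exclude is not None:
        # Cells flagged 1 are dropped, cells flagged 0 are kept.
        weights = weights * (1.0 - exclude)
    total = weights.sum()
    if total == 0:
        raise ValueError("no usable grid cell overlaps the region")
    return float((field * weights).sum() / total)


# Hypothetical 2 x 2 example: one cell fully inside, one half inside.
lat = np.array([45.0, 45.25])
field = np.array([[10.0, 12.0], [14.0, 16.0]])
mask = np.array([[1.0, 0.5], [0.0, 0.0]])
exclude = np.array([[0.0, 1.0], [0.0, 0.0]])

print(regional_mean(field, mask, lat))
print(regional_mean(field, mask, lat, exclude))
```

Note the inverted convention of the exclusion masks: a value of 1 removes the cell, so the weight is multiplied by `1 - exclude` rather than by the mask itself.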

4.2. Metadata

The header of each CSV file contains the metadata descriptors listed below. The example shown is for the 2 m air temperature variable:

# General

## Title

### Air Temperature

## Abstract

### ERA5

## Date

### 2022-11-28

## Date type

### Publication: Date identifies when the data was issued

## Unit

### C

## URL

### https://cds.climate.copernicus.eu/

## Data format

### CSV

## Keywords

### ERA5: reanalysis : Copernicus : C3S : C3S Energy : ICS

## Point of contact

### Individual name

#### Alberto Troccoli

### Electronic mail address

#### info@inclimateservice.com

### Organisation name

#### Inside Climate Service

### Role

#### Owner: Party that owns the resource

# Usage

## Access constraints

### Intellectual property rights: The IP of these data belongs to the EU Copernicus programme

## Use constraints

### Creative Commons

## Citation(s)

### Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horanyi, A.; Munoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. Roy. Meteor. Soc. 2020, 146, 1999-2049, doi:10.1002/qj.3803

## Temporal extent

## Begin date

### 1980-01-01-0000

## End date

### 2021-12-31-2300

## Temporal resolution

### 1 hour

## Geographic bounding box

### westBoundLongitude -31.00

### eastBoundLongitude 45.00

### southBoundLatitude 18.00

### northBoundLatitude 75.00

## Spatial resolution

### NUTS0

# Lineage Statement

## Original Data Source

## Statement

### The original data sources are ECMWF ERA5 Reanalysis (available at: https://cds.climate.copernicus.eu)
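
The header above can be read programmatically before the data rows are loaded. The following is a minimal sketch, assuming that every header line starts with one or more `#` characters, that the number of `#` characters gives the nesting depth, and that the header ends at the first line that does not start with `#`. The function names `read_header` and `header_to_dict`, and the data row in the sample, are hypothetical.

```python
def read_header(lines):
    """Collect (level, text) pairs from the leading '#'-prefixed lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines inside the header
        if not line.startswith("#"):
            break  # first data line: the header is over
        level = len(line) - len(line.lstrip("#"))
        entries.append((level, line[level:].strip()))
    return entries


def header_to_dict(entries):
    """Map 'Heading / Subheading' paths to the list of values below them.

    A line counts as a value when the next header line is not deeper.
    A heading with nothing below it is therefore stored as a value of
    its parent heading.
    """
    result = {}
    stack = []  # chain of (level, heading) enclosing the current line
    for i, (level, text) in enumerate(entries):
        next_level = entries[i + 1][0] if i + 1 < len(entries) else 0
        while stack and stack[-1][0] >= level:
            stack.pop()
        if next_level > level:
            stack.append((level, text))
        else:
            key = " / ".join(heading for _, heading in stack)
            result.setdefault(key, []).append(text)
    return result


sample = [
    "# General",
    "## Title",
    "### Air Temperature",
    "## Unit",
    "### C",
    "# Usage",
    "## Geographic bounding box",
    "### westBoundLongitude -31.00",
    "### eastBoundLongitude 45.00",
    "2020-01-01 00:00:00,1.0",  # hypothetical data row
]

meta = header_to_dict(read_header(sample))
for path, values in meta.items():
    print(path, "->", values)
```

Storing the values as lists keeps repeated entries under one heading, such as the four bounding-box lines, without overwriting each other.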

How to cite the data*

Please refer to the "References" section on the catalogue entry page of this dataset in the Climate Data Store (CDS) as it provides the DOI number as well as details on dataset citation and attribution.

References

Beyer, H. G., Heilscher, G., and Bofinger, S.: “A robust model for the MPP performance of different types of PV-modules applied for the performance check of grid connected systems”, EuroSun 2004 conference; pp. 3064-3071, Germany, June 2004.

Blanc, P., and Wald, L.: “The SG2 algorithm for a fast and accurate computation of the position of the Sun for multi-decadal time period”, Solar Energy; vol. 86, pp. 3072-3083, https://doi.org/10.1016/j.solener.2012.07.018, 2012.

Davidson, M. R., and Millstein, D.: "Limitations of reanalysis data for wind power applications", Wind Energy; Volume: 25, Issue: 9. Pages: 1646-1653, https://doi.org/10.1002/we.2759, 2022.

Davis, N. N., Badger, J., Hahmann, A. N., Hansen, B. O., Mortensen, N. G., Kelly, M., et al.: "The Global Wind Atlas: A High-Resolution Dataset of Climatologies and Associated Web-Based Application", Bulletin of the American Meteorological Society; Volume 104, Number 8, DOI: 10.1175/BAMS-D-21-0075.1, 2023.

de Baar, J. H. S., van der Schrier, G., van den Besselaar, E. J. M., Garcia-Marti, I., and de Valk, C.: "A new E-OBS gridded dataset for daily mean wind speed over Europe", International Journal of Climatology; 43(13), 6083–6100, https://doi.org/10.1002/joc.8191, 2023.

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: "Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization", Geosci. Model Dev.; 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016.

Floors, R. R., and Nielsen, M.: "Estimating air density using observations and re-analysis outputs for wind energy purposes", Energies; 12, 2038, https://doi.org/10.3390/en12112038, 2019.

Gueymard, C. A., Lara-Fanego, V., Sengupta, M., and Xie, Y.: “Surface albedo and reflectance: Review of definitions, angular and spectral effects, and intercomparison of major data sources in support of advanced solar irradiance modeling over the Americas”, Solar Energy; vol. 182, pp. 194–212, https://doi.org/10.1016/j.solener.2019.02.040, 2019.

Ho, L.T.T., Dubus, L., De Felice, M., and Troccoli, A.: "Reconstruction of Multidecadal Country-Aggregated Hydro Power Generation in Europe Based on a Random Forest Model", Energies; 13, 1786, https://doi.org/10.3390/en13071786, 2020.

Klucher, T. M.: “Evaluation of models to predict insolation on tilted surfaces”, Solar Energy; vol. 23, pp. 111–114, https://doi.org/10.1016/0038-092X(79)90110-5, 1979.

Koivisto M., K. Plakas, E. R. Hurtado Ellmann, N. Davis, and P. Sørensen: “Application of microscale wind and detailed wind power plant data in large-scale wind generation simulations”, Electric Power Systems Research; vol. 190, 106638, https://doi.org/10.1016/j.epsr.2020.106638, 2021.

Liu, B., and Jordan, R.: “Daily insolation on surfaces tilted towards equator”, ASHRAE Transactions; 10, pp. 526–541, 1961.

Martin, N., and Ruiz, J. M.: “Calculation of the PV modules angular losses under field conditions by means of an analytical model”, Solar Energy Materials and Solar Cells; vol. 70, pp. 25–38, https://doi.org/10.1016/S0927-0248(00)00408-6, 2001.

Martin, N., and Ruiz, J. M.: "Corrigendum to “Calculation of the PV modules angular losses under field conditions by means of an analytical model”", Solar Energy Materials and Solar Cells; vol. 110, p. 154, 2013.

Michelangeli, P.-A., Vrac, M., and Loukos, H.: "Probabilistic downscaling approaches: Application to wind cumulative distribution functions", Geophys. Res. Lett., 36, L11708, doi:10.1029/2009GL038401, 2009.

Mortensen N. G.: “Wind resource assessment using the WAsP software”, WindEurope DTU report, https://backend.orbit.dtu.dk/ws/portalfiles/portal/164389714/Wind_resource_assessment_using_the_WAsP_software_DTU_Wind_Energy_E_0174_.pdf, 2018.

Murcia Leon J. P., M. J. Koivisto, P. Sørensen, and P. Magnant: “Power Fluctuations In High Installation Density Offshore Wind Fleets”, Wind Energy Science; vol. 6, pp. 461–476, https://doi.org/10.5194/wes-6-461-2021, 2021.

Murcia J. P., M. J. Koivisto, G. Luzia, B. T. Olsen, A. N. Hahmann, P. E. Sørensen, and M. Als: “Validation of European-scale simulated wind speed and wind generation time series”, Applied Energy; vol. 305, 117794, https://doi.org/10.1016/j.apenergy.2021.117794, 2022.

Navarro-Racines, C., Tarapues, J., Thornton, P. et al.: "High-resolution and bias-corrected CMIP5 projections for climate change impact assessments", Sci Data; 7, 7. https://doi.org/10.1038/s41597-019-0343-8, 2020.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J.: "Scikit-learn: Machine learning in Python". The Journal of Machine Learning Research; 12, pp.2825-2830, 2011.

pvlib, python v0.9.3 documentation, Klucher irradiance transposition function. Retrieved November 30, 2022, from https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.irradiance.klucher.html, n.d.

pvlib, python v0.9.3 documentation, ground diffuse irradiance function. Retrieved November 30, 2022, from https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.irradiance.klucher.html, n.d.

PyPi sg2 package entry. Retrieved November 30, 2022, from https://pypi.org/project/sg2/, n.d.

Ross, R.G.: “Interface design considerations for terrestrial solar cell modules” in Photovoltaic Specialists Conference Record, pp. 801-806, United States of America, 1976.

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A.: "An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data", Adv. Sci. Res.; 15, 51–62, https://doi.org/10.5194/asr-15-51-2018, 2018.

Schmidt, H., and Sauer, D. U.: “Practical modeling and estimation of inverter efficiencies”, 9th Internationales Sonnenforum, pp. 550–557, Germany, 1994.

Simutis E., et al.: “Enhancing Wind Farm Generation Modeling with Turbulence Intensity and Time-Varying Air Density for Large-Scale Energy System Studies”, Wind & Solar Integration Workshop 2024, Helsinki, https://orbit.dtu.dk/en/publications/enhancing-wind-farm-generation-modeling-with-turbulence-intensity, October 2024.

Skartveit, A., and Olseth, J. A.: “A model for the diffuse fraction of hourly global radiation”, Solar Energy; vol. 38, pp. 271-274, https://doi.org/10.1016/0038-092X(87)90049-1, 1987.

Skoplaki E., et al.: “Operating temperature of photovoltaic modules: A survey of pertinent correlations”, Renewable Energy; vol. 34, pp. 23-29, https://doi.org/10.1016/j.renene.2008.04.009, 2009.

Swisher P., J. P. Murcia Leon, J. Gea-Bermúdez, M. Koivisto, H. Madsen, and M. Münster: “Competitiveness of a low specific power, low cut-out wind speed wind turbine in North and Central Europe towards 2050”, Applied Energy; vol. 306, part B, 118043, https://doi.org/10.1016/j.apenergy.2021.118043, 2022.

Temps, R. C., and Coulson, K. L.: “Solar radiation incident upon slopes of different orientations”, Solar Energy; vol. 19, pp. 179–184, https://doi.org/10.1016/0038-092X(77)90056-1, 1977.

Williams, S. R., Betts, T. R., Helf, T., Gottschalg, R., Beyer, H. G., and Infield, D. G.: “Modelling long-term module performance based on realistic reporting conditions with consideration to spectral effects”, 3rd World Conference on Photovoltaic Energy Conversion; vol. 2, pp. 1908-1911, 2003.

This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.
