Last modified on Sept 08, 2024 16:05

Contributors: A. Troccoli (ICS), M. Borga (ICS/UNIPD), M. Zaramella (ICS), L. Lusito (ICS), S. Cordeddu (ICS), E. Restivo (ICS), G. Aldrigo (ICS), S. Campostrini (ICS), S. Strada (ICS), Y-M. Saint-Drenan (ARMINES), R. Amaro e Silva (ARMINES), M. Koivisto (DTU), B. Olsen (DTU), P. Kanellas (DTU).


History of modifications

Issue	Date	Description of modification	Author
v1.0	22/08/2024	Final version	C3S

List of datasets covered by this document

Deliverable ID	Product title	Product type (CDR, ICDR)	C3S Version Number	Public Version Number	Delivery date
	Climate and energy related variables from the Pan-European Climate Database derived from reanalysis and climate projections	CDR	v1.0	v1.0	22/08/2024

Acronyms and abbreviations

Acronym/abbreviation	Definition
AM	Annual Maxima
AOI	Angle Of Incidence
API	Application Programming Interface
AR6	Sixth Assessment Report
ASCII	American Standard Code for Information Interchange
BHI	Beam Horizontal Irradiance
BIAS	Data that have been bias-adjusted
C3S	Copernicus Climate Change Service
CDFt	Cumulative Distribution Function transfer
CDO	Climate Data Operators
CDS	Climate Data Store
CMIP6	Coupled Model Intercomparison Project (sixth phase)
CMR5	CMCC-CM2-SR5
CSP	Concentrated Solar Power
DHI	Diffuse Horizontal Irradiance
DMP	Data Management Plan
DNI	Direct Normal Irradiance
DTU	Technical University of Denmark
ECE3	EC-Earth3
ENTSO-E	European Network of Transmission System Operators for Electricity
ERAA	European Resource Adequacy Assessment
ESGF	Earth System Grid Federation
ESRI	Environmental Systems Research Institute
GCM	Global Climate Model
GHI	Surface solar radiation downwards
GMC	General Climate Model
GPU	Generation Per Unit
GTI	Global Tilted Irradiance
HOL	Hydropower open-loop pumped storage inflow energy
HP	Hydro Power
HPI	Hydropower run-of-river with pondage inflow energy
HPO	Hydropower run-of-river with pondage generation energy
HPS	Hydro Pumped Storage
HRG	Hydropower reservoirs generation energy
HRI	Hydropower reservoirs inflow energy
HRO	Hydropower run-of-river generation energy
HRR	Hydropower run-of-river inflow energy
HWS	High Wind Speed
IC	Installed Capacity
IPCC	Intergovernmental Panel on Climate Change
LOYO	Leave-One-Year-Out
MAE	Mean Absolute Error
MEHR	MPI-ESM1-2-HR
NDA	Non-Disclosure Agreement
nMAD	normalized Mean Absolute Deviation
nMBD	normalized Mean Bias Deviation
NNSE	Normalized Nash-Sutcliffe Efficiency
NSE	Nash-Sutcliffe Efficiency
NUT0	Country level of aggregation
NUT2	Sub Country/Provinces level of aggregation
ORIG	Data that have not been bias-adjusted
PECD	Pan-European Climate Database
PEOF	Pan-European Bidding Zones Offshore level of aggregation
PEON	Pan-European Bidding Zones Onshore level of aggregation
POA	Plane Of Array
PV	Photovoltaic
QGIS	Quantum Geographic Information System
RF	Random Forest
SEDAC	Socioeconomic Data and Applications Center
SFOE	Swiss Federal Office of Energy
SPV	Solar Photovoltaic
SSPs	Shared Socio-economic Pathways
SZA	Solar Zenith Angle
SZOF	Pan-European Zones Offshore level of aggregation
SZON	Pan-European Zones Onshore level of aggregation
TA	2m temperature
TAW	Population-weighted temperature
TOA	Top Of the Atmosphere
TP	Total precipitation
TSO	Transmission System Operator
UTC	Coordinated Universal Time
VM	Virtual Machine
WMO	World Meteorological Organization
WOF	Wind power offshore
WON	Wind power onshore
WPP	Wind Power Plant
WS10	10m wind speed
WS100	100m wind speed

Introduction

This document describes the technical methodologies and implementation of the climate and energy indicators underpinning the Pan-European Climate Database (PECD), co-developed within the Copernicus Climate Change Service (C3S) Energy service in close collaboration with the European Network of Transmission System Operators for Electricity (ENTSO-E).

Technical documentation of the workflows and their modules is provided below. The workflows are structured according to the two streams covered by PECD: historical and projections. Each routine is described in terms of its inputs, outputs, and processing steps. There is no operational chain in place for near real-time updates of the datasets. However, regular updates of historical data will be performed annually.

In the PECDv4.1, only three Global Climate Models (GCMs) were used. However, it is important to note that using a larger set of models is essential for adequately capturing the uncertainty inherent in climate projections. Future versions, such as PECDv4.2, will incorporate a wider range of models and scenarios to improve the representation of uncertainties.

A detailed description of the filenames of the provided data is available in the Appendix.

Files will be provided in two format types: NetCDF and CSV. Please refer to Table 2.2, Table 2.13, Table 3.5 and Table 3.7 for more info on the file format of each variable.

Please note that PECDv4.1 data will not be extended beyond the year 2021, as these datasets have been frozen prior to the start of the ERAA (European Resource Adequacy Assessment) studies 2023, in agreement with ENTSO-E. By the end of 2024, a new version, PECDv4.2, will be delivered, containing historical data from 1950 to the near present. The historical data for PECDv4.2 will be updated annually. These updates will add new data without modifying the existing datasets, thus maintaining the same version number.

The plan agreed with ENTSO-E is to have PECDv4.2 available in 2025. Future versions will include more climate models, emission scenarios, extended time series, and changes in methodologies, aggregation zones, and other aspects according to ENTSO-E requirements.

Workflows

The workflows serve as the backbone, integrating all components of the chain. We have developed one for each stream – historical (Figure 1.1) and projections (Figure 1.2). Each workflow covers both climate and energy indicators, which are used as reference for service production and monitoring, and are detailed in the following sections.

The workflows consist of several sequential steps. The initial step involves the processing of climate data, which typically includes bias adjustment by aligning models to reanalyses or observations, re-gridding, and potentially selecting model outputs (for projections). The second step involves calculating energy indicators using the processed climate data along with energy data and conversion models developed as part of the C3S Energy service. These models include wind power, solar photovoltaic (including concentrated solar power), and a statistical model for computing various hydrological indicators.

Figure 1.1: Workflow for the historical stream.

Figure 1.2: Workflow for the projection stream.

Historical stream

Data retrieval

The workflow depicting the historical stream is outlined in Figure 1.1. The retrieval of ERA5 data from the Climate Data Store (CDS) to C3S is accomplished using the CDS API (Application Programming Interface), which requires prior installation of Python and the CDS API Python package. Data is retrieved by specifying the required period and variables to be downloaded. Currently, retrievals are performed in monthly chunks. Each variable has been downloaded at a 1-hour resolution for the period 1980 to 2021, within the designated study region known as the PECD domain.
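
The monthly retrieval described above can be sketched as follows. This is a minimal illustration, not the production script: the dataset name `reanalysis-era5-single-levels` and the variable names are the usual CDS identifiers but are not stated in this document, the helper names are hypothetical, and the request keys may differ between CDS releases.

```python
import calendar

# PECD domain bounds from the text, in the CDS "area" order: north, west, south, east.
PECD_AREA = [75, -31, 18, 45]


def build_era5_request(year, month, variables):
    """Build one monthly request for hourly ERA5 single-level data."""
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": ["reanalysis"],
        "variable": list(variables),
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": PECD_AREA,
        # Assumption: the output-format key has differed between CDS releases
        # ("format" in older ones); check the current request form.
        "data_format": "netcdf",
    }


def download_month(year, month, variables, target):
    """Submit the request; needs the cdsapi package and valid CDS credentials."""
    import cdsapi  # imported lazily so the request builder works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        build_era5_request(year, month, variables),
        target,
    )


request = build_era5_request(
    2021, 2, ["10m_u_component_of_wind", "10m_v_component_of_wind"]
)
```

Looping `download_month` over the years 1980 to 2021 and the twelve months would reproduce the monthly chunking mentioned above.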

The PECD domain/grid is a standardized latitude/longitude grid with spatial resolution of 0.25° x 0.25°, covering the geographic extents of 18°N to 75°N and 31°W to 45°E. This includes regions such as Europe, North Africa, and Northwest Asia.

The climate indicators available within the PECD include 2m temperature, population-weighted temperature, total precipitation, surface solar radiation downwards, 10m wind speed and 100m wind speed. Detailed descriptions of these indicators can be found in Section 2.6. Notably, surface solar radiation downwards is downloaded as hourly data in J m^-2 and converted to W m^-2 by dividing by the 3600 seconds in an hour. For wind speed at 10 m, the two horizontal components \( u_{10} \) and \( v_{10} \) are downloaded from the CDS and merged into \( ws_{10} \) using the formula:

\[ ws_{10} = \sqrt{u_{10}^2 + v_{10}^2} \]
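
The two conversions above can be illustrated with a short NumPy sketch. The function names and sample values are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

SECONDS_PER_HOUR = 3600.0


def ssrd_to_irradiance(ssrd_j_m2):
    """Convert hourly accumulated energy [J m^-2] to mean power [W m^-2]."""
    return np.asarray(ssrd_j_m2, dtype=float) / SECONDS_PER_HOUR


def wind_speed(u, v):
    """Wind speed magnitude from the zonal (u) and meridional (v) components."""
    return np.hypot(u, v)


# Illustrative values only.
ghi = ssrd_to_irradiance([0.0, 1.8e6])
ws10 = wind_speed(np.array([3.0, -6.0]), np.array([4.0, 8.0]))
```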

Power law for wind profile scaling

Wind speed data from numerical weather prediction and climate models are typically available at only a few heights, often just 10 m above ground level. For instance, the CMIP6 (Coupled Model Intercomparison Project, sixth phase) projections include only near-surface wind speed. To extrapolate wind speed to other heights above the ground, such as 100 m, a vertical wind shear coefficient, referred to here as the Alpha coefficient, can be calculated using a power law. This coefficient allows the vertical wind profile to be estimated, which is useful for applications such as wind turbine modelling and renewable energy estimation within climate modelling.

By utilizing the Alpha coefficient, wind speed data at 10 m can be converted into values for other heights, like 100 m above ground level. This involves computing a location-specific coefficient to adjust for local features represented in the model (e.g., ERA5). Additionally, temporal variations are taken into account by stratifying the Alpha coefficient based on factors such as time of day and month.

To maintain consistency between projection and historical data, wind speed at 100 m will also be provided for the historical stream by computing the power law extrapolation from ERA5 wind speed data at 10 m.
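
Written out, and assuming the standard power-law form implied by the definition of the Alpha coefficient given in the next subsection, the extrapolation from 10 m to 100 m reads:

\[ ws_{100} = ws_{10} \left( \dfrac{100}{10} \right)^{\alpha} = ws_{10} \cdot 10^{\alpha} \]

As an illustrative example (the numbers are not taken from the PECD), a value of \( \alpha = 0.14 \) gives a factor of \( 10^{0.14} \approx 1.38 \), so a 10 m wind speed of 5 m s^-1 corresponds to about 6.9 m s^-1 at 100 m.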

Alpha computation

The zonal and meridional components of ERA5 wind speed at 10 m and 100 m were downloaded from the C3S Climate Data Store (CDS).

The temporal range spans the 11-year period from 2011 to 2021, with data provided as instantaneous values at hourly resolution. This period is the most recent, which means that the ERA5 reanalysis is likely based on better observations. Additionally, 11 years is considered a long enough period to compute representative and stable statistics for the Alpha coefficient.

To retrieve the data, the CDS API was run in Python. From the two wind speed components, the overall wind speed was computed.

The Alpha coefficient (dimensionless) was then obtained for each individual grid point from the wind speeds at the 10 m and 100 m heights, as follows:

\[ \alpha = \dfrac{\ln{v_{2}} - \ln{v_{1}}}{\ln{h_{2}} - \ln{h_{1}}} \]

where \( v_{2} \) and \( v_{1} \) [m s^-1] are the wind speeds at 100 m and 10 m respectively, and \( h_2 \) and \( h_1 \) [m] are the corresponding heights. The wind shear coefficient is computed at hourly resolution and stratified by hour of the day and month of the year (24 hours x 12 months). It was calculated using Climate Data Operators (CDO) commands run from Python.
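
A minimal NumPy sketch of this computation for a single grid point is given below. It is not the CDO-based production code: the function names are hypothetical, and the stratified coefficient is assumed here to be the mean of the hourly Alpha values within each (month, hour) bin, which this document does not state explicitly.

```python
import numpy as np


def shear_exponent(ws10, ws100, h1=10.0, h2=100.0):
    """Hourly Alpha from wind speeds at two heights (speeds must be > 0)."""
    return (np.log(ws100) - np.log(ws10)) / (np.log(h2) - np.log(h1))


def stratify(alpha, months, hours):
    """Mean Alpha per (month, hour) bin, returned as a (12, 24) table."""
    table = np.full((12, 24), np.nan)
    for m in range(1, 13):
        for h in range(24):
            sel = (months == m) & (hours == h)
            if sel.any():
                table[m - 1, h] = alpha[sel].mean()
    return table


def extrapolate(ws10, table, months, hours, h1=10.0, h2=100.0):
    """Apply the stratified Alpha to extrapolate 10 m wind speed to h2."""
    return ws10 * (h2 / h1) ** table[months - 1, hours]


# Illustrative series: two January midnights and two July noons.
months = np.array([1, 1, 7, 7])
hours = np.array([0, 0, 12, 12])
ws10 = np.array([4.0, 6.0, 5.0, 7.0])
ws100 = ws10 * 10.0 ** np.array([0.3, 0.3, 0.1, 0.1])

alpha = shear_exponent(ws10, ws100)
table = stratify(alpha, months, hours)
ws100_est = extrapolate(ws10, table, months, hours)
```

Bins with no samples stay NaN in this sketch; with 11 years of hourly data every (month, hour) bin is populated.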

Alpha characterization

The mean value of the Alpha coefficient calculated over the geographical domain for each hour and month is represented in Figure 2.1. These results are consistent with previous studies, showing higher coefficient values during the cold and stable hours of the night. Conversely, during the day, when the boundary layer is generally well mixed, the Alpha coefficient is lower. For the same reason, the values of the Alpha coefficient are higher in winter compared to summer during the central (and warmer) hours of the day. However, when examining the distribution of the Alpha coefficient across each grid point and month of the year, a more complex picture emerges. Figure 2.2 shows the box plot for each hour over the entire domain, indicating that the interquartile range is broader for the night-time hours compared to the day-time hours. The Alpha coefficient reaches its most negative values, down to -0.4, during the night-time hours.

The power law used in the present study and described above represents an improvement over the standard approach, which considers Alpha as a constant value, independent of space and time.

Figure 2.1: Alpha's mean value over the whole domain as a function of time of day (in UTC) and by month.

Figure 2.2: Alpha's box-and-whisker diagram as a function of time of day (in UTC), considering all grid points in the domain. The boxes indicate the interquartile range, namely from 25% to 75%.

Wind speed bias adjustment for PECDv4.1

Bias adjustment is the collective term for the process of reducing biases present in climate models during a post-processing phase. It becomes essential when climate data from climate models are used as driving data for application models. This is particularly crucial for wind speed, a key variable for computing wind power, as its computation depends nonlinearly on the wind speed (specifically on its cube). Therefore, significant biases in wind speed can markedly affect the wind energy indicator.

Adjusting the wind speed for the historical period was identified by stakeholders (ENTSO-E and its members) as the main challenge for computing the PECDv4.1 energy indicators.

The method chosen for PECDv4.1 to bias-adjust the ERA5 wind speed at 10 m height is CDFt (Cumulative Distribution Function transfer). CDFt assumes that there exists a transformation that translates the Cumulative Distribution Function of a General Circulation Model (GCM) variable into the Cumulative Distribution Function of the local-scale climate variable at a given point. For the historical stream, the adjustment maps the 'modelled' ERA5 wind speed from its raw Cumulative Distribution Function (ORIG) to that of the COSMO-REA6 dataset (referred to as COSMO for brevity), which was selected as the reference dataset. The adjustment uses the time-series values sampled at each grid point and was carried out for each month of data at daily resolution. A correction was then applied to bring the bias-adjusted wind speeds from daily back to the final hourly resolution, taking into account the raw ERA5 diurnal cycle.

Preprocessing

The ERA5 horizontal components of wind speed at 10 m, \( u_{10} \) and \( v_{10} \) , were retrieved from the CDS according to Section 2.1, in the PECD domain at a spatial resolution of 0.25° x 0.25° and hourly temporal resolution. The wind speed at 10 m, \( ws_{10} \) , is computed via Python scripts using the simple formula:

\[ ws_{10} = \sqrt{u_{10}^2 + v_{10}^2} \]

From the COSMO-REA6 regional reanalysis, the same components were retrieved: \( u_{10} \)¹ and \( v_{10} \)². These components have a spatial resolution of 0.11° x 0.11°, hourly temporal resolution, and cover a domain that partially intersects with the PECD domain. Wind speed at 10 m was calculated from the two components using CDO and remapped to the PECD domain at 0.25° resolution using the CDO remap command (i.e., through a bilinear interpolation). This operation returns NaN values outside the COSMO domain.

A comparison between the two datasets is shown in Figure 2.3. The figure illustrates that ERA5 tends to underestimate the intensity of wind speed in most land areas in Europe, except in the North East, while it overestimates it over the sea, particularly in the North Sea. Some large overestimations are observed along certain coastlines, such as Southern Norway or Portugal.

Figure 2.3: Top left: COSMO mean wind speed at 10 m over 10 years of hourly data (2000-2009). Top right: ERA5 mean wind speed at 10 m over 10 years of hourly data (2000 to 2009). Bottom: Difference between ERA5 and COSMO for wind speeds at 10 m - averaged over 2000-2009.

Quality check of the dataset

A quality check of the ERA5 and COSMO datasets was performed before applying the CDFt method, using the four control boxes shown in Figure 2.4.

Figure 2.4: The location of the control boxes used for the CDFt methodology assessment. Blue: France (latitude: 45-47°N; longitude: 5-8°E); magenta: Germany (latitude: 50-53°N; longitude: 6-10°E); orange: Sweden (latitude: 57-61°N; longitude: 13-16°E); green: Finland (latitude: 60.5-63.5°N; longitude: 22.5-26.5°E).

A known issue of the ERA5 reanalysis³ is a drop in wind speed at 10:00 UTC. It was fixed by recomputing the 10:00 UTC value at each grid point as the linear interpolation of the 9:00 UTC and 11:00 UTC values (equivalent to a temporal average), using the function 'interp' (method = 'linear') of the xarray Python library. Figure 2.5 shows the diurnal cycle of a subset (2009-2018) of the resulting corrected time series in each of the four geographical control boxes of Figure 2.4.
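
The effect of this correction can be sketched with plain NumPy for a single grid point. This is an equivalent illustration rather than the xarray call used in the processing chain; the function name is hypothetical and the series is assumed to be hourly and gap-free.

```python
import numpy as np


def fix_10utc_drop(ws, hours):
    """Replace each 10:00 UTC value by the mean of its 09:00 and 11:00 neighbours.

    ws    : hourly wind speed series at one grid point
    hours : UTC hour (0-23) of each sample, same length as ws
    """
    ws = np.asarray(ws, dtype=float).copy()
    idx = np.flatnonzero(np.asarray(hours) == 10)
    # Keep only samples that have both neighbours inside the series.
    idx = idx[(idx > 0) & (idx < ws.size - 1)]
    ws[idx] = 0.5 * (ws[idx - 1] + ws[idx + 1])
    return ws


# Two days of constant wind with an artificial drop at 10:00 UTC.
hours = np.arange(48) % 24
raw = np.full(48, 5.0)
raw[hours == 10] = 3.0
fixed = fix_10utc_drop(raw, hours)
```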

Figure 2.5: Effect of the correction of the 10:00 UTC drop in the WS10 in each of the four geographical control boxes: blue: original dataset; orange: corrected dataset.

Features like amplitude and maximum of the diurnal cycle of wind intensity are important for the energy sector, particularly to assess the wind power production and the transport network load. The daily behavior of the ORIG wind speed dataset has been analyzed to compare the climate diurnal cycle to the diurnal cycle of both COSMO and of wind power generation. Figure 2.6 shows the diurnal cycle of ERA5 \( ws_{10} \) (ORIG) compared to COSMO \( ws_{10} \) (COSMO) for the four geographical control boxes of Figure 2.4, covering the years from 2009 to 2018. The results indicate that while the diurnal cycles in the two datasets have similar dependencies on the hour of the day, the ORIG and COSMO diurnal cycles exhibit different absolute values.

Figure 2.6: Diurnal cycle of WS10 from ERA5 original dataset (blue) and from ERA5 bias-corrected (CDFt method) using COSMO as reference (orange) over the four control boxes and years 2009-2018.

Subsequently, the wind speeds at 100 m height were analyzed. Since the wind at 100 m above the ground is not available for the CMIP6 projection models, and to maintain consistency between the wind speed at 100 m in historical (ERA5) and projection datasets, the wind speed at 100 m is extrapolated from the surface wind speed using the Alpha coefficient (or power law). The procedure for this extrapolation is detailed in Section 2.2. Figure 2.7 shows the diurnal cycle of ERA5 \( ws_{100} \) (computed from \( ws_{10} \), orange) vs \( ws_{100} \) from raw ERA5 (blue) for the four geographical control boxes of Figure 2.4, covering the years from 2009 to 2018.

Figure 2.6 and Figure 2.8 indicate that the diurnal cycle of COSMO differs significantly from that of ERA5: at 10 m height, the two datasets have a similar shape but different values, while at 100 m even the shapes differ. This difference matters because it affects the capacity factor (CF) of wind energy production. The difference in shapes observed at 100 m cannot be explained by the power law procedure. Further analysis reveals that, contrary to what the initial literature review suggested, the diurnal cycle of COSMO appears to be the less accurate of the two: the diurnal cycle of ERA5 is closer to the wind power data.

Figure 2.7: Diurnal cycle of WS100 over the four control boxes, years 2009-2018, ERA5 ORIG (blue) vs WS100 obtained from the power law applied to the ERA5 WS10 (orange).

Figure 2.8: Diurnal cycle of WS100 calculated through alpha coefficients from ERA5 original dataset (blue) and from ERA5 bias-corrected (CDFt method) using COSMO as reference (orange) over the four control boxes and years 2009-2018.

Bias-adjustment procedure

COSMO can be considered the best reference dataset for wind speed bias adjustment because it allows correction of the general underestimation of wind speed typically observed in ERA5 \( ws_{10} \) (e.g. Jourdier 2020). However, the quality check above shows that COSMO has some issues in the representation of the wind speed diurnal cycle, which affects the computed wind energy production. Therefore, the bias adjustment procedure for the ERA5 historical stream has been revised to specifically preserve the diurnal cycle of the original ERA5 \( ws_{10} \).

The adopted bias adjustment procedure, applied only to WS10, comprises three steps detailed below (see also Figures 2.9 and 2.10). Once the WS10 is bias-adjusted with this procedure, the WS100 is extrapolated by applying the Alpha coefficients (Section 2.2).

1) Pre-processing: The ORIG \( ws_{10} \) at hourly resolution is used to calculate daily averages and the diurnal cycle. The diurnal cycle is computed as the difference between hourly data and the daily average for each day of the time series to be adjusted and each grid point of the PECD domain.

2) Bias adjustment: The ERA5 \( ws_{10} \) daily averages calculated in step (1) are adjusted with the CDFt methodology, taking the COSMO daily averages as a reference.

3) Post-processing: The original ERA5 diurnal cycle is inserted into the bias-adjusted ERA5 \( ws_{10} \) daily values to obtain the hourly bias-adjusted \( ws_{10} \) data.
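
The three steps can be sketched for a single grid point as follows. Note the assumptions: a plain empirical quantile mapping is used here as a simplified stand-in for CDFt (it is not the CDFt algorithm itself), the month-by-month processing is omitted, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def quantile_map(model, reference):
    """Stand-in for CDFt: map each model value to the reference quantile
    that has the same empirical probability."""
    ranks = np.argsort(np.argsort(model))
    probs = (ranks + 0.5) / model.size
    return np.quantile(reference, probs)


def adjust_hourly(ws_hourly, ref_daily):
    """ws_hourly: (n_days, 24) ERA5 WS10; ref_daily: reference daily means."""
    # Step 1: daily averages and diurnal cycle (hourly minus daily mean).
    daily = ws_hourly.mean(axis=1)
    diurnal = ws_hourly - daily[:, None]
    # Step 2: bias-adjust the daily averages against the reference.
    daily_adj = quantile_map(daily, ref_daily)
    # Step 3: put the original diurnal cycle back onto the adjusted values.
    return daily_adj[:, None] + diurnal


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
era5_hourly = rng.gamma(2.0, 2.0, size=(30, 24))
cosmo_daily = 1.2 * rng.gamma(2.0, 2.0, size=(30, 24)).mean(axis=1)
adjusted = adjust_hourly(era5_hourly, cosmo_daily)
```

By construction, the hourly departures from the daily mean in `adjusted` are identical to those of the original series, which is the property the revised procedure is designed to preserve.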

Figure 2.9: Bias adjustment logic blocks for ERA5 historical wind speed (WS).

Figure 2.10: Detail of the bias adjustment logic block for ERA5 historical wind speed (WS).

³ ERA5: data documentation, Known issues, point 8.

Population-weighted temperature

Population-weighted temperature (TAW) is a crucial indicator used in many energy conversion and energy demand models, and is therefore one of the climate indicators stored in the PECD database. This parameter represents the temperature weighted by the population across a region, where the weight of each grid cell is calculated as the ratio of the population in that grid cell to the total population of the reference region. In the PECD framework, only SZON regions are considered for this variable (please refer to Table 2.1 for the different levels of aggregation and their acronyms).

Understanding population-weighted temperature is particularly important for the energy sector because it provides a more accurate measure of the temperature conditions experienced by the majority of the population. This has significant implications for energy demand, as temperature extremes can lead to increased use of heating and cooling systems. By incorporating TAW into energy models, analysts and policymakers can better predict and manage energy consumption patterns, optimize energy distribution, and plan for future infrastructure needs, ensuring that energy supply meets demand efficiently and sustainably.

Population mask

To calculate the TAW variable, gridded population values covering the PECD domain are required. For this purpose, raster population data at a 0.25° resolution were downloaded from the NASA Socioeconomic Data and Applications Center (SEDAC) in ASCII (American Standard Code for Information Interchange) file format, using the most recent dataset updated in 2020. Raw population data are available from the SEDAC website⁴.

The raster was clipped to the PECD domain extent and converted to NetCDF format (Figure 2.11), using QGIS-GRASS GIS (Geographic Information System, Open-Source Geospatial Foundation Project⁵). Raster values represent the number of inhabitants per cell, with sea/ocean pixels assigned to no data values according to the ESRI (Environmental Systems Research Institute) ASCII format. The path to the population mask is ‘/data/public/PECD/ANCI/POPM’.

Figure 2.11: Population map from NASA Socioeconomic Data and Applications Center. The map is reported at 0.25° resolution and represents the number of inhabitants within each cell.

Computation of Population-weighted temperature

Population-weighted temperature TAW [°C] is computed by combining the population raster at 0.25° resolution, which reports the inhabitants per cell, with the gridded temperature TA at the same resolution and over the same domain. TAW_z of a zone z is calculated according to the equation:

\[ TAW_z = \frac{\sum_{i=1}^n T_iP_i}{\sum_{i=1}^n P_i} \]

where \( T_i \) is the temperature and \( P_i \) the population of the i-th cell, within the given zone. The weighted spatial average was computed for the SZON mask (please refer to Table 2.1). Figure 2.12 shows the average of the population-weighted temperature for the period from 1980 to 2021 considering the SZON level of aggregation.
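
As a small worked example of the equation above, the following NumPy sketch computes TAW for one zone on a toy 2 x 2 grid. The function name and values are hypothetical; no-data (sea) population cells are assumed to be stored as NaN and are given zero weight.

```python
import numpy as np


def population_weighted_temperature(temp, population, zone_mask):
    """TAW of one zone: sum(T_i * P_i) / sum(P_i) over the cells of the zone.

    temp, population and zone_mask are 2-D arrays on the same grid;
    zone_mask is 1 (or True) inside the zone and 0 outside.
    """
    weights = np.nan_to_num(population, nan=0.0) * zone_mask
    return float((temp * weights).sum() / weights.sum())


temp = np.array([[10.0, 20.0], [30.0, 40.0]])        # degrees Celsius
population = np.array([[100.0, 300.0], [np.nan, 50.0]])
zone = np.array([[1, 1], [1, 0]])

taw = population_weighted_temperature(temp, population, zone)
```

Here the zone contains two populated cells, so TAW = (10 x 100 + 20 x 300) / 400 = 17.5 °C, whereas the unweighted mean of the three zone cells would be 20 °C.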

Figure 2.12: TA (top) and TAW (bottom) averaged 1980 to 2021 over bidding zones (SZON).

Spatial aggregation

Spatial aggregation is the process used to derive country or sub-country averaged variables from gridded indicators. The following paragraphs describe the different levels of aggregation and outline the spatial aggregation procedure. The spatial aggregation is applied to each gridded climate and energy indicator to obtain the corresponding aggregated indicators.

Please note that when downloading regional aggregated timeseries, the widget does not allow for sub-region selection. Sub-region extraction is only available for gridded data.

Required spatial aggregation level for PECDv4.1

Various aggregation levels are required for the PECD, as detailed in Table 2.1 below. For countries and sub-country/provinces, NUTS (Nomenclature des Unités Territoriales Statistiques⁶) shapefiles were combined with the ADMIN Natural Earth Global Administrative Zones⁷ shapefiles. For bidding zones⁸ and other sets of regions specific to PECD analysis, official shapefiles were provided by ENTSO-E. Figure 2.13 shows an example of a shapefile used to create the mask.

Table 2.1: Required spatial aggregation for PECDv4.1.

Code	Description of the aggregation level	Source
ORIG	Not aggregated	Gridded data
BIAS	Not aggregated	Gridded data, bias adjusted (CDFt method, see Section 2.5)
NUT0	Country	NUTS0+ADMIN0
NUT2	Sub Country/Provinces	NUTS2+ADMIN1
SZON	Onshore Bidding Zones	Shapefile provided by ENTSO-E
SZOF	Offshore Bidding Zones	Shapefile provided by ENTSO-E
PEON	Pan-European Onshore Zones	Shapefile provided by ENTSO-E
PEOF	Pan-European Offshore Zones	Shapefile provided by ENTSO-E

Figure 2.13: Examples of the original polygons used to derive the float masks.

⁴ https://sedac.ciesin.columbia.edu/
⁵ http://qgis.osgeo.org
⁶ https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics
⁷ https://www.naturalearthdata.com
⁸ Bidding zones are geographical areas within a power market where electricity is traded without internal transmission constraints, ensuring uniform pricing across the zone. These zones facilitate efficient market operations and price formation by balancing supply and demand, providing clear investment signals, and supporting cross-border electricity trade. Defined by transmission system operators, bidding zones help optimize the use of the transmission network and maintain market transparency.

Mask

Starting from the shapefiles listed in Table 2.1, floating point NetCDF masks were built to be used for data aggregation purposes. The procedure consists of the following steps:

1) Simplify the shapefiles, with the tolerance for angular distance set to 0.1°.
2) Load a raster with the PECD domain and the ERA5 spatial resolution of 0.25° x 0.25°.
3) Iterate over the polygons in the shapefile and, for each polygon:
- Identify all grid cells in the raster that intersect the polygon.
- For each selected grid cell, compute the fraction of the cell occupied by the polygon, i.e. the area of the intersection between the polygon and the grid cell divided by the grid cell area, and assign this value to the grid cell.
4) Save the result in a NetCDF file.
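
The core of the procedure above, the fraction of a grid cell covered by a polygon, can be sketched in pure Python. This is an illustration under stated assumptions, not the tool used for the PECD: areas are computed in planar degree units, the polygon is a simple ring without holes, and the function names are hypothetical. The intersection is obtained with Sutherland-Hodgman clipping of the polygon against the rectangular cell.

```python
def polygon_area(points):
    """Shoelace formula; points is a list of (lon, lat) tuples."""
    area = 0.0
    for k in range(len(points)):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % len(points)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_to_cell(points, xmin, ymin, xmax, ymax):
    """Clip a polygon against a rectangular cell (Sutherland-Hodgman)."""

    def clip(pts, inside, intersect):
        out = []
        for k in range(len(pts)):
            cur, prev = pts[k], pts[k - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def cross_x(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cross_y(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(points)
    for inside, intersect in (
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ):
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def cell_fraction(polygon, lon_c, lat_c, res=0.25):
    """Fraction of the cell centred on (lon_c, lat_c) covered by the polygon."""
    half = res / 2.0
    clipped = clip_to_cell(polygon, lon_c - half, lat_c - half, lon_c + half, lat_c + half)
    return polygon_area(clipped) / (res * res) if clipped else 0.0


# Toy region: a 1 x 1 degree square with its lower-left corner at (0, 0).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
inside = cell_fraction(square, 0.5, 0.5)    # cell fully inside the region
edge = cell_fraction(square, 0.0, 0.5)      # cell straddling the western edge
corner = cell_fraction(square, 0.0, 0.0)    # cell centred on a corner
outside = cell_fraction(square, 2.0, 2.0)   # cell outside the region
```

In this toy case the four cells receive the values 1, 0.5, 0.25 and 0, which is the kind of fractional border pattern visible in a float mask.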

Figure 2.14 shows an example of a country mask (Italy).

Figure 2.14: Example of a float mask, for the Italian NUT0 administrative region, showing the fractions of land around the border and coastlines.

The NetCDF mask file will be structured as follows:

coordinates: latitude (PECD domain), longitude (PECD domain), mask (mask code of each polygon).

Spatial aggregation procedure

The spatial aggregation procedure is executed by a Python tool, following this script flow:

1) Open the NetCDF file containing the data to be aggregated.
2) Open the precalculated mask NetCDF file.
3) Iterate over the mask coordinates.
4) For each mask, apply the mask (product of data arrays) to the data to be aggregated, weighted by the cosine of latitude.
5) Calculate the average over the masked data.
6) Store the result in a column of a data frame.
7) Store the time axis of the NetCDF file in the same data frame as the aggregated result.
8) Save the data frame as a CSV file.
9) Apply metadata to the CSV file according to the annex.
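
The numerical core of this flow can be sketched with NumPy as below. This is one possible reading of the steps, not the actual tool: the average is assumed to be a weighted mean with weights equal to the float mask multiplied by the cosine of latitude, file reading and CSV writing are left out, and the names are hypothetical.

```python
import numpy as np


def aggregate(data, masks, lat):
    """Aggregate gridded data over each mask.

    data  : array of shape (time, lat, lon)
    masks : dict mapping a region code to a float mask of shape (lat, lon)
    lat   : latitudes in degrees, shape (lat,)
    Returns a dict mapping each region code to a time series (one column each).
    """
    coslat = np.cos(np.deg2rad(lat))[:, None]
    columns = {}
    for code, mask in masks.items():
        weights = mask * coslat
        columns[code] = (data * weights).sum(axis=(1, 2)) / weights.sum()
    return columns


# Toy grid: two latitudes, one longitude, two time steps.
lat = np.array([60.0, 0.0])
data = np.array([[[2.0], [4.0]], [[0.0], [6.0]]])
masks = {"XX": np.ones((2, 1))}  # hypothetical region code
series = aggregate(data, masks, lat)
```

With cos(60°) = 0.5, the cell at 60°N counts half as much as the cell at the equator, giving 10/3 for the first time step and 4 for the second.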

Climate indicators

The PECD gridded and aggregated climate indicators for the historical stream are listed in Table 2.2. Specifically, Table 2.2 provides details for each variable, including the period covered, the source of the input data, the domain and spatial resolution, the temporal resolution, the spatial aggregation (as outlined in Table 2.1), and the units.

Table 2.2: Climate indicators provided in the PECDv4.1 for the historical stream. Files provided at ORIG spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format.

Variable	Period	Source	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Units
2m temperature (TA)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	K (gridded) °C (aggregated)
Population-weighted temperature (TAW)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	SZON	°C
Total precipitation (TP)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m
Surface solar radiation downwards (GHI)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	W m^-2
10m wind speed (WS10)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1
100m wind speed (WS100)	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1

Energy data

In collaboration with the ENTSO-E team, as much energy-related data as possible was gathered for the validation of the energy models and the training of the statistical model (Hydro model). The following data have been used for this purpose:

1) Position and characteristics of power towers in the PECD domain from The Wind Power⁹

2) Turbine-level power curves based on the generic power curve model in PyWake¹⁰

3) ENTSO-E timeseries of solar and hydro generation from the ENTSO-E Transparency Platform¹¹

4) Inflow data from previous PECDv3.1

5) TSOs inflow data. These data are covered by a specific non-disclosure agreement (NDA) and are not detailed here.

Exclusion areas

Several 'exclusion layers' have been applied to the PECD calculation to ensure more accurate assessments of wind and solar energy potential. These layers take into account various restrictive criteria to reflect realistic conditions for energy production. The energy indicators have been calculated and spatially aggregated while considering the following exclusion criteria:

Protected Areas
Polar Areas
Urban areas
Water and continental waters areas
High slope areas
High-elevation areas (with elevation greater than 2000 m above sea level)
Distance to shore areas

Combinations of restricted areas were also considered for wind generation and PV modeling. The exclusion area masks/files have been created based on these criteria to accurately represent regions where energy production is not feasible or allowed. For the distance to shore areas, the exclusion layer was generated using the QGIS Buffer tool to create a distance buffer of 100 km in both directions from the continental edge, except for the North Sea, where a buffer of 200 nautical miles has been retained as per ENTSO-E’s specifications.

Table 2.3 provides a detailed description of each exclusion criterion, their sources, and the variables associated with them.

Table 2.3: Description of exclusion areas.

Criteria	Description	Source	Variable Name
Protected areas	Dataset with the constraint for protected areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a binary format, where 1 represents a restricted pixel under this specific criterion.	World database on protected areas from the United Nations Environment Programme	prot_a
Polar areas	Dataset with the constraint for polar areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a binary format, where 1 represents a restricted pixel under this specific criterion.	Land cover classification system from the United Nations Food and Agriculture Organization	polar_a
Urban areas	Dataset with the constraint for urban areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a binary format, where 1 represents a restricted grid cell with an urban coverage equal to or higher than 45%.	Land cover classification system from the United Nations Food and Agriculture Organization	urban_a
Water and continental waters area	Dataset with the constraint for inland water areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a three-value format, where 0 represents land, 1 represents ocean and 2 corresponds to inland waters.	ERA5 land-sea mask from ECMWF	watr_a
High slope area	Dataset with the constraint for high slope areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a binary format, where 1 represents a restricted grid cell with a high slope coverage equal to or higher than 60%.	ETOPO1 Global Relief Model from National Oceanic and Atmospheric Administration	halo_a
High elevation areas	Dataset with the constraint for high elevation areas. Gridded data at 0.25°x0.25° grid resolution over the globe domain, with a binary format, where 1 represents a restricted pixel under this specific criterion.	ETOPO1 Global Relief Model from National Oceanic and Atmospheric Administration	hele_a
Distance to shore areas	Dataset with the constraint for the distance to shore for offshore areas. Gridded data at 0.25°x0.25° spatial resolution over the globe domain, with a binary format, where 1 represents a restricted pixel under this specific criterion.	ERA5 land-sea mask from ECMWF	dist_s
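As a minimal sketch of how binary exclusion layers could be combined into a single mask, the snippet below flags a grid cell as restricted when any layer marks it with 1. The variable names reuse those of Table 2.3, but the tiny 2x3 grids and the helper `combine_exclusion_masks` are illustrative assumptions, not the processing code used for the PECD. The three-valued `watr_a` layer would first need to be converted to a binary form.

```python
def combine_exclusion_masks(layers):
    """Return a binary mask that is 1 wherever at least one layer is 1."""
    n_rows = len(layers[0])
    n_cols = len(layers[0][0])
    combined = [[0] * n_cols for _ in range(n_rows)]
    for layer in layers:
        for i in range(n_rows):
            for j in range(n_cols):
                if layer[i][j] == 1:
                    combined[i][j] = 1
    return combined


# Illustrative 2x3 grids (hypothetical values, 1 = restricted cell)
prot_a = [[0, 1, 0], [0, 0, 0]]
hele_a = [[0, 0, 0], [1, 0, 0]]
urban_a = [[0, 0, 0], [0, 0, 1]]

mask = combine_exclusion_masks([prot_a, hele_a, urban_a])
print(mask)
```

Cells left at 0 in the combined mask are those that remain available for the spatial aggregation.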

Energy Conversion models

In this section, we describe the three physical models and the statistical model used to determine the energy variables required for the new ENTSO-E PECDv4.1 database. We also outline the sources of input data for these models, along with the methodologies employed for their calibration and validation, referencing the historical data flow. Each of the four energy conversion models—wind power, photovoltaic solar power, concentrated solar power, and hydropower—is discussed in a dedicated paragraph.

Wind Power conversion model

For the wind power conversion model, all simulations start at the wind power plant (WPP) level, with results aggregated to the required regional level. The conversion to power generation is handled slightly differently for existing and future installations. Existing installations are modelled based on location, capacity, and technology data from WindPowerNet¹² (filtered to represent 2020 online installations). Several wind technologies are simulated as potential future installations, to model a range of specific power and hub height options.

Handling of the climate data

The climate data (0.25° x 0.25° grid, at 10 m and 100 m height) are interpolated as described in Murcia et al. (2022). Specifically, a cubic-spline interpolation is used for horizontal interpolation of wind speeds. For vertical wind speed interpolation, power law interpolation is used between the two closest heights available, and similarly for extrapolation when needed. This method is equivalent to piece-wise linear interpolation in a log-log scale. Although surface roughness is not explicitly considered in the interpolation/extrapolation process, it is implicitly accounted for as the underlying weather models incorporate surface roughness to derive wind speeds at different heights. Wind speeds are interpolated at every time step for each WPP. The input wind speed at 10 m is bias-adjusted using COSMO data as the reference dataset (see Section 2.3), while the input wind speed at 100 m is derived from the bias-adjusted wind speed at 10 m using a power law (see Section 2.2).
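The vertical power-law step can be sketched as follows. This is a minimal illustration, assuming wind speeds are available at 10 m and 100 m and are strictly positive; the function names are hypothetical and the horizontal cubic-spline step is not shown.

```python
import math


def shear_exponent(ws_low, ws_high, h_low=10.0, h_high=100.0):
    """Power-law exponent implied by wind speeds at two heights (speeds must be > 0)."""
    return math.log(ws_high / ws_low) / math.log(h_high / h_low)


def wind_speed_at_height(ws_low, ws_high, target_height, h_low=10.0, h_high=100.0):
    """Interpolate or extrapolate wind speed to target_height with a power law.

    This is linear interpolation in log(height) vs log(wind speed).
    """
    alpha = shear_exponent(ws_low, ws_high, h_low, h_high)
    return ws_high * (target_height / h_high) ** alpha


# Example: 5 m/s at 10 m and 8 m/s at 100 m, evaluated at a 150 m hub height
print(wind_speed_at_height(5.0, 8.0, 150.0))
```

By construction the function returns the input values at the two reference heights, and heights above 100 m are handled by extrapolating with the same exponent.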

Conversion to wind power generation

Turbine-level power curves are based on the generic power curve model in PyWake¹³, which estimates a power curve based on turbine diameter and rated power. It is assumed that these two parameters are available for all WPPs. For now, default values are used for the other parameters needed for estimating the power curve (this can be changed later in the service), as reported in Table 2.4. Example comparisons to power curves from the WindPowerNet are shown in Figure 2.15. The generic model fits these example power curves well.

Figure 2.15: Comparison of the generic turbine-level power curves (surrogate) to two example power curves from the WindPowerNet (https://www.thewindpower.net) power curve database (actual). Note that the generic power curve model does not consider the storm shutdown part (ws > 20), as it is considered a separate part of the model.

To get to a plant-level power curve (as information about WPPs is given at the plant level), an updated version of the generic wake model from Murcia et al. (2022) is used. When combined with the generic turbine-level power curve, a final plant-level power curve is specified by the parameters given in Table 2.4 (generic values are assumed for other input parameters); wind speed (WS) is not an input parameter for a WPP, but it shows the range of wind speed on which the plant-level power curve is estimated. A look-up table with linear interpolation is implemented so that a power curve can be estimated for any combination of inputs (within the parameter ranges given in Table 2.4; if an input below/above the range is given, the model reverts to the min/max of the range, respectively).
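The look-up behaviour described above (linear interpolation inside the supported range, reverting to the min/max outside it) can be sketched in one dimension as follows. The tabulated values are hypothetical placeholders, not entries of the actual look-up table, and the helper names are assumptions.

```python
from bisect import bisect_right


def clamp(value, lower, upper):
    """Revert to the range limits when the input falls outside the range."""
    return max(lower, min(upper, value))


def interp_linear(x, xs, ys):
    """Linear interpolation on a sorted 1-D table, clamped to its range."""
    x = clamp(x, xs[0], xs[-1])
    idx = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x0, x1 = xs[idx], xs[idx + 1]
    frac = (x - x0) / (x1 - x0)
    return ys[idx] + frac * (ys[idx + 1] - ys[idx])


# Hypothetical table: plant installation density [MW/km2] vs an
# illustrative wake efficiency (values invented for this sketch only)
density = [4.0, 7.0, 10.0]
wake_efficiency = [0.95, 0.92, 0.90]

print(interp_linear(5.5, density, wake_efficiency))   # inside the range
print(interp_linear(12.0, density, wake_efficiency))  # above the range: uses 10
```

The same clamping rule extends to several input dimensions by clamping each input to its own range before interpolating.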

For now, the generic power curve method always assumes pitch-controlled turbines¹⁴. This will be reconsidered later in the service, as stall-controlled turbines¹⁵ are a small part of the existing fleet in Europe.

Table 2.4: The parameters defining the generic plant-level power curve, with the range of supported values for the parameters which can be varied.

Default parameters (fixed)
air_density	1.225 kg/m³
max_cp	0.49
constant_ct	0.8
gear_loss_const	0.01
gear_loss_var	0.014
generator_loss	0.03
converter_loss	0.03
turbulence_intensity	0.1

Varied parameters, with a supported range
Rotor diameter	10-250 m
Plant installation density	4-10 MW/km²
Specific power	100-650 W/m²
Number of turbines	1-1024

For future onshore wind installations, turbines with specific powers ranging from 199 to 335 W/m², as shown in Swisher et al. (2022), are used. For future offshore wind installations, turbines with specific powers of 316 and 370 W/m² are used. The selected specific powers are the same as those used in the PECD 2021 update. An overview of the simulated future wind technologies is given in Table 2.5 and Table 2.6, which also list the corresponding options found in the widget "Technological specification" in the download form. Each wind technology option is labeled with a number representing a specific combination of hub height (HH) and specific power (SP). For example, "21 (SP316 HH155)" refers to offshore wind power with a specific power of 316 W/m² and a hub height of 155 m. These labels allow users to easily select the desired wind turbine specification from the dataset.

Wakes are considered for all future technologies (Swisher et al., 2022). The specific power and hub height are the main drivers for variation in the resulting generation time series; the rotor diameter and rated power have limited impact (as results are given in standardized generation, i.e., as values between 0 and 1). Compared to the previous version of the work, the onshore wind turbine rated power is increased to 5 MW (from 3.6 MW), based on feedback from ENTSO-E.

The more generic model (to run any combination of specific power and hub height), as presented in the previous section, is also made available. This enables plant-level power curves to be estimated for any combination of specific power, hub height, and plant size (within the supported range shown in Table 2.4).

Table 2.5: Future technology onshore wind turbines.

Specific Power [W/m²]	Rotor Diameter [m]	Hub Height [m]	Rated Power [MW]	Correspondent codes in the download form on CDS
199	152	100, 150, 200	5	31 (SP199 HH100) 32 (SP199 HH150) 33 (SP199 HH200)
277	129	100, 150, 200	5	34 (SP277 HH100) 35 (SP277 HH150) 36 (SP277 HH200)
335	117	100, 150, 200	5	37 (SP335 HH100) 38 (SP335 HH150) 39 (SP335 HH200)

Table 2.6: Future technology offshore wind turbines.

Specific Power [W/m²]	Rotor Diameter [m]	Hub Height [m]	Rated Power [MW]	Correspondent codes in the download form on CDS
316	269	155	18	21 (SP316 HH155)
370	249	155	18	22 (SP370 HH155)

The storm shutdown behavior is modeled as described in Murcia et al. (2021), assuming a direct (non-controlled) shutdown for all existing wind power plants (WPPs), using data from the WindPowerNet WPP installation database for the shutdown wind speeds. For future wind technologies, a 25 m/s cut-off is assumed for onshore wind installations, and the HWS (High Wind Speed) Deep type from Murcia et al. (2021) is used for future offshore wind installations (as in the PECD 2021 update). The shutdown procedure is modeled as a 'hysteresis,' where a restart occurs only after the wind speed has dropped to a sufficiently low value for a restart to take place (see Figure 2.16). The storm shutdown is a dynamic model that captures three aspects:

Individual wind turbine shutdown and restart: each turbine experiences wind speed fluctuations that can exceed 25 m/s (10-minute mean cut-off wind speed), and whether it shuts down depends on how long the limits are exceeded, as illustrated in Figure 2.16.
Plant shutdown does not occur in the same manner as individual turbines; not all turbines in a plant shut down simultaneously as each turbine experiences slightly different wind speeds at a given time.
The restart operation happens only at a somewhat lower wind speed than shutdown to prevent cycling between shutdown and restart when the wind speed hovers around the shutdown wind speed (e.g., 25 m/s). More details are provided in Murcia et al. (2021).
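The hysteresis aspect can be illustrated with a minimal single-turbine sketch. The 25 m/s cut-off is taken from the text, whereas the 22 m/s restart threshold and the function name are assumptions made only for this example; the plant-level smoothing and the duration-dependent limits of Murcia et al. (2021) are not reproduced.

```python
def storm_shutdown_states(wind_speeds, cut_out=25.0, restart=22.0):
    """Return True (running) or False (shut down) for each time step.

    A running turbine stops when the wind speed exceeds cut_out and only
    restarts once the wind speed has dropped below restart.
    """
    running = True
    states = []
    for ws in wind_speeds:
        if running and ws > cut_out:
            running = False
        elif not running and ws < restart:
            running = True
        states.append(running)
    return states


series = [20.0, 26.0, 24.0, 23.0, 21.0, 24.0]
print(storm_shutdown_states(series))
```

In this series the turbine stays off at 24 m/s and 23 m/s after the shutdown, but keeps running at 24 m/s once it has restarted, which is the cycling-avoidance behaviour described above.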

Figure 2.16: Single-turbine storm shutdown for two storm shutdown technologies. The different shutdown limits (up to 1 s) have been considered in detailed simulations, but a simplified plant-level behavior (Murcia et al., 2021) is used for the simulations in this service. Figure taken from (Murcia et al., 2021).

¹² https://www.thewindpower.net
¹³ https://topfarm.pages.windenergy.dtu.dk/PyWake/notebooks/WindTurbines.html
¹⁴ Pitch-controlled wind turbines adjust the angle (pitch) of the blades to control the rotor speed and optimize the power output. When wind speeds are low to moderate, the blades are pitched to capture the maximum amount of wind energy. As wind speeds increase, the blades are gradually pitched out of the wind to maintain a safe rotor speed and prevent damage. This dynamic adjustment allows pitch-controlled turbines to operate efficiently over a wide range of wind conditions and maximize energy capture while minimizing mechanical stress.
¹⁵ Stall-controlled wind turbines do not adjust the blade angle dynamically. Instead, the blades are fixed at a specific angle. As wind speeds increase, the airflow over the blades eventually becomes turbulent or "stalls," which naturally limits the rotational speed and prevents overloading the generator. While simpler and less expensive than pitch-controlled systems, stall-controlled turbines are generally less efficient because they cannot adjust to varying wind conditions as effectively. They are also more prone to mechanical stress and wear due to the lack of active control over the rotor speed.

Simulated locations and wind technologies

The simulated locations and wind technologies depend on the type of run. An overview of the runs is given in Table 2.7.

Table 2.7: Wind run types.

Run type	ERA5 simulated years	WPP locations	WPP technology	Losses
Validation (for validation only, not delivered)	2015-2021	Changed every year to match changing WPP installations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (changed every year), applied in the generic power curve model	Wakes as part of the generic power curve. And 10 % for other losses (incl. unavailability), applied as a simple multiplication by 0.9
Existing	1980-2021	All years with 2020 WPP locations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (always 2020 fleet), applied in the generic power curve model	Wakes as part of the generic power curve. And 10 % for other losses (incl. unavailability), applied as a simple multiplication by 0.9
Future wind technologies	1980-2021	The best 10-50 % locations of the unmasked points within each PECD region (in terms of mean wind speed in the bias-adjusted ERA5 data, based on ERA5 grid).	Onshore wind: 3 hub heights and 3 turbine types, so in total 9 wind technologies. A plant of 50 MW with ten 5 MW turbines modelled for each technology. Offshore wind: 1 hub height and 2 turbine types, so in total 2 wind technologies. A plant of 500 MW with twenty-eight 18 MW turbines modelled for each technology.	Wakes as part of power curves. And 5 % for other losses (incl. unavailability), applied as a simple multiplication by 0.95

Some notes on Table 2.7:

All wake modelling considers only intra-farm wakes (wakes between plants are not considered).
Literature suggests a range of 5 % to 10 % for the other losses (Mortensen, 2018). The existing installations cover historical installations over tens of years with older technology, whereas the future installations are new installations (no wear-and-tear considered) with modern technology: it was thus considered fair to place them at the opposite ends of the loss range.
A suitable mask is used to find the potential points for the Future wind technologies runs.
Locations of existing wind power plants are not considered in the assessment of the 10-50 % best locations for each region. This is done because the decommissioning of old turbines is expected to free up more space for new installations in the future.
The assumed locations of wind power plant installations within a region significantly impact the expected capacity factor on the aggregate level (Swisher et al., 2022). At this point, only one ‘resource grade’ (i.e., the 10-50 % best locations) is simulated. Simulations also covering the 10 % best locations and the 50 % worst locations (or in principle any other distribution split between 0 and 100 %) could be provided in a later version of the PECD in consultation with ENTSO-E, although this would multiply the number of Future wind technology time series.

In addition to the plant-level power curves, information on the existing wind power installations is required to simulate generation from the existing fleet. Data from WindPowerNet are used, with the missing technical parameters (turbine type and hub height) estimated based on the machine learning approach from Koivisto et al. (2021). Wind power plants without location or installed capacity information are removed (this will be reviewed in the next phase of the service). An overview of the installed capacities (2020 fleet) and key WPP technical parameters for onshore and offshore installations is shown in Figures 2.17 to 2.20.

Figure 2.17: Onshore wind installation capacities in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.18: Onshore regional weighted mean hub heights (left) and specific powers (right) in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.19: Offshore wind installation capacities in the different PECD regions in the Existing run (i.e., 2020 installations).

Figure 2.20: Offshore regional weighted mean hub heights (left) and specific powers (right) in the different PECD regions in the Existing run (i.e., 2020 installations).

For future wind installations, the starting point is the ERA5 grid points. Masking, based on the Exclusion layers presented in Section 2.9, is then applied to these points to select potential future WPP locations. The potential points are shown in Figure 2.21. After selecting the 10-50% best points (based on 100 m mean wind speeds), the resulting final future installation simulation points can be seen for onshore and offshore wind in Figure 2.22. The selection of 10-50% best points is the average ‘resource grade’ selection following from the work done by Swisher et al. (2022), where also the best 10 % and worst 50 % selection of points were simulated for each region. Similar additional runs can be performed at a later stage in the project in agreement with ENTSO-E, to model the decrease in capacity factor as more and more of the best wind resource locations are used. However, this would multiply the number of wind time series related to future installations.
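One possible reading of the "10-50 % best points" selection is sketched below: the unmasked points of a region are ranked by mean wind speed, the best 10 % are set aside, and the points ranked between the best 10 % and the best 50 % are kept. The function name, the point identifiers and the rounding rule are assumptions for illustration, not the service's implementation.

```python
def select_resource_grade(mean_ws_by_point, best_frac=0.10, worst_frac=0.50):
    """Keep the points ranked between best_frac and worst_frac of the region.

    mean_ws_by_point maps a point identifier to its mean 100 m wind speed.
    """
    ranked = sorted(mean_ws_by_point, key=mean_ws_by_point.get, reverse=True)
    n_points = len(ranked)
    start = int(round(best_frac * n_points))
    stop = int(round(worst_frac * n_points))
    return ranked[start:stop]


# Hypothetical region with ten unmasked grid points (p0 is the windiest)
mean_ws = {f"p{i}": 10.0 - i for i in range(10)}
selected = select_resource_grade(mean_ws)
print(selected)
```

With ten candidate points, the windiest point and the five least windy points are left out, so four points remain for the future technology runs of that region.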

Figure 2.21: All potential future installation locations (light blue dots) are shown for onshore (top) and offshore (bottom). Some points were removed due to the applied mask with exclusion layers (the mask is binary: a point either is considered or not). If a small region has no points (e.g., Malta), 1 simulation point is added manually.

Figure 2.22: Onshore (top) and offshore wind (bottom) locations for the future technology runs (light blue dots), following the selection based on the 10-50% best points of 100m mean wind speed values. The locations of existing wind power plants and the repowering of existing plants are not considered.

Aggregation to the regional level

After simulating each WPP, the results are aggregated for each region. For existing locations, weighting based on installed capacity is used. For future technologies, the same weight is used for each location. From a processing point of view, temporary NetCDF files are used, but the final regional results are saved as CSV files. In addition to power generation, similar weighted regional wind speed averages are saved.
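The two weighting schemes can be sketched as a weighted mean over plants at each time step. This is a minimal example with hypothetical capacity factors and capacities; the function name is an assumption.

```python
def aggregate_region(capacity_factors, weights=None):
    """Weighted mean of plant-level time series for one region.

    capacity_factors: one list of values per plant, all of equal length.
    weights: installed capacities (existing fleet) or None for equal
    weights (future technologies).
    """
    if weights is None:
        weights = [1.0] * len(capacity_factors)
    total = sum(weights)
    n_steps = len(capacity_factors[0])
    return [
        sum(w * cf[t] for w, cf in zip(weights, capacity_factors)) / total
        for t in range(n_steps)
    ]


plants = [[0.2, 0.4], [0.6, 0.8]]   # two plants, two time steps
installed_mw = [30.0, 10.0]         # hypothetical installed capacities

print(aggregate_region(plants, installed_mw))  # capacity-weighted
print(aggregate_region(plants))                # equal weights
```

The capacity-weighted result is pulled towards the larger plant, while the equal-weight result is the plain average of the simulated locations.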

Post-hoc corrections following TSOs’ feedback

After the simulations were completed, the German TSOs (Transmission System Operators) provided feedback on the wind generation output (in Spring 2024). In particular, it was noted that the simulated German wind generation, both onshore and offshore, exceeded the measured data from the TSOs. To lower the wind generation capacity factors, the following adjustments were made to the Existing wind generation runs for Germany:

German onshore wind Existing runs:
- The losses (other than wake losses) were increased to 13.6 % (compared to the 10 % generic assumption)
German offshore wind Existing runs:
- The installation density was increased to 10 MW/km² (compared to the 7 MW/km² generic assumption)
- The specific power of the turbines was decreased by 20 % (compared to the information about the existing offshore fleet in Germany)
- The losses (other than wake losses) were increased to 14.5 % (compared to the 10 % generic assumption)

Further TSO feedback will be taken into account for future versions of the PECD following the ENTSO-E internal schedule.

Photovoltaic Solar Power conversion model

To estimate the PV capacity factor at the regional scale, a flexible yet comprehensive modelling workflow has been implemented (Saint-Drenan et al., 2018). It first models PV at the individual system level, considering the location and the module plane-of-array (POA, i.e., orientation and tilt). This is then upscaled to aggregated PV by simulating a range of possible POAs and averaging them with weights derived from the metadata of actual PV plants.

Temporal downscaling

In the implemented PV modelling workflow, a temporal downscaling step is applied: GHI and air temperature (TA) are downscaled to 15 minutes and then converted to PV capacity factor. TA is interpolated linearly. GHI, instead, is first converted to the clearness index (Kt) to remove its dependence on the sun position; this is achieved by dividing GHI by its equivalent irradiation at the top of the atmosphere (TOA). The detrended Kt time series is then downscaled to a 15-minute resolution using linear interpolation and converted back to GHI by multiplying by the TOA radiation. Averaging these results back to 1 hour, instead of deriving hourly values directly, better captures the dependence of PV power production on the sun position and avoids artifacts at the hourly scale.
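The clearness-index route for GHI can be sketched as follows. This is a simplified illustration: the TOA series are assumed to be supplied by a solar geometry routine that is not shown, timestamp conventions are ignored, Kt is set to zero when TOA is zero (night), and the function name is hypothetical.

```python
def downscale_ghi(ghi_hourly, toa_hourly, toa_15min, steps_per_hour=4):
    """Downscale hourly GHI to 15 minutes through the clearness index Kt."""
    # Detrend: Kt = GHI / TOA (set to 0 at night to avoid dividing by zero)
    kt = [g / t if t > 0 else 0.0 for g, t in zip(ghi_hourly, toa_hourly)]
    last = len(kt) - 1
    ghi_15min = []
    for k, toa in enumerate(toa_15min):
        pos = k / steps_per_hour            # position on the hourly axis
        i0 = min(int(pos), last)
        i1 = min(i0 + 1, last)
        frac = pos - i0
        kt_k = kt[i0] + frac * (kt[i1] - kt[i0])   # linear interpolation of Kt
        ghi_15min.append(kt_k * toa)               # back to GHI
    return ghi_15min


# Two hourly values with a constant Kt of 0.5 (hypothetical numbers)
ghi_15 = downscale_ghi([300.0, 450.0], [600.0, 900.0],
                       [600.0, 675.0, 750.0, 825.0, 900.0])
print(ghi_15)
```

Because Kt is constant in this example, the 15-minute GHI simply follows the shape of the TOA irradiance between the two hourly values.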

Inferring plane-of-array irradiance: decomposition and transposition

The transposition of GHI, which corresponds to the horizontal plane, to a given plane-of-array requires knowing the direct and diffuse components of the solar radiation. Since GHI only provides the global radiation (i.e., the sum of both components), a so-called decomposition model is required to estimate them.

Diffuse horizontal irradiance (DHI) is first estimated using the Skartveit-Olseth model (Skartveit et al., 1987), which estimates the diffuse fraction (i.e., the ratio between the diffuse and global radiation). It assumes three regimes related to different overall weather conditions (overcast, partially cloudy, and clear sky) and considers the clearness index Kt and sun elevation angle, since the first describes the atmosphere transmissivity and the second is related to the air mass.

Then, the transposition of each component is treated separately, including the modelling of ground-reflected irradiance:

\[ GTI = R_b \times BHI + R_d \times DHI + R_r \times GHI \]

where GTI, BHI, and DHI correspond to global tilted, beam horizontal, and diffuse horizontal irradiances. R_b, R_d, and R_r correspond to the transposition functions for the beam, diffuse and reflected irradiance, respectively.

The direct irradiance is transposed using a simple trigonometric relation:

\[ R_b = \frac{\cos(AOI)}{\cos(SZA)} \]

with AOI corresponding to the angle of incidence and SZA to the solar zenith angle. Both were calculated using the SG2 algorithm (Blanc et al., 2012).

Diffuse irradiance is transposed using the Klucher model (Klucher, 1979), specifically its implementation in the pvlib Python package. This anisotropic approach considers both the sky brightening near the sun (i.e., circumsolar irradiance) as well as the increase in irradiance near the horizon by hybridizing the Liu-Jordan (Liu & Jordan, 1961) and Temps-Coulson (Temps & Coulson, 1977) models, since the first is more suitable for overcast conditions and the second for partially cloudy and clear-sky:

\[ R_d = \Bigl(\frac{1+\cos\beta}{2}\Bigr) \times (1+f_k \times \cos^2(AOI) \times \sin^3(SZA)) \times \Bigl(1+f_k \times \sin^3\frac{\beta}{2}\Bigr) \]

where β is the surface tilt angle and f_k corresponds to:

\[ f_k = 1- \Bigl(\frac{DHI}{GHI}\Bigr)^2 \]

Lastly, the reflected irradiance is modelled as isotropic with a constant ground albedo (ρ) of 0.2, the value most commonly used in the literature (Gueymard et al., 2019):

\[ R_r = \rho \times \Bigl(\frac{1-\cos\beta}{2}\Bigr) \]
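The three transposition factors can be combined in a short sketch. It assumes angles in degrees, GHI > 0 and a sun above the horizon; the Klucher terms follow the form implemented in pvlib (squared diffuse fraction in f_k and the sine of the zenith angle), and the ground term uses the standard isotropic factor (1 - cos β)/2. The function name is hypothetical and the decomposition step is taken as given.

```python
import math


def global_tilted_irradiance(ghi, dhi, aoi_deg, sza_deg, tilt_deg, albedo=0.2):
    """GTI = R_b * BHI + R_d * DHI + R_r * GHI for a single time step."""
    aoi, sza, tilt = (math.radians(a) for a in (aoi_deg, sza_deg, tilt_deg))
    bhi = ghi - dhi                                    # beam horizontal irradiance
    # Beam: ratio of cosines (no beam contribution from behind the module)
    r_b = max(math.cos(aoi), 0.0) / math.cos(sza)
    # Diffuse: Klucher model
    f_k = 1.0 - (dhi / ghi) ** 2
    r_d = (0.5 * (1.0 + math.cos(tilt))
           * (1.0 + f_k * math.cos(aoi) ** 2 * math.sin(sza) ** 3)
           * (1.0 + f_k * math.sin(tilt / 2.0) ** 3))
    # Ground-reflected: isotropic with constant albedo
    r_r = albedo * 0.5 * (1.0 - math.cos(tilt))
    return r_b * bhi + r_d * dhi + r_r * ghi


# Overcast sky (DHI = GHI) on a horizontal plane returns GHI unchanged
print(global_tilted_irradiance(200.0, 200.0, 40.0, 40.0, 0.0))
```

Under overcast conditions f_k is zero and the Klucher factor reduces to the isotropic (1 + cos β)/2 term, which is a convenient sanity check.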

PV modelling: optical losses, conversion efficiency, temperature and inverter losses

Before addressing the PV conversion process, it is important to take into account the reflection losses resulting from the optical properties of the glazing of PV modules. This was modelled using the Martin-Ruiz model (Martin & Ruiz, 2001, 2013), where surface reflectance depends on the type of surface and the angle of incidence of solar irradiance:

\[ \bar R(AOI) = \bar R(0) + \bigl(1-\bar R(0)\bigr) \times \frac{\exp\Bigl(-\frac{\cos(AOI)}{a_r}\Bigr) -\exp \Bigl(-\frac{1}{a_r} \Bigr)}{1-\exp \Bigl(-\frac{1}{a_r} \Bigr)} \]

where \( \bar R(0) \) is the reflectance at normal incidence and \( a_r \) is an angular loss coefficient. Both values depend on the surface type and are tabulated in the original work. The reference values for a standard polycrystalline module have been used.
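A small sketch of the reflectance function is given below. The default values of `r0` and `a_r` are illustrative placeholders, not the tabulated reference values used in the service, and the function name is an assumption.

```python
import math


def martin_ruiz_reflectance(aoi_deg, r0=0.04, a_r=0.17):
    """Surface reflectance as a function of the angle of incidence (degrees).

    r0 is the reflectance at normal incidence and a_r the angular loss
    coefficient (both placeholder values here).
    """
    cos_aoi = max(math.cos(math.radians(aoi_deg)), 0.0)
    angular_loss = ((math.exp(-cos_aoi / a_r) - math.exp(-1.0 / a_r))
                    / (1.0 - math.exp(-1.0 / a_r)))
    return r0 + (1.0 - r0) * angular_loss


for angle in (0.0, 60.0, 80.0, 90.0):
    print(angle, martin_ruiz_reflectance(angle))
```

The function equals `r0` at normal incidence and tends to 1 at grazing incidence, so the fraction of irradiance reaching the cells is one minus the returned value.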

Knowing the effective incoming irradiance, PV module efficiency is calculated as proposed in (Beyer et al., 2004). Efficiency is first inferred for a 25°C module temperature, using an expanded version of the model proposed by (Williams et al., 2003), fitting the following parametric equation against actual PV data:

\[ \eta_{25^\circ C} = a_1 \times GTI + a_2 \times GTI^2 + a_3 \times GTI^3 + a_4 \times GTI^4 + a_5 \times GTI \times \log(GTI) \]

Here, the a_i parameters were estimated using PV generation data from hundreds of PV installations in Germany and France.

Then, module temperature (TPV) is estimated as proposed in (Ross, 1976) and the effective efficiency, accounting for thermal losses, considers a 0.45%/°C temperature coefficient:

\[ \eta_{eff} = \eta_{25^\circ C} \times (1-0.0045 \times(T_{PV}-25)) \]
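The thermal correction can be sketched as below, assuming the Ross model in its usual linear form (module temperature equals air temperature plus a coefficient times the irradiance). The Ross coefficient `ross_k` is a hypothetical value chosen for illustration; only the 0.45 %/°C temperature coefficient comes from the text.

```python
def effective_efficiency(eta_25, gti, air_temp, ross_k=0.03, temp_coeff=0.0045):
    """Efficiency corrected for module temperature.

    eta_25: efficiency at 25 degC module temperature
    gti: global tilted irradiance [W/m2]
    air_temp: air temperature [degC]
    ross_k: Ross coefficient [degC m2/W] (assumed value)
    """
    t_pv = air_temp + ross_k * gti          # Ross model for module temperature
    return eta_25 * (1.0 - temp_coeff * (t_pv - 25.0))


# Example: 1000 W/m2 at 25 degC air temperature gives a 55 degC module
print(effective_efficiency(0.18, 1000.0, 25.0))
```

In this example the module is 30 °C above the reference temperature, so the efficiency is reduced by 13.5 %.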

Lastly, AC generation is calculated considering inverter losses according to the model in (Schmidt et al., 1994), which takes into account the inverter self-consumption as well as the voltage- and ohmic-related losses:

\[ P_{PV,AC} = GTI \times \eta_{eff} \times (b_1 + b_2 \times \eta_{eff} + b_3 \times \eta^2_{eff}) \]

This approach neglects the effect of module degradation, inverter clipping and the variability of the module parameters (efficiency, temperature coefficient, among others). These assumptions are judged acceptable considering the lack of information on the installed fleet of PV systems in the PECD region; nonetheless, the uncertainty associated with this should be considerably mitigated by the smoothing effect resulting from spatially aggregating at a regional scale.

Integrating a distribution of plane-of-arrays

To upscale to regional PV generation, the distribution of the module tilt and orientation of the fleet of PV systems was inferred from the metadata of hundreds of utility-scale PV systems located in France and Germany, for which we can access quality data. First, a focus was placed on the module tilt since this is expected to vary considerably with the latitude (Figure 2.23).

Figure 2.23: Spatial distribution of the tilt angle of hundreds of PV plants deployed in France and Germany, each with over 1 MWp of installed capacity.

To circumvent the already mentioned limited geographical coverage of these data, the tilt was normalized by its theoretical optimum value (which maximizes the incident irradiation), so that the resulting distribution is more generalizable in space. To calculate the optimum tilt for the PECD region, the PV generation is estimated for different tilts, always south-oriented, using ERA5 data between 2015 and 2020; the tilt resulting in the highest overall irradiation is selected (Figure 2.24).

Figure 2.24: Optimal tilt angle, which maximizes annual yield, over the PECD domain calculated considering a 5-year period of ERA5 data.

Figure 2.25 compares the optimal tilt estimated from the ERA5 with the tilt from actual installations, showing that most installations are deployed with a tilt lower than the one that maximizes the incident irradiation (with a high density around 75% of that value). A plausible explanation for this is that a lower tilt reduces the resulting inter-row shadows, which in turn increases, up to a certain point, the kWh generated per unit of area. The tilt ratio and orientation are visualized as a 2D histogram which shows that both parameters can be well described by two normal distributions (Figure 2.26).

Finally, the fleet PV capacity factor is calculated for a given time t and location x by summing the output of each PV configuration multiplied by its respective weight w:

\[ P_{PV,fleet} = \sum_{i}^{}w_{i} \times P_{PV,AC}(x,t,POA_{i}) \]
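The weighted sum can be sketched with weights drawn from a discretised normal distribution over the tilt ratio. The mean of 0.75 echoes the density noted for Figure 2.25, while the standard deviation, the three tilt ratios, the per-configuration outputs and the function names are hypothetical; the azimuth dimension is omitted for brevity.

```python
import math


def normal_weights(values, mean, std):
    """Weights proportional to a normal density, normalised to sum to 1."""
    raw = [math.exp(-0.5 * ((v - mean) / std) ** 2) for v in values]
    total = sum(raw)
    return [r / total for r in raw]


def fleet_output(outputs_by_poa, weights):
    """Weighted sum of the outputs of each plane-of-array configuration."""
    return sum(w * p for w, p in zip(weights, outputs_by_poa))


tilt_ratios = [0.5, 0.75, 1.0]              # tilt relative to the optimum
weights = normal_weights(tilt_ratios, mean=0.75, std=0.15)
outputs = [0.30, 0.34, 0.35]                # hypothetical P_PV,AC per configuration
print(weights, fleet_output(outputs, weights))
```

Normalising the weights keeps the fleet value within the range spanned by the individual configurations.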

Figure 2.25: Relationship between module effective and optimal tilt angles. It is possible to see that, for the considered sample of hundreds of installations in Germany and France, a considerable fraction of cases shows an effective tilt lower than what would be deemed optimal from an annual yield perspective.

Figure 2.26: Empirical distribution of the PV installations' tilt and azimuth angles, with the first being relative to the local optimal tilt angle, as well as inferred normal distributions.

Application of exclusion areas and spatial aggregation

Once the PV capacity factor product is generated for the PECD-constrained ERA5 grid, regional estimates for bidding and PECD zones are calculated through a spatial average. However, it is important to note that particular (restricted) areas were masked in both the gridded and regional products to produce more accurate results. Specifically, sea and ocean areas (thus, offshore PV), polar and protected areas, as well as locations with high elevation (above 2000 m a.s.l.) or slope (higher than 10%) were excluded from the computation. While high elevation may be unsuitable as an exclusion criterion at a global scale (notably for Chile), we found that for the PECD area this does not pose issues in terms of final PV estimates. The information to identify such regions was obtained from a range of sources (see Section 2.9).

Improvements over the previous methodology

At this stage of the service, the main improvement in the computation of PV estimates involves adapting the regional PV model to the specifications of the PECD region. In particular, the dataset of optimal tilt angles needed recalculation, as the previously used dataset (PVGIS) did not have the required spatial coverage. This calculation was performed using 5 years of ERA5 data. The model parameters were also compared to metadata available for hundreds of large PV systems in France and Germany, resulting in a minor adaptation based on this comparison. Finally, urban areas, which were previously part of the exclusion areas, are now considered.

Assessing model performance

As a first evaluation step, a visual assessment was conducted using scatter plots where model outputs were compared with the solar capacity factor derived from ENTSO-E data (Figure 2.27). Four countries were selected based on the level of PV penetration and the quality of ENTSO-E installed capacity and generation data: Belgium (BE), Germany (DE), Spain (ES), and France (FR).

Overall, there appears to be good agreement between the two datasets, with an acceptable degree of dispersion. The higher dispersion observed for Spain is due to its unique context, as the ENTSO-E timeseries for solar also include Concentrated Solar Power (CSP), which was not accounted for in the conversion from generation to capacity factor. Additionally, since this technology is often coupled with storage and is therefore dispatchable, the resulting deviations are not necessarily systematic over time. The evaluation results for Spain should thus be interpreted with caution. It is also important to note that these results align with those of the previous Service, demonstrating satisfactory accuracy despite limited data availability and the need for numerous assumptions.

However, when these same plots are analyzed on a yearly basis, the slope of the trend regression can vary from year to year. Figure 2.28 illustrates this effect for France, and similar patterns are observed for Belgium and Spain (Germany is an exception, showing a stable slope close to 1). The consistent PV conversion methodology used throughout the period, along with Germany's stable slopes, suggests that this behavior is more related to the ENTSO-E data (e.g., inconsistencies between reported installed capacity and corresponding generation).

Figure 2.27: Scatter plots comparing the solar capacity factor derived from ENTSO-E data and the data generated for Belgium (BE), Germany (DE), Spain (ES), and France (FR).

Figure 2.28: Scatter plots comparing the solar capacity factor derived from ENTSO-E data and the data generated for individual years (2017-2020) for France.

If this slope-related issue is not addressed, evaluating model performance by directly calculating conventional statistical metrics (mean absolute deviation and mean bias deviation – MAD, MBD, both normalized by the long-term PV capacity factor average) can artificially penalize model accuracy, as shown in the upper part of Figure 2.29. Therefore, these metrics were recalculated after adjusting the model residuals of each year to the corresponding annual regression slope, resulting in more stable outcomes, as shown in the lower part of Figure 2.29. The relatively large nMAD (normalized mean absolute deviation) values after adjusting the residuals are expected and can be attributed to modelling limitations, such as not accounting for the spatial distribution and the presence of tracking systems in the aggregated PV fleet, as well as inter-row shadowing, outages, and curtailment.

Figure 2.29: Relationship between normalized mean absolute deviation and mean bias deviation (nMAD and nMBD, above) and adjusted mean absolute deviation and mean bias deviation (nMAD adjusted and nMBD adjusted, below) vs regression slope resulting from the scatter plots as shown in the previous figure. While colour indicates the corresponding country, each scatter point corresponds to one year of data.

Testing a new typology-based segmented modelling approach

Photovoltaic technology, known for its implementation flexibility, can be installed in diverse contexts, such as rooftops, open fields, and water bodies. These contexts significantly influence the engineering design of PV installations and the characteristics chosen by project designers. For example, the tilt and azimuth of PV modules affect the daily and seasonal generation profile and optical losses due to reflection. Ventilation conditions, which influence module temperature and thermal losses, are another important factor. This variability prompted an investigation into whether expanding the current modelling approach, implemented for this Service as a single PV workflow for all PV installations, to account for these various contexts separately would yield better results.

A pilot activity was initiated and developed in collaboration with Terna, the Italian TSO, using Italy as a case study. This initiative spurred valuable discussions on data and model implementation and led to the development of modelling workflows for four PV typologies: residential rooftop, industrial rooftop, ground-based fixed utility-scale, and ground-based tracking installations. This new methodology, which can now employ more specific parameterizations rather than a single generic one, will be detailed along with the results, insights, and future plans at a later date. Discussions are already underway to progressively expand this initial version of the new methodology.

Post-hoc corrections following TSO's feedback

Following an inquiry from the Spanish TSO, Red Eléctrica, a positive bias was identified in the produced SPV data by comparing model outputs with reference data provided by the TSO. A multiplicative correction factor of 1.1 was therefore applied to all Spanish timeseries.

Since this is likely due to more than a simple bias in the model, discussions are under way to better understand the issue. This will likely prompt further testing of the methodology upgrade discussed in Section 3.5 within the context of Spain.

Concentrated Solar Power conversion model

As indicated in the workplan, the concentrated solar power (CSP) model developed by DTU and used in the previous PECD version is used also in this service. A brief description of this model follows. The CSP model consists of a solar field, a power block and a thermal energy storage. The main parameters are solar multiple (SM), which is the ratio between the solar field capacity and the turbine capacity, installed capacity, the different thermal block efficiencies and the thermal storage size. The storage size is given in hours of rated capacity operation. The heat transfer fluid, modelled as a first-order dynamic system characterized by a time constant, drives the delay in the response between a change in DNI (direct normal irradiance) and the power produced in a CSP plant (even if there is no storage).

The modelled operation strategy rests on two rules. If the solar field produces more power than is required to generate at rated power, the excess energy is stored. If the solar field produces less than is required to generate at rated power, the storage discharges the energy needed to bring the plant to rated power (see Figure 2.30). Such an operation strategy does not need knowledge of market prices. The link between solar multiple and thermal energy storage is the same as used in the previous version of PECD (see Table 2.8). The CSP model is recalibrated using the newest climate data.
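The two dispatch rules can be sketched as a simple hourly energy balance. This is a minimal illustration under stated assumptions, not the DTU model: it ignores the heat-transfer-fluid time constant and the thermal block efficiencies, works in per-unit of rated capacity, starts with empty storage, and the name `dispatch_csp` and its arguments are hypothetical.

```python
def dispatch_csp(field_power, storage_hours):
    """Dispatch a per-unit solar-field power series through thermal storage.

    field_power: hourly solar-field output, in per-unit of rated capacity
    storage_hours: storage size in hours of rated-capacity operation
    Returns (dispatched power series, storage level series).
    """
    rated = 1.0
    level = 0.0  # stored energy, in hours of rated operation (assumed empty at start)
    dispatched, levels = [], []
    for p in field_power:
        if p >= rated:
            # Surplus above rated power is stored; anything beyond capacity is lost.
            level = min(storage_hours, level + (p - rated))
            out = rated
        else:
            # Storage discharges what is needed to reach rated power, if available.
            discharge = min(level, rated - p)
            level -= discharge
            out = p + discharge
        dispatched.append(out)
        levels.append(level)
    return dispatched, levels


# Illustrative (made-up) solar-field profile over seven hours.
field = [0.0, 0.5, 1.5, 1.8, 0.6, 0.0, 0.0]
power, level = dispatch_csp(field, storage_hours=7)
print(power)
print(level)
```

Calling the same function with `storage_hours=0` reduces the output to the solar-field power capped at rated capacity, which corresponds to a run without storage.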

The best 50% of locations (in terms of mean DNI, for each PECD region separately) are selected as the simulated CSP installation locations. Two runs are performed in the CSP analysis:

CSP plants are simulated without energy storage.
CSP plants are simulated with 7 h of thermal energy storage.

In the widget "Technological specification" in the download form, each CSP option is represented by a number corresponding to whether the plant includes energy storage and whether the energy is considered before or after dispatch. The available options are the following:

40 (Pre-dispatch, no storage): indicates potential energy generation before storage, with no storage capacity;
41 (Dispatched, no storage): energy actually dispatched from a plant with no storage capacity;
42 (Pre-dispatch, 7-hours of storage): potential energy generation from a plant with 7 hours of storage, before dispatch;
43 (Dispatched, 7-hours of storage): energy actually dispatched from a plant with 7 hours of storage.

Figure 2.30: Overview of CSP behavior when thermal storage is available. The power from the solar field (dashed green line) is higher than the installed capacity (1.0) and is thus stored. The orange line shows dispatch from the CSP plant.

Table 2.8: Solar multiple (SM) as a function of thermal energy storage (TES).

TES (hours)	SM
0	1.5
3	1.75
6	2.0
9	2.5
12	2.9
18	3.0

Hydro Power conversion model

The objective for the Hydropower (HP) model for the historical stream is to model the hydropower energy indicators starting from climate data, reconstructing their time series for the historical period (1979-2022).

The target spatial resolution, originally set at the country level, has been refined to the Bidding Zone level (SZON), providing more detailed data for Italy, Norway, and Sweden. Specifically, the southern Norwegian region (NOS0) has been divided into three separate PECD regions (NOS1, NOS2, NOS3) as requested by ENTSO-E. The target temporal resolution is weekly.

The starting point of the work is the publicly available generation data (in MW) that can be accessed through the ENTSO-E Transparency Platform (TP) with which the model has been trained and validated to produce the results up to December 2021. The data include hydropower generation timeseries (at a resolution of 15 min, 30 min, or 1 hour depending on the country), Installed Capacity time series (annual), and Stored Energy (SE) time series to reservoirs (also referred to as ‘Filling Rates’) and pumped storage (at weekly resolution). Since these data are not sufficient to yield a complete dataset for simulations, two additional sources have been employed: data provided directly by TSOs and inflow data from the previous PECDv3.1 (see Table 2.12 for more details). The three sources were ranked following data reliability in accordance with ENTSO-E: in particular, TSOs' data are accounted for as the most reliable and are ranked with the highest priority. This data includes generation and pumping timeseries at hourly resolution and NUT0 or PECD granularity. Some TSOs provided timeseries of stored energy for their countries at weekly resolution for reservoir and open-loop pumped storage technologies, which were used to estimate inflows for such technologies (see countries citing ‘TSO’ as a source under inflow columns HRI and HOL, Table 2.12). Additionally, some countries provided monthly timeseries of Installed Capacity (IC), which were useful to account for significant changes in generation due to new installations throughout the historical time series (this information was used for countries citing ‘rescaled using monthly IC’, Table 2.12).

Where TSO timeseries are not complete, TP data are used, with some exceptions (see section Estimating Inflows). Finally, PECDv3.1 data have been employed where TSO and TP data are not sufficient. In particular, they help complete the open-loop pumped storage inflow data, since only a few TSOs are able to share stored energy timeseries for this technology.

The climate historical data are taken from the ERA5 Reanalysis model for the validation runs and the reconstruction of historical time series. The data is hence aggregated at NUT0 level and at PEON level, to address the target granularity.

Given the size of the domain and the absence of publicly available data for the individual HP plants (such as plant heads, installed technology and artificial regulation), the idea of employing a hydrological model was discarded. A statistical approach was instead adopted, for its adaptability to the available inputs and the reduced computational costs, accounting for the climate impact by means of temperature and precipitation input. The application of a hydrological model, without accounting for artificial regulation, would imply the misrepresentation of climatic forcing on HP generation correlated with energy demand.

The following chapters describe the statistical model, the pre-processing of input data, the validation procedure, and the use of the model for the reconstruction of historical data and the estimate of future projections. Finally, the last chapter describes the adopted methodology to estimate the inflows starting from the available data.

The Statistical Model

The statistical model adopted here is the Random Forest Regression model (Pedregosa et al., 2011; RF model), a machine learning model based on ensemble learning, which has already proved to work well at such a resolution and over such a broad domain in a previous study by Ho et al. (2020). In a preliminary comparison, at the first stages of the project, the model also showed comparable performance over France for both HRE (Reservoirs) and HRO (Run-of-river) technologies with respect to a Neural Network fed by discharge data (a model employed in the current PECD).

The Random Forest takes as input the generation (or inflow) data, namely the target variable, and some climate datasets covering the same time period, the predictors, and trains a large number of decision trees to predict the target variable starting from the predictors. In the end, it averages the answers from all the trees to obtain the model prediction. The number of trees in the ‘forest’, and their characteristics can be adjusted by tuning several parameters.

Energy data pre-processing

In the case of the TP, the hydropower generation, Installed Capacity and Stored Energy time series are extracted from a larger database for each PECD country and re-organized in multiple CSV files. Similarly, also TSO and PECDv3.1 data are organized into analogous CSV files. Where needed, the generation data is resampled to 1h. A weekly aggregation follows and consists of a sum of the hourly values for those weeks where at least 80% of data are available. If this holds true, the gaps in hourly values are filled by a simple interpolation. If the week presents >20% of missing values, the whole week is set to NaN. Specific checks are also made for the first values of the timeseries, as they are often unphysical, in which case they are adjusted based on adjacent values or set to NaN.
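The weekly aggregation rule can be sketched with pandas as follows. This is an illustrative sketch, not the production code: it assumes the input is a regular hourly series with missing hours stored as NaN, that weeks run from Monday to Sunday, and that "simple interpolation" means linear interpolation. The name `weekly_generation` and the parameter `min_coverage` are hypothetical.

```python
import numpy as np
import pandas as pd


def weekly_generation(hourly, min_coverage=0.8):
    """Sum hourly generation into weekly totals.

    Weeks with at least `min_coverage` of their hourly values present are
    gap-filled by linear interpolation before summing; other weeks are NaN.
    """
    week = hourly.index.to_period("W")  # Monday-to-Sunday weeks
    # Fraction of non-missing hourly values in each week.
    coverage = hourly.notna().groupby(week).mean()
    filled = hourly.interpolate(method="linear", limit_direction="both")
    totals = filled.groupby(week).sum()
    # Discard weeks that had too many missing values before interpolation.
    return totals.where(coverage >= min_coverage)


# Synthetic example: three weeks of constant 1 MW generation with gaps.
index = pd.date_range("2015-01-05", periods=3 * 168, freq="h")
series = pd.Series(1.0, index=index)
series.iloc[20:30] = np.nan      # 10 missing hours in week 1 (filled)
series.iloc[178:228] = np.nan    # 50 missing hours in week 2 (week dropped)
print(weekly_generation(series))
```

The coverage is computed before interpolation, so a week is discarded on the basis of its original gaps rather than its filled values.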

Stored Energy data are available at weekly resolution. Here too, the first values of the time series are sometimes unphysical and are therefore manually corrected. The presence of other unphysical jumps in the signal is also checked: in this case, the values are first set to NaN and then interpolated to fill the gaps if these are small.

Finally, while the generation can be directly employed as the target variable of the RF model, the inflows must first be estimated from the available data (see section Estimating Inflows) and then modelled.

Climate data preprocessing

For the purposes of this study, the most informative variables that can be found in all climate datasets are 2-m temperature (TA [K]) and total precipitation (TP [m])¹⁶, which are commonly fed to hydrological models to compute river discharge. In particular, the two variables are useful if averaged (for TA) and cumulated (for TP) over multiple weeks preceding the time of the estimation of the generation or inflow. It is important, for instance, to consider the time lag between a precipitation event over a given area and the corresponding discharge reaching the hydropower plants downstream. Therefore, precipitation is cumulated over up to 30 weeks, while temperature is averaged over up to 15 weeks. According to the example of Table 2.9, if the model is used to estimate the HP generation produced for the week of 2015-01-05, it will take as predictors the TA and TP for that same week, as well as the TA averaged over the previous 2, 3, 4, …, 15 weeks and the TP cumulated over the previous 2, 3, 4, …, 30 weeks.

Table 2.9: Schematic of the predictors (columns) used as input by the Random Forest model for the simulation of hydropower generation or inflow for the first weeks of January 2015 (generic dates). TA and TP stand respectively for 2-m temperature and total precipitation, while W followed by a number indicates the number of past weeks over which the variable has been averaged (for TA[K]) or aggregated (for TP[m]).

Date	TA_W1	TP_W1	TA_W2	TP_W2	TA_W3	TP_W3	…
2015-01-05	276	0.007	276	0.025	276	0.027	…
2015-01-12	278	0.009	277	0.016	276.7	0.034	…
…	…	…	…	…	…	…

The climate data used for both the training/validation, over the period when observations are available, and the subsequent reconstruction of the historical time series, extending it to 1979, comes from the ERA5 Reanalysis model.

The datasets are aggregated at weekly resolution (summing precipitation and averaging temperature) and then the lags up to 30 weeks are calculated, meaning that values are cumulated (summed/averaged) over multiple weeks to yield several more datasets, which will be used as predictors for the RF model. At the end of this pre-processing step, one CSV file per country and climate dataset is produced.
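The construction of the lagged predictors can be sketched with rolling windows. The sketch below is an assumption-laden illustration: following the values in Table 2.9, it takes a window of k weeks to span the current week and the k-1 weeks before it, and it assumes a weekly table with columns `TA` (weekly mean) and `TP` (weekly total). The function name `build_predictors` is hypothetical.

```python
import pandas as pd


def build_predictors(weekly, max_ta_weeks=15, max_tp_weeks=30):
    """Build lagged predictors from weekly mean TA [K] and weekly total TP [m].

    Column X_Wk aggregates the current week and the k-1 weeks before it.
    Rows without a full window are left as NaN.
    """
    cols = {}
    for k in range(1, max_tp_weeks + 1):
        if k <= max_ta_weeks:
            cols[f"TA_W{k}"] = weekly["TA"].rolling(k).mean()
        cols[f"TP_W{k}"] = weekly["TP"].rolling(k).sum()
    return pd.DataFrame(cols, index=weekly.index)


# The first two weekly values of Table 2.9.
weekly = pd.DataFrame(
    {"TA": [276.0, 278.0], "TP": [0.007, 0.009]},
    index=pd.to_datetime(["2015-01-05", "2015-01-12"]),
)
X = build_predictors(weekly)
print(X[["TA_W1", "TP_W1", "TA_W2", "TP_W2"]])
```

For the week of 2015-01-12 this gives a two-week mean temperature of 277 K and a two-week precipitation total of 0.016 m, matching the corresponding row of Table 2.9.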

Snow depth can also be informative for mountainous regions but was shown to add little to no value to the main predictors, temperature and precipitation (see Ho et al., 2020); furthermore, not all climate projection datasets include snow depth among the modelled variables. The latter is also true for river discharge, which could otherwise have been employed as a predictor.

Model validation: Leave-One-Year-Out Validation

The model is validated separately for each SZON region and indicator, over the period of energy data availability (within 2015-2022 in case of TP data, 2010-2022 in case of TSO data, 2010-2017 in case of PECDv3.1 data). The validation procedure followed is the Leave-One-Year-Out (LOYO), which trains the model over all N available years except one (the test year) and evaluates the model performance over this test year. This is repeated N times, each time keeping a different year as the test year, until the complete estimated time series can be assembled (see Figure 2.31).
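A leave-one-year-out loop can be sketched with scikit-learn by treating the calendar year as a group label. This is a minimal sketch on synthetic data, not the validation code used for the dataset: the function name `loyo_predict`, the fixed `random_state`, and the toy predictors are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut


def loyo_predict(X, y, years, **rf_params):
    """Return out-of-sample predictions, leaving one calendar year out at a time."""
    X, y, years = np.asarray(X), np.asarray(y), np.asarray(years)
    y_hat = np.full(len(y), np.nan)
    for train, test in LeaveOneGroupOut().split(X, y, groups=years):
        # A fresh model is trained on all years except the test year.
        model = RandomForestRegressor(random_state=0, **rf_params)
        model.fit(X[train], y[train])
        y_hat[test] = model.predict(X[test])
    return y_hat


# Synthetic weekly data: four years, three predictors, one target.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2015, 2019), 52)
X = rng.normal(size=(len(years), 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=len(years))
y_hat = loyo_predict(X, y, years, n_estimators=50)
print(np.corrcoef(y, y_hat)[0, 1])
```

Because every week is predicted by a model that never saw its own year, the assembled series `y_hat` can be compared with the observations over the whole period.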

Figure 2.31: Example of inflow to reservoirs time series estimated (or predicted) through LOYO procedure with a random forest regression model (red), against observation (grey).

Once the estimated time series is available, several metrics are calculated to quantify the goodness of the fit to observations. Among these, the Nash-Sutcliffe Efficiency (NSE) metric, widely used in hydrology, is adopted as one of the main reference metrics:

\[ NSE = 1 - \dfrac{\sum_{i=1}^{n}(x^{i}_{m} - x^{i}_{o})^{2}}{\sum_{i=1}^{n}(x^{i}_{o} - \overline{x}_{o})^{2}} \]

where \( x^{i}_{m} \) is the modelled value at timestep i, \( x^{i}_{o} \) the observed value at timestep i, \( \overline{x}_{o} \) the mean of the observed values, and n the total number of timesteps.

For instance, in the case of the modelled time series in Figure 2.31, the NSE value is 0.59 (as also reported in the upper left corner of the figure). The metric is calculated as one minus the ratio between the error variance of the modelled timeseries and the variance of the observed timeseries. If there is no difference between the modelled (m) values and the observed (o) ones at each timestep (i), then the NSE will be 1 (perfect fit), which is the maximum value that can be reached. On the other hand, if there are significant differences between the two timeseries, the NSE can reach negative values (down to -Inf). An NSE = 0 would indicate that the model has the same predictive skill as the mean of the timeseries in terms of the sum of the squared error.
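The three reference cases described above (perfect fit, skill equal to the observed mean, and skill worse than the mean) can be checked with a few lines of Python. The function name `nse` and the toy series are illustrative only, and the sketch assumes both series are complete (no missing values).

```python
def nse(modelled, observed):
    """Nash-Sutcliffe Efficiency: 1 - sum((m - o)^2) / sum((o - mean(o))^2)."""
    if len(modelled) != len(observed):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((m - o) ** 2 for m, o in zip(modelled, observed))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss


obs = [2.0, 4.0, 6.0, 8.0]
print(nse(obs, obs))                    # 1.0: perfect fit
print(nse([5.0] * 4, obs))              # 0.0: same skill as the observed mean
print(nse([8.0, 6.0, 4.0, 2.0], obs))   # -3.0: worse than the observed mean
```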

RF Model Parameters

As mentioned, the Random Forest can be built by specifying several parameters. The main parameters indicated in Table 2.10 have been tuned country by country and indicator by indicator. This has been done by sampling a hyperparameter space with the Latin Hypercube Sampling algorithm to find the set able to optimize a selected metric. The hyperparameter space has been defined by assigning a range of values to each of the main RF parameters. To efficiently sample this multidimensional domain, a Latin Hypercube Sampling of 1000 samples has been performed and each sampled set of parameters has been tested via LOYO procedure to yield the score of the chosen metric. Finally, the set of parameters yielding the best score was retained and used for that specific country and indicator.

Table 2.10: Random Forest (RF) parameters involved in the optimization procedure, with a short description and range of possible values sampled by the Latin Hypercube Sampling algorithm.

RF parameter	Short description	Range
n_estimators	number of trees in the forest.	100-500
max_features	maximum number of features (predictors) considered for splitting a tree node.	0.1-1 (1 meaning all available features)
max_depth	maximum number of levels in each decision tree.	1-100
min_samples_split	minimum number of data points placed in a node before the node is split.	2-30
min_samples_leaf	minimum number of data points allowed in a leaf node (terminal node of a tree).	2-30
bootstrap	method for sampling data points (with or without replacement).	True/False
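Sampling the hyperparameter space of Table 2.10 can be sketched with SciPy's quasi-Monte Carlo module. This is a hedged illustration rather than the code used for the dataset: the rounding of integer parameters, the mapping of `bootstrap` to a boolean, and the names `RANGES` and `sample_rf_parameters` are assumptions.

```python
from scipy.stats import qmc

# Ranges taken from Table 2.10 (lower bound, upper bound).
RANGES = {
    "n_estimators": (100, 500),
    "max_features": (0.1, 1.0),
    "max_depth": (1, 100),
    "min_samples_split": (2, 30),
    "min_samples_leaf": (2, 30),
    "bootstrap": (0, 1),  # continuous draw, mapped to False/True below
}
INTEGER_PARAMS = ("n_estimators", "max_depth", "min_samples_split", "min_samples_leaf")


def sample_rf_parameters(n_samples=1000, seed=0):
    """Draw RF parameter sets with Latin Hypercube Sampling."""
    sampler = qmc.LatinHypercube(d=len(RANGES), seed=seed)
    lower = [lo for lo, _ in RANGES.values()]
    upper = [hi for _, hi in RANGES.values()]
    # Unit-hypercube samples are rescaled to the parameter ranges.
    scaled = qmc.scale(sampler.random(n_samples), lower, upper)
    candidates = []
    for row in scaled:
        params = dict(zip(RANGES, (float(v) for v in row)))
        for name in INTEGER_PARAMS:
            params[name] = int(round(params[name]))
        params["bootstrap"] = params["bootstrap"] >= 0.5
        candidates.append(params)
    return candidates


candidates = sample_rf_parameters(n_samples=1000)
print(candidates[0])
```

Each candidate dictionary could then be passed to a Random Forest and scored through the LOYO procedure, retaining the set with the best score for a given country and indicator.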

The parameter optimization has been tested with two different metrics: the Nash-Sutcliffe Efficiency (NSE), and a combined metric. In particular, the latter includes the normalized NSE (NNSE), which indicates a general goodness of fit to the observations, and the Normalized Mean Absolute Error (NMAE) of Annual Maxima, which quantifies the ability of the model to reproduce high extremes of generation¹⁷. However, this combined metric requires longer computational times and, in a few cases, brings unphysical results. Therefore, the proposed results are obtained with RF parameters optimized using NSE.

The NSE can be normalized so that its values will span from 0 (instead of -Inf) to 1:
\( NNSE = \dfrac{1}{2 - NSE} \)
The annual maxima (AM) of generation for both estimates and observations are extracted and compared. The goal is to minimize the error between the two, hence one can minimize the Normalized Mean Absolute Error (NMAE), which spans from 0 to 1, or equivalently maximize 1 - NMAE(AM). The product of NNSE and 1 - NMAE(AM) always lies between 0 and 1, with 1 being the perfect score.
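The normalization and the combined score can be written compactly as below. This sketch takes the NMAE of the annual maxima as an input already scaled to the range 0 to 1, since the exact normalization used for it is not specified here; the function names are hypothetical.

```python
def nnse(nse):
    """Normalized NSE, mapping (-inf, 1] onto (0, 1]."""
    return 1.0 / (2.0 - nse)


def combined_score(nse, nmae_am):
    """Product of NNSE and 1 - NMAE of annual maxima; 1 is a perfect score."""
    if not 0.0 <= nmae_am <= 1.0:
        raise ValueError("nmae_am is expected to lie between 0 and 1")
    return nnse(nse) * (1.0 - nmae_am)


print(nnse(1.0))                  # 1.0: perfect fit
print(nnse(0.0))                  # 0.5: same skill as the observed mean
print(combined_score(0.59, 0.2))  # illustrative input values
```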

Model Validation Results

To summarize the validation results, a map displaying the NSE scores obtained for each country is visible in Figure 2.32 for the generation and inflow to reservoirs, the inflows to run-of-river, the inflows to pondage, and the inflows to open loops. Generally, over the PECD domain, the results are satisfactory, with fairly high NSE values for most countries. This is especially seen for the inflow to reservoirs indicator (panel b), which assimilates information on the reservoirs filling rates (for the countries that provide it) and hence is able to reduce the human influence on the generation signal, while generation signal without this information can be harder to reproduce with a model based on temperature and precipitation alone (see panel a). High scores are obtained also for inflows to run-of-river and pondage (panels c and d), where the signal has a more distinct seasonality and is less influenced by human intervention. The scores are generally lower for inflows to open-loop (panel e), largely based on PECDv3.1 data.

Low scores are mainly due to the small number of years available for training (e.g. 3 years), or to irregularities in the generation time series which reduce its seasonality. These can be caused by artificial regulation or faulty data records. In some cases, low scores were obtained due to a loss of seasonality in the time series brought by significant changes in the Installed Capacity for that country / bidding zone. New installations can cause abrupt changes or gradual shifts in the mean observed signal. Since the model is based solely on climate data, it cannot predict this behaviour. A possible solution that has been attempted is to model Capacity Factors (CF) directly, hence normalizing the provided generation data by the annual series of country-aggregated Installed Capacities (IC). This improves the results for some countries, but generally worsens them for countries where the IC does not change significantly with time. This means that generation may not reflect the actual IC at a given time. Changes to the IC can occur at the beginning of the reporting year or at any time during the year, therefore likely introducing step changes in the IC. However, a data collection was launched by ENTSO-E to retrieve monthly Installed Capacity time series from the TSOs, and some were able to provide them. Therefore, where new installations visibly affected the TSO generation time series, these were normalized with the corresponding monthly IC data provided, the model was trained on the normalized time series, and the output was then multiplied back by the same IC series to obtain a generation/inflow time series again. This procedure was applied to the timeseries of Albania, Switzerland, Hungary, Poland, and Portugal and must be taken into account when comparing projection energy values to historical ones, since in these cases the anomalies are due not only to changes in climate variables, but also to the known changes in IC.
It is also important to note that the assumption made for this procedure is that the TSO generation and TSO monthly installed capacity series provided for these countries were compatible. Therefore, any inconsistency that may be found between model outputs and expected historical values may come from discrepancies between the generation and installed capacity input data.

Other time series displayed irregularities arguably attributable to changes in IC but were not provided with monthly IC series. In such cases, the RF model was trained on a recent restricted time window (at least 4 years) of close-to-constant IC. The latter is hence assumed unvaried in time.

Figure 2.32: Maps of the LOYO validation results obtained in terms of NSE over the period of available data, which depends on the source (TSO: 2010-2022, TP: 2015-2022, PECDv3.1: 2010-2017). Each panel refers to a different inflow (or generation) indicator, as reported in the panels’ titles.

Modelling Historical stream

Once the model is validated, it is trained (again for each country and indicator) on all available years of generation data (years between 2010 and 2022) using the tailored sets of parameters found during the optimization procedure. The same parameters are then used to extend the HP indicator back to 1979, to have long reconstructed time series, using the ERA5 temperature and precipitation data. Figure 2.33 shows an example of a historical time series of inflow to reservoirs as estimated by the RF model for France (in blue). It also shows the ‘observed’ inflow series in grey, estimated with TP data (see section Estimating Inflows).

Figure 2.33: RF-reconstructed time series of inflow to reservoirs (HRI) for France (FR). The estimated series is shown in blue, while the observations (2015-2022) in grey.

Estimating Inflows

The RF model produces generation timeseries, although artificial regulations can significantly impact the timeseries and affect its seasonality, jeopardizing the capability of Temperature and Precipitation to reliably reproduce said signal. This issue regards specific technologies involving a reservoir, especially Reservoir and Open-Loop pumped storage systems, while the effect can be in general neglected for run-of-river plants and pondage plants, which are run-of-river plants making use of a limited storage capacity amounting to no more than 24 hours.

In general, the hydrological balance of a reservoir or pumped storage facility over a given amount of time can be written as:

\[ \Delta S = IN - OUT \]

where \( \Delta S \) is the change in stored water volume of the reservoir, \( IN \) the inflow to the reservoir, and \( OUT \) the outflow over time \( \Delta t \). The terms of the hydrological balance represent the net water flux and storage of the reservoir/pumping system. To express an energy balance, the water terms must be multiplied by the head of the pumps and of the hydropower plant, according to the equation expressing the relation between water discharge and power [MW]:

\[ P = \eta \cdot \gamma \cdot Q \cdot \Delta H \]

where \( \eta \) is the efficiency of the HP (conversely, the inverse efficiency of the pump for a pumped storage facility), \( \gamma \) is the specific weight of water [N/m³], Q [m³/s] the volumetric discharge and \( \Delta H \) [m] the head jump. On the other hand, the energy term is expressed as the power spent or generated during the time period \( \Delta t \):

\[ E = P \cdot \Delta t \]
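As a worked example of the two relations above, the sketch below converts a discharge and a head into power and weekly energy. The numerical inputs are made up for illustration, and the value of the specific weight of water (about 9810 N/m³) is a standard approximation rather than a value taken from this document. With \( \gamma \) in N/m³ the product is in watts, so it is divided by 10⁶ to obtain MW.

```python
GAMMA = 9810.0  # specific weight of water [N/m^3], about 1000 kg/m^3 x 9.81 m/s^2


def hydro_power_mw(efficiency, discharge_m3s, head_m):
    """P = eta * gamma * Q * dH, converted from W to MW."""
    return efficiency * GAMMA * discharge_m3s * head_m / 1e6


def energy_mwh(power_mw, hours):
    """E = P * dt for a constant power over `hours`."""
    return power_mw * hours


# Illustrative plant: 90% efficiency, 50 m^3/s discharge, 100 m head.
p = hydro_power_mw(efficiency=0.9, discharge_m3s=50.0, head_m=100.0)
print(p)                    # about 44.1 MW
print(energy_mwh(p, 168))   # about 7416 MWh over one week
```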

The hydrological balance can then be expressed in terms of energy balance and used to compute the inflow and the other flux terms as energetic quantities. The inflow can be partitioned into a Natural component ( \( IN_{nat} \) ) and the energy pumped from the lower reservoir (in the case of a pumped storage system), which can be described as the consumed energy for pumping ( \( E_{pump} \) ) times an efficiency term ( \( \eta_{p} \) ). The outflow can be expressed as the actual production ( \( E_{out} \) ) divided by an efficiency term for production ( \( \eta_{o} \) ).

\[ \Delta S = IN_{nat} + \eta_{p} \cdot E_{pump} - \dfrac{E_{out}}{\eta_{o}} \]

When Stored Energy (S), actual generation output ( \( E_{out} \) ) dispatched to the grid and (if needed) consumed grid energy for pumping ( \( E_{pump} \) ) time series are available, the potential natural inflow to the system can be estimated assuming an efficiency for pumping and production.

An estimate of the efficiency can be found considering a closed-loop pumping system, which is not continuously connected to a river system: its natural inflow component can be considered null. In addition, for sufficiently large time intervals \( \Delta t \) , we can assume the storage term to be negligible compared to the other terms and hence write:

\[ \eta_{p} \cdot E_{pump} = \dfrac{E_{out}}{\eta_{o}} \]

This allows the round-trip efficiency of the system to be estimated:

\[ \eta = \eta_{p} \cdot \eta_{o} = \dfrac{E_{out}}{E_{pump}} \]

Figure 2.34: An approximated sketch of a closed-loop system.

Figure 2.35: Cumulated energy production (blue), pumping energy consumption (red), and estimated natural inflow (green) for a French Closed-Loop unit.

This round-trip efficiency usually depends on the design of the plant. For older designs it may be lower than 60%, while for recent ones it can be up to 90%. The efficiency suggested by ENTSO-E is 0.75, which is assumed here as the reference value over Europe. As seen in Figure 2.35 for a French Closed-Loop unit, the balance holds as the production and pumping terms are cumulated over time and the natural inflow remains null.

Inflow to Open-loop Pumping

The situation for open-loop facilities is different since the natural inflow component is not null, and therefore constitutes a third unknown, together with the two efficiencies. The assumption made is that the pumping and production efficiencies are equal ( \( \eta_{o} = \eta_{p} \) ) and comparable to those of closed-loop plants ( \( \eta_{p} \cdot \eta_{p} = 0.75 \) ), which gives:

\[ \eta_{p} \simeq 0.866 \] \[ IN_{nat} = \Delta S + \dfrac{E_{out}}{\eta_{p}} - \eta_{p} \cdot E_{pump} \]

Figure 2.36: An approximated sketch of an Open-loop system.

Inflow to Reservoirs

As for reservoirs, the pumping component is null, so the equation reduces to:

\[ IN_{nat} = \Delta S + \dfrac{E_{out}}{\eta_{p}} \]

In principle, there can also be another outflow component, which is released at the outlet of the dam. However, as this term is not known, it is disregarded when computing the energy balance. As a consequence, the approximated formula can cause the estimated inflow to reach negative values in times of water scarcity. Following the recommendation by ENTSO-E, the negative values are set to zero in the final modelled timeseries.
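The balance equations for open-loop pumped storage and for reservoirs can be sketched in a single function, with the pumping term omitted for reservoirs. This is an illustrative sketch under stated assumptions: stored energy is given at week boundaries (one more value than the weekly series), all quantities share the same energy unit, and negative values are set to zero directly on the estimate for brevity, whereas the text applies this to the final modelled timeseries. The name `natural_inflow` is hypothetical.

```python
import math

ROUND_TRIP_EFFICIENCY = 0.75            # reference value suggested by ENTSO-E
ETA = math.sqrt(ROUND_TRIP_EFFICIENCY)  # eta_p = eta_o, about 0.866


def natural_inflow(stored, generated, pumped=None, eta=ETA):
    """Estimate weekly natural inflow from IN_nat = dS + E_out/eta - eta*E_pump.

    stored: stored energy at week boundaries (len(generated) + 1 values)
    generated: energy dispatched to the grid each week (E_out)
    pumped: grid energy consumed for pumping each week (E_pump), or None
            for reservoirs without pumping
    Negative estimates are set to zero.
    """
    if pumped is None:
        pumped = [0.0] * len(generated)
    inflow = []
    for week, (e_out, e_pump) in enumerate(zip(generated, pumped)):
        delta_s = stored[week + 1] - stored[week]
        estimate = delta_s + e_out / eta - eta * e_pump
        inflow.append(max(0.0, estimate))
    return inflow


# Made-up weekly values for a reservoir (no pumping).
print(natural_inflow(stored=[100.0, 110.0, 90.0], generated=[20.0, 5.0]))
```

In the second week of the example the storage drops by more than the generation can explain, so the raw estimate is negative and is set to zero.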

It must be noted that TP data has been cautiously used to compute inflow to reservoirs, since the stored energy data on the platform refers both to reservoirs and pumped storage technologies. Hence, the inflow results from the TP have been retained only in a few cases, generally where the reported installed capacity for reservoirs is much greater than the one for pumped storage.

Inflow to Run-of-rivers and Pondage

For Run-of-river systems, the storage term is considered null, and considering that the storage capacity of a pondage is less than 24 hours, the same is assumed for run-of-river with pondage at weekly resolution, hence reducing the equation to:

\[ IN_{nat} = \dfrac{E_{out}}{\eta_{p}} \]

When possible, the two technologies are kept separate. For instance, this is possible for the bidding zones whose TSO provided distinct generation time series. Data from the TP, on the other hand, are used to model run-of-river technology only where no pondage was declared for that bidding zone by the TSO and no pondage was available in the PECDv3.1 dataset. This ensures that only run-of-river is being addressed, given that the TP generation data include both technologies (addressed as ‘Run-of-river and pondage’). If only run-of-river data were provided by the TSO for a given bidding zone, the run-of-river inflow was calculated from these data, while the pondage inflow was calculated from the PECDv3.1 data. Comments on these particular cases are left in the Summary Table (Table 2.12).

Finally, the same production efficiency is assumed for all technologies ( \( \eta_{p} \simeq 0.866 \) ). However, to align with the models used by ENTSO-E to ingest the energy data, the final inflow model outputs are multiplied back by the same efficiency coefficient to obtain an inflow at the electrical grid level. Although the balance equations should yield realistic estimates, it must be noted that, without access to actual inflow observations, it is not possible to fully validate the above methodology.

Use of PECDv3.1 inflow estimates

Where TSO and TP data were not sufficient to derive the inflow for a specific bidding zone and technology, the PECDv3.1 inflow data were used directly as the target variable for the training of the RF model, as indicated in Figure 2.37. This approach was used in particular to model inflows to open-loop pumped storage, as only a few stored energy time series were provided by the TSOs. Therefore, there are cases in which the generation is modelled from available TSO data, while the corresponding inflow (because stored energy data are unavailable) is modelled from PECDv3.1 data, which sometimes leads to inconsistencies between the two datasets. The main ones are reported in the Summary Table (Table 2.12).

Figure 2.37: Sketch of the two different approaches to model inflows: approach 1 makes use of TSO and TP data, approach 2 makes use of PECDv3.1 data.

Post-hoc corrections following TSOs’ feedback

For the inflow datasets of some specific technologies and regions, a multiplicative correction factor was applied to the model outputs in agreement with the TSO of interest, after validation against a reference dataset. These correction factors were required because of the poor quality of the public data initially used for the model training, and are to be regarded as temporary adjustments ahead of a more stable solution. See Table 2.11 for an overview of the explicit multiplicative values and the regions to which they were applied for the PECDv4.1 delivery of data.

Table 2.11: Multiplicative Correction Factors applied to inflows model output.

Region	Technology	Correction Factor	Source
AT00	HRI – inflows to reservoirs	2404/5507	Comparison of mean maximum generation with an internal APG data source with strict sharing limitations.
	HRR – inflows to run of river	23082/17760
	HPI – inflows to pondage	5607/4506
CH00	HOL – inflows to open-loop pumped storage	0.825	Comparison of mean annual cumulated inflows with a reference monthly dataset derived from Swiss Federal Office of Energy (SFOE) data.
CH00	HRR – inflows to run of river	1.39	Comparison of mean annual cumulated inflows with a reference monthly dataset (SFOE). Mind: this factor was applied directly to the model input TSO data in accordance with the Swiss TSO.
TR00	HRR – inflows to run of river (and relative IC series)	2.502	Comparison of mean annual cumulated inflow with an internal series of annual cumulated generation for period 2019-2023 including all country plants.
TR00	HRI – inflows to reservoirs (and relative IC series)	1.850	Comparison of mean annual cumulated inflow with an internal series of annual cumulated generation for period 2019-2023 including all country plants.
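The way such a factor acts on a series can be sketched as follows. This is a minimal illustration, not the production code: the factor values are copied from Table 2.11, while the function name, the dictionary layout and the example series are assumptions made for this sketch.

```python
# Multiplicative factors copied from Table 2.11: (region, technology) -> factor.
# CH00/HRR is omitted on purpose: according to the table, that factor was
# applied to the model input data rather than to the model output.
CORRECTION_FACTORS = {
    ("AT00", "HRI"): 2404 / 5507,
    ("AT00", "HRR"): 23082 / 17760,
    ("AT00", "HPI"): 5607 / 4506,
    ("CH00", "HOL"): 0.825,
    ("TR00", "HRR"): 2.502,
    ("TR00", "HRI"): 1.850,
}


def apply_correction(region, technology, weekly_inflow_mwh):
    """Scale a weekly inflow series (MWh); series without a listed factor are unchanged."""
    factor = CORRECTION_FACTORS.get((region, technology), 1.0)
    return [value * factor for value in weekly_inflow_mwh]


# Hypothetical weekly model outputs, for illustration only.
corrected = apply_correction("TR00", "HRI", [100.0, 200.0])
unchanged = apply_correction("ES00", "HRI", [100.0, 200.0])
```

Keeping the factors in a single lookup makes it easy to remove them once a more stable solution replaces these temporary adjustments.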

Summary Table

The following table (Table 2.12) includes all addressed bidding zones and technologies, except for generation from run-of-river and pondage, which would repeat the respective inflow columns. It can be used to check the availability of data, the source of the data used for the modelling, and comments on the results, which mainly address inconsistencies found or considerations made for the source/modelling choices. As mentioned, the TSO generation data have always been given priority when available, followed by TP data and PECDv3.1 estimates. Given the different data sources and methodologies used, the results can differ significantly from those of the previous PECD; we therefore strongly recommend checking with TSOs about the reliability of mean historical generation/inflow values.

Table 2.12: Summary table of used data sources and comments/considerations on the model outputs results.

	Reservoirs Generation	Inflow to Reservoirs	run-of-river Inflow	Inflow to Open Loop PS	Pondage Inflow
Bidding zone / Tech.	HRG	HRI	HRR	HOL	HPI
AL00	TSO rescaled using monthly IC	TSO rescaled using monthly IC	TSO rescaled using monthly IC
AT00	TSO	TP – the mean using PECDv3.1 data is too low with respect to TSO data, hence using TP data although SE is surely affected by HPS (Hydro Pumped Storage)	TSO	PECDv3.1	TSO
BA00	TSO	PECDv3.1	PECDv3.1- TSO run-of-river data not provided – might be already accounted for in TSO pondage data	PECDv3.1	TSO
BE00			TSO
BG00	TSO	TSO	TSO	TSO
CH00	TSO	TSO – rescaled using monthly IC	TSO - rescaled using monthly IC – multiplication factor of 1.39 applied to generation input data in accordance with CH00 TSO	TSO - rescaled using monthly IC
CZ00	TSO	PECDv3.1	TP (since there’s no pondage) – can reproduce mean signal, can’t well reproduce the peaks – suspected anthropic factors influencing the production after 2019	PECDv3.1
DE00	TSO	PECDv3.1 – mean too low with respect to TSO generation, should be ca three times higher	TSO	PECDv3.1
ES00	TSO	TSO	TSO	TSO
FI00	TSO	TSO	TP (no TSO pondage data, no PECDv3.1 pondage data)
FR00	TP	TP – HPS (pumped storage) IC about 60% of HRE (reservoirs) IC in past 8 years (from TP data) + time series very close to PECDv3.1 inflow	TP (no TSO data for FR, no pondage in PECDv3.1 data)	GPU (Generation Per Unit) - (no PECDv3.1 data for FR) - low reliability: no HOL storage energy available (approximated inflow assuming negligible storage from one week to the other) + few production and pumping data (3 years)
GR00	TSO	TSO	TSO – model training on last 4 years (missing monthly IC data to rescale) – significant difference with PECDv3.1 inflow	TSO	PECDv3.1 – even though no pondage data from TSO nor TP
HR00	TSO – very close to TP generation	TP – HPS IC about 20% of HRE IC in the past 9 years (TP data)	TSO – could contain pondage	PECDv3.1	PECDv3.1 – even though no pondage data from TSO.
HU00			TSO rescaled using monthly IC
IE00			TSO
ITCA	TSO	PECDv3.1 – reasonable values with respect to TSO generation	TSO
ITCN	TSO	PECDv3.1 – inflow sometimes lower than TSO generation	TSO
ITCS	TSO	PECDv3.1 – inflow very close to TSO generation	TSO	PECDv3.1
ITN1	TSO	PECDv3.1 – inflow very close to TSO generation	TSO	PECDv3.1
ITS1	TSO	PECDv3.1 – inflow close to generation (would expect it a bit higher)
ITSA	TSO	PECDv3.1 – high with respect to TSO generation	TSO
ITSI	TSO	PECDv3.1 – low peaks with respect to TSO generation	TSO	PECDv3.1
LT00			TSO – generation values exceptionally high for the year 2015 (something wrong in the data) -> left out of training
LV00					TSO
LU00			TSO
ME00	TSO – close to TP generation data, higher peaks	TP – no HPS IC	PECDv3.1
MK00	TSO	TSO
NL00			PECDv3.1
NOM1	TSO	TP – small HPS production compared to HRE	TSO	PECDv3.1
NON1	TSO	TP - no HPS	TSO
NOS1	TSO	TP – no HPS	TSO	-
NOS2	TSO	TP – trying splitting PECDv3.1 NOS0 data obtained similar result + small HPS production	TSO	PECDv3.1 (splitting PECDv3.1 NOS0 data according to mean TSO generation data for NOS2)
NOS3	TSO	TP - trying splitting PECDv3.1 NOS0 data obtained similar result + small HPS production	TSO	PECDv3.1 (splitting PECDv3.1 NOS0 data according to mean TSO generation data for NOS3)
PL00	TSO	PECDv3.1 – mean inflow value is 3-4 times higher than TSO generation (also TP-calculated mean is 3-4 times higher)	TSO - rescaled using monthly IC	PECDv3.1 – inflow seems to be too low considering TSO generation and pumping series: ca 200 MWh of inflow against 1200 MWh of generation (mean weekly values)
PT00	TSO	TSO	TSO – values seem low, TP and PECDv3.1 data ca 10 times higher than TSO data of run-of-river and HPO together	TSO - rescaled using monthly IC	TSO
RO00	TSO	PECDv3.1	TSO	PECDv3.1
RS00	TSO	PECDv3.1 – TP data significantly impacted by HPS	TSO
SE01	TSO	PECDv3.1
SE02	TSO	PECDv3.1
SE03	TSO	PECDv3.1
SE04	TSO	PECDv3.1
SI00	TSO	-	TSO – could contain pondage		PECDv3.1 – no pondage generation data from TSO: keeping PECDv3.1 trained estimates. Pondage could be included in run-of-river TSO data? In this case PECDv3.1 estimates are off.
SK00	TSO	PECDv3.1 – although mean is considerably higher than TSO generation	TSO	PECDv3.1	TSO
TR00
UK00			TP – (no TSO data for GB, no pondage in PECDv3.1 data)

Energy indicators

Energy indicators included in the PECDv4.1 dataset for the historical stream are described in Table 2.13. This table provides information for each variable, including the typology, the time period covered, the source of the input data, the domain, temporal resolution, spatial aggregation (as specified in Table 2.1), and, where applicable, the different technologies used to compute the final time series.

For onshore wind power capacity factors, ten time series are computed: one for existing technologies and nine for future technologies, based on different combinations of Specific Power and Hub Height. Similarly, for offshore wind power capacity factors, three time series are computed: one for existing technologies and two for future technologies, also based on different combinations of Specific Power and Hub Height. For Concentrated Solar Power capacity factors, four time series are computed, considering combinations of preDispatch and storageDispatched technologies with 0 hours and 7.5 hours of storage.
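The count of future wind technologies follows directly from the combinations of Specific Power (SP) and Hub Height (HH). The sketch below enumerates them; the numeric values are read from the technology labels in Table 2.13, while the function and variable names are assumptions made for this illustration.

```python
from itertools import product

# Values read from the technology labels in Table 2.13.
ONSHORE_SPECIFIC_POWER = (199, 277, 335)
ONSHORE_HUB_HEIGHT = (100, 150, 200)
OFFSHORE_SPECIFIC_POWER = (316, 370)
OFFSHORE_HUB_HEIGHT = (155,)


def technology_labels(specific_powers, hub_heights):
    """Return one 'SPxxx_HHyyy' label per (specific power, hub height) pair."""
    return [f"SP{sp}_HH{hh}" for sp, hh in product(specific_powers, hub_heights)]


onshore_future = technology_labels(ONSHORE_SPECIFIC_POWER, ONSHORE_HUB_HEIGHT)
offshore_future = technology_labels(OFFSHORE_SPECIFIC_POWER, OFFSHORE_HUB_HEIGHT)
```

Adding the single "Existing technologies" series to each list gives the ten onshore and three offshore time series mentioned above.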

Table 2.13: Energy indicators provided in the PECDv4.1 for the historical stream. Files provided at ORIG spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format.

Variable	Type	Time period	Source	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Technology	Units
Wind power onshore (WON)	Capacity factor	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEON	Onshore Existing technologies, Onshore SP199_HH100, Onshore SP199_HH150, Onshore SP199_HH200, Onshore SP277_HH100, Onshore SP277_HH150, Onshore SP277_HH200, Onshore SP335_HH100, Onshore SP335_HH150, Onshore SP335_HH200	MW/MW
Wind power offshore (WOF)	Capacity factor	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEOF	Offshore Existing technologies, Offshore SP316_HH155, Offshore SP370_HH155	MW/MW
Solar generation (SPV)	Capacity factor	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, PEON	---	MW/MWp
Concentrated solar generation (CSP)	Capacity factor	1980 - 2021	ERA5 reanalysis	PECD/0.25° x 0.25°	hourly	PEON	storage_0_hours_preDispatch, storage_0_hours_storageDispatched, storage_7p5_hours_preDispatch, storage_7p5_hours_storageDispatched	MW/MW
Hydropower reservoirs generation energy (HRG)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO**	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower reservoirs inflow energy (HRI)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river generation energy (HRO)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river inflow energy (HRR)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage generation energy (HPO)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage inflow energy (HPI)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower open-loop pumped storage inflow energy (HOL)	Energy	1980 - 2021	ERA5 reanalysis ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh

*Energy data from ENTSO-E Transparency Platform

**Energy data from Transmission System Operators specific for each country

***Inflow data from ENTSO-E PECDv3.1

Known issues

There are no known issues.

Projection stream

Projection models

Choice of models

The current dataset has been designed and implemented to produce climate and energy indicators for the PECD domain, covering the projection temporal horizon (stream) up to timescales of the far future (currently year 2065). To design and deliver the most meaningful set of climate and energy indicators for this purpose, a preliminary step involved the selection of the most suitable subset of climate projections for the energy sector.

This analysis used CMIP6 models, part of the Coupled Model Intercomparison Project, a global collaboration of climate modellers organized under the World Climate Research Program (WCRP) of the World Meteorological Organization (WMO). CMIP is now in its sixth phase (Eyring et al., 2016), and results from CMIP6 have been assessed in the IPCC (Intergovernmental Panel on Climate Change) Sixth Assessment Report (AR6)¹⁸ . For PECDv4.1, CMIP6 models were preferred to EURO-CORDEX¹⁹ model output since CMIP data represent a more comprehensive dataset regarding the number of projections for the future. The main drawback of the CMIP6 models is their horizontal spatial resolution, which is typically 100 km. To obtain the desired spatial resolution for PECDv4.1, a statistical downscaling with a bias adjustment procedure is implemented to increase the resolution to 0.25° (same as ERA5).

Currently, more than 40 CMIP6 models are available. It is important to note that the variables and their characteristics stored in public repositories, such as the Earth System Grid Federation (ESGF) nodes or the CDS, vary across these models. Therefore, sub-selection criteria were implemented to choose a subset of models that: i) store the variables required for PECDv4.1 at the desired spatial and temporal resolution, and ii) are representative of far-future climate conditions, particularly regarding the energy sector.

A selection based on the Equilibrium Climate Sensitivity (ECS) of the models was chosen over more sophisticated methods that assess model performance in reproducing specific climate features, which is a common practice for regional model selection. This approach aimed to identify models that cover the observed range of climate sensitivity as estimated in the IPCC AR6, including one or two models with higher sensitivity to represent "low likelihood, high impact" scenarios, while also ensuring the models were as independent as possible in terms of their components. The results are presented in Table 3.1.

Table 3.1: CMIP6 climate models. Models are categorized as follows: models that do not provide all scenarios (highlighted in dark red), models with ECS outside the observed climate sensitivity range estimated in AR6 (highlighted in orange), and models that share components with other models (highlighted in yellow). The selection is made from the non-highlighted models.

	Model	ECS (°C)	Pre-industrial	Historical	SSP1-1.9	SSP1-2.6	SSP2-4.5	SSP3-7.0	SSP5-8.5
1	ACCESS-CM2	4.72	500	3		1	1	1	3
2	ACCESS-ESM1-5	3.87	900	10		3	3	3	10
3	AWI-CM-1-1-MR	3.16	500	5		1	1	5	1
4	BCC-CSM2-MR	3.04	600	3		1	1	1	1
	BCC-ESM1	3.26		3
5	CAMS-CSM1-0	2.29	500	2	x	2	2	2	2
6	CanESM5	5.62	1000	40	x	40	40	40	40
7	CanESM5-CanOE		501	3		3	3	3	3
8	CESM2	5.16	1200	11		3	3	3	3
9	CESM2-FV2	5.14	500	3		XXXX	XXXX	XXXX	XXXX
10	CESM2-WACCM	4.75	499	3		1	3	1	3
11	CESM2-WACCM-FV2	4.79	500	3		XXXX	XXXX	XXXX	XXXX
12	CMCC-CM2-SR5	3.52	500	1		1	1	1	1
	CMCC-ESM2		500	1		1	1	1	1
13	CNRM-CM6-1	4.83	500	30		6	6	6	6
	CNRM-CM6-1-HR	4.28	XXXX	1		1	1	1	1
14	CNRM-ESM2-1	4.76	500	9	x	5	5	5	5
15	E3SM-1-0	5.32	500	5		XXXX	XXXX	XXXX	XXXX
	E3SM-1-1-ECA		XXXX	1		XXXX	XXXX	XXXX	XXXX
	E3SM-1-1		XXXX	1		XXXX	XXXX	XXXX	1
	EC-Earth3	4.10	XXXX	23	x	7	22	7	7
16	EC-Earth3-Veg	4.31	500	5	x	4	5	4	4
	EC-Earth3-Veg-LR		XXXX	1	x	XXXX	XXXX	XXXX	1
17	FGOALS-f3-L	3.00	561	3		1	1	1	1
18	FGOALS-g3	2.88	700	6	x	1	1	1	4
	FIO-ESM-2-0		XXXX	3		3	3	XXXX	3
19	GFDL-CM4	3.89	500	1		XXXX	1	XXXX	1
20	GFDL-ESM4	2.60	500	3	x	1	3	1	1
21	GISS-E2-1-G	2.72	851	39	x	2	15	2	7
	GISS-E2-1-G-CC		XXXX	1		XXXX	XXXX	XXXX	XXXX
	GISS-E2-1-H	3.11	XXXX	1	x	XXXX	XXXX	XXXX	XXXX
22	HadGEM3-GC31-LL	5.55	500	4		1	1	XXXX	3
23	HadGEM3-GC31-MM	5.42	500	2		1	XXXX	XXXX	3
24	INM-CM4-8	1.83	531	1		1	1	1	1
25	INM-CM5-0	1.92	1201	10		1	1	5	1
26	IPSL-CM6A-LR	4.56	1200	32	x	6	11	11	6
	KACE-1-0-G	4.48	XXXX	3		2	3	3	3
27	MCM-UA-1-0	3.65	500	2		1	1	1	1
28	MIROC-ES2L	2.68	500	10	x	2	1	1	10
29	MIROC6	2.61	800	10	x	3	3	3	3
30	MPI-ESM-1-2-HAM	2.96	780	2		XXXX	XXXX	XXXX	XXXX
31	MPI-ESM1-2-HR	2.98	500	10		2	2	10	2
32	MPI-ESM1-2-LR	3.00	1000	10		10	10	10	10
33	MRI-ESM2-0	3.15	701	5	x	1	1	5	2
34	NESM3	4.72	500	5		2	2	XXXX	2
35	NorCPM1	3.05	500	30		XXXX	XXXX	XXXX	XXXX
	NorESM2-LM	2.54	XXXX	3		1	3	1	1
36	NorESM2-MM	2.50	500	1		1	1	1	1
37	SAM0-UNICON	3.72	700	1		XXXX	XXXX	XXXX	XXXX
38	TaiESM1	4.31	500	1		1	1	1	1
39	UKESM1-0-LL	5.34	1100	17	x	5	5	5	5

This list was further refined by considering additional criteria, specifically the availability of sufficient temporal resolution, with a minimum requirement of 3-hourly data, and horizontal spatial resolution, with a minimum requirement of 100 km. These criteria ensure the data is sufficiently detailed for further processing and analysis.

Concerning the Shared Socio-economic Pathways - SSPs (IPCC, 2021)²⁰ , the analysis involved the scenario SSP2-4.5, which represents a single, intermediate level of greenhouse gas emissions (SSP245). In the next release, both the number of projection models and the number of scenarios will be extended, as will the temporal coverage (namely from 2065 to 2100).

The final selection of models is reported in Table 3.2 (with details about native resolutions).

Table 3.2: CMIP6 models considered for PECDv4.1 under the projections stream.

Model	Time resolution	Spatial resolution	Simulations	Variant label	Calendar
CMCC-CM2-SR5	3 hours	100 km	historical, ssp245	r1i1p1f1	365_day
EC-Earth3	3 hours	100 km	historical, ssp245	r1i1p1f1	proleptic_gregorian
MPI-ESM1-2-HR	3 hours	100 km	historical, ssp245	r1i1p1f1	proleptic_gregorian

Note that the historical simulation period is chosen to ensure overlap between ERA5 and the CMIP6 models, enabling the computation of bias adjustment.

Data retrieval

CMIP6 variables (for each model) are downloaded from the ESGF node using a Python script that utilizes a specific Python API. The script only accepts a configuration file as an argument, which contains the desired tags for the download. This script is used for downloading both historical and scenario data. Table 3.3 lists the nodes from which each model has been downloaded.

Table 3.3: CMIP6 models used in the projections stream and their corresponding nodes for downloading.

Model	Originator	Model code	node URL
CMCC-CM2-SR5	CMCC (Centro Euro-Mediterraneo sui Cambiamenti Climatici)	CMR5	https://esgf-data.dkrz.de/esg-search
EC-Earth3	ECEC (European community Earth System Model)	ECE3	https://esg-dn1.nsc.liu.se/esg-search
MPI-ESM1-2-HR	MPI- (Max Planck Institute)	MEHR	https://esgf-data.dkrz.de/esg-search
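The configuration-driven retrieval described above can be pictured with the sketch below. It is an illustration under stated assumptions, not the script used for PECDv4.1: the document does not describe the configuration file or name the Python API, so the dictionary layout, the function name and the choice of `table_id` and `variable_id` are hypothetical. The sketch only builds a query URL for the ESGF search service from CMIP6 facet names and sends no request.

```python
from urllib.parse import urlencode

# Hypothetical configuration: the real configuration file format is not
# described in this document. Node URL, model, scenario and variant label are
# taken from Tables 3.2 and 3.3; table_id and variable_id are example values.
config = {
    "node": "https://esgf-data.dkrz.de/esg-search",
    "source_id": "MPI-ESM1-2-HR",
    "experiment_id": "ssp245",
    "variant_label": "r1i1p1f1",
    "table_id": "3hr",
    "variable_id": "tas",
}


def build_search_url(cfg):
    """Return an ESGF search URL for the facets in cfg (no request is sent)."""
    facets = {key: value for key, value in cfg.items() if key != "node"}
    query = {
        "project": "CMIP6",
        "type": "File",
        "format": "application/solr+json",
        **facets,
    }
    return f"{cfg['node']}/search?{urlencode(query)}"


url = build_search_url(config)
```

Changing `experiment_id` to `historical` would, under the same assumptions, target the historical simulation with the same tags.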

Spatial interpolation

Starting from a common 100 km nominal spatial resolution and global domain, each model has its own grid, necessitating spatial interpolation to the PECD domain at 0.25° x 0.25°. This interpolation uses the bilinear method as implemented in the CDO²¹ remapbil command line tool.

The interpolation process involves the following command:

cdo remapbil,<grid.txt> <filein.nc> <fileout.nc>

where grid.txt is a text file obtained by running:

cdo griddes <template.nc> > <grid.txt>

on a template file with the desired grid. Here, filein.nc is the original projection file, and fileout.nc is the spatially interpolated output.

A Python script iterates over the files and, using the os library, calls the CDO command line for each file. Another Python script in the preprocessing pipeline checks the output files for NaNs and anomalous values, and reformats them according to ERA5 conventions.
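A loop of this kind can be sketched as follows. This is a minimal illustration and not the project script: it uses the standard `subprocess` module instead of the `os` library mentioned above, and the function names and directory layout are assumptions. The `run` argument is injectable so that the loop can be exercised without CDO installed.

```python
import subprocess
from pathlib import Path


def remap_command(grid_file, file_in, file_out):
    """Build the argument list for the cdo remapbil call described above."""
    return ["cdo", f"remapbil,{grid_file}", str(file_in), str(file_out)]


def remap_all(input_dir, output_dir, grid_file, run=subprocess.run):
    """Remap every NetCDF file in input_dir and return the commands issued."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for file_in in sorted(Path(input_dir).glob("*.nc")):
        command = remap_command(grid_file, file_in, output_dir / file_in.name)
        # With the default runner, a failing cdo call raises CalledProcessError.
        run(command, check=True)
        commands.append(command)
    return commands
```

A call such as `remap_all("raw", "remapped", "grid.txt")` would process all files of a hypothetical `raw` directory, provided the `cdo` executable is available on the system path.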

https://code.mpimet.mpg.de/projects/cdo

Temporal aggregation and interpolation

As stated in Section 3.1, one of the selection criteria for projection models is the finest available temporal resolution (3 hours). However, it is necessary to apply temporal interpolation to achieve the required hourly resolution for the PECDv4.1 database. Table 3.4 shows the method used to temporally interpolate each variable.

Table 3.4: Temporal interpolation methodologies.

Variable	Interpolation method
Temperature (TA)	(1) Cubic spline with moving window (window width 3 days)
Precipitation (TP)	(3) Cumulating over the days
Solar Radiation at Surface (GHI)	(2) Method ad hoc for taking into account the position of the sun
Wind speed at 10 m (WS10)	(1) Cubic spline with moving window (window width 3 days), applied separately to the 10 m horizontal wind components

The cubic spline interpolation is implemented in a Python script that uses the xarray library. The set of files is opened as a multi-file dataset with xarray.open_mfdataset, and an iterator runs along the "time" coordinate of the 3-hourly file with a daily step, starting from 00:00. In each step, a window with a width of 3 days is created, and the data within the window are interpolated to an hourly resolution for each grid point by combining the xarray methods resample("1h") and interpolate("cubic"). The interpolated data for the central day (from 00:00 to 23:00) are then stored in a new dataset and saved as a NetCDF file.
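The moving-window logic can be sketched for a single grid point as follows. This is an illustration under stated assumptions rather than the project script: it uses `numpy` and `scipy.interpolate.CubicSpline` instead of the xarray methods named above, the function name is hypothetical, and the window is simply clipped at the first and last day, which is an assumption of this sketch.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def hourly_from_3hourly(values, window_days=3):
    """Interpolate a 3-hourly series (whole days, starting at 00:00) to hourly.

    For each day, a cubic spline is fitted on a window of `window_days` days
    centred on that day, and only the 24 hourly values of the central day are kept.
    """
    values = np.asarray(values, dtype=float)
    n_days = values.size // 8                    # 8 samples per day
    t3 = np.arange(values.size) * 3.0            # sample times in hours
    half = window_days // 2
    out = np.empty(n_days * 24)
    for day in range(n_days):
        first = max(day - half, 0)               # window clipped at the edges
        last = min(day + half, n_days - 1)
        sel = slice(first * 8, (last + 1) * 8)
        spline = CubicSpline(t3[sel], values[sel])
        hours = day * 24 + np.arange(24)
        # On the last day, 22:00 and 23:00 lie beyond the final 21:00 sample
        # and are extrapolated by the spline.
        out[day * 24:(day + 1) * 24] = spline(hours)
    return out
```

Restricting the fit to a short window keeps the memory footprint small, since only three days of data per grid point are needed at any time.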

It is important to note that to obtain files according to the ERA5 conventions and to have the first hour as 00:00 for the projections, it is necessary to use the last day of the historical scenario, considering that the different SSP scenarios start from 03:00. Figure 3.1 contains a validation of this method considering the TA variable at a generic point of the PECD domain.

Figure 3.1: Validation of the temporal interpolation procedure based on the cubic spline algorithm for TA.

For surface solar radiation downwards, the first step involves converting the three-hourly data to a clearness index (Kt), removing its dependence on variations in the sun's apparent position. This is done by dividing the GHI by its equivalent irradiation at the top of the atmosphere (TOA), which is calculated using an algorithm from the Solar Geometry 2 (SG2) library.²² The SG2 library can be installed via "pip" in any Python environment. The detrended Kt time series is then downscaled to an hourly resolution using linear interpolation. The data are subsequently converted back to GHI by multiplying Kt by an hourly-averaged TOA value. Figure 3.2 shows a validation plot for this procedure, computed at a generic point within the PECD domain.

Figure 3.2: Validation of the temporal interpolation procedure based on the SG2 algorithm for GHI. TOA stands for top-of-the-atmosphere solar radiation, and SRDS stands for incoming solar radiation and is equivalent to GHI.
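The three steps of the GHI procedure (conversion to Kt, linear interpolation, conversion back to GHI) can be sketched as follows. This is a minimal illustration, not the project code: the TOA values are taken as inputs rather than computed with SG2, the function names are hypothetical, and setting Kt to zero when TOA is close to zero (night-time) is an assumption of this sketch.

```python
def interpolate_linear(x_known, y_known, x_new):
    """Piecewise-linear interpolation; x_known must be strictly increasing."""
    result = []
    for x in x_new:
        # Find the right-hand knot of the segment containing x.
        j = 1
        while j < len(x_known) - 1 and x_known[j] < x:
            j += 1
        x0, x1 = x_known[j - 1], x_known[j]
        weight = (x - x0) / (x1 - x0)
        result.append(y_known[j - 1] + weight * (y_known[j] - y_known[j - 1]))
    return result


def ghi_to_hourly(ghi_3h, toa_3h, toa_1h, t_3h, t_1h, eps=1e-6):
    """Downscale 3-hourly GHI to hourly GHI through the clearness index Kt."""
    # 1) Clearness index on the 3-hourly grid (assumed 0 when TOA is ~0).
    kt_3h = [g / t if t > eps else 0.0 for g, t in zip(ghi_3h, toa_3h)]
    # 2) Linear interpolation of Kt to the hourly time stamps.
    kt_1h = interpolate_linear(t_3h, kt_3h, t_1h)
    # 3) Back to GHI using the hourly TOA values.
    return [k * t for k, t in zip(kt_1h, toa_1h)]
```

Interpolating Kt instead of GHI means that the hourly shape of the output follows the hourly TOA curve, so the diurnal cycle is not flattened by the interpolation.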

The required variable for precipitation is total precipitation (TP), which has been derived from the precipitation flux (in kg m⁻² s⁻¹), the original data format for CMIP6 projection models. Since energy models require daily cumulative data, the downloaded precipitation flux data was first resampled to daily averages using the xarray.resample().mean() method. This daily average was then multiplied by 86.4 to convert the data into daily precipitation in meters.
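The factor 86.4 can be recovered from a unit conversion, sketched here under the assumption of a water density of 1000 kg m⁻³ (the symbols \( \overline{pr} \) for the daily mean flux and \( TP_{day} \) for the daily total are labels introduced for this illustration):

\[ TP_{day}\,[\mathrm{m}] = \overline{pr}\,[\mathrm{kg\,m^{-2}\,s^{-1}}] \times \dfrac{86400\ \mathrm{s}}{1000\ \mathrm{kg\,m^{-3}}} = 86.4 \times \overline{pr} \]

For example, a daily mean flux of \( 5 \times 10^{-5} \) kg m⁻² s⁻¹ corresponds to 0.00432 m, that is 4.32 mm of precipitation over the day.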

https://github.com/gschwind/sg2

Bias-adjustment procedure

For the projection stream, two bias adjustment methodologies have been implemented for the CMIP6 projection datasets. These methodologies are:

Cumulative Distribution Function Transform (CDFt): As described by Michelangeli et al. (2009), this method assumes the existence of a transformation that can 'translate' a time series of a CMIP6 variable (the predictor) into the CDF representing the reference climate variable (the predictand) at a given point.
Delta Adjustment: As detailed by Navarro-Racines et al. (2020), this method applies a simple constant correction based on the averages of the predictor and the predictand.
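For orientation, the transformation at the core of CDFt is commonly written as below. This is a sketch in notation introduced here, not taken from this document: \( X \) denotes the model variable, \( Y \) the reference variable, and the subscripts \( c \) and \( f \) the calibration and the target (future) period.

\[ F_{Y,f}(x) = F_{Y,c}\left( F_{X,c}^{-1}\left( F_{X,f}(x) \right) \right) \]

The adjusted value is then obtained by quantile mapping between the model CDF and the estimated reference CDF of the target period:

\[ \hat{y} = F_{Y,f}^{-1}\left( F_{X,f}(x) \right) \]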

The calibration period for these methodologies extends from 1995 to 2014 (20 years) for the CDFt method. The calibration time series are retrieved from the ERA5 dataset and the historical scenario of the CMIP6 projection model dataset (reference and source variable to be corrected, respectively).

CDFt Method: This method is used for variables with a strong climate-change-related trend, such as temperature. To correctly account for the trend, a 20-year time series is considered for the calculation of the CDFs, with only the central 10-year window taken as the adjusted data. The 20-year timeframe is then moved forward, yielding a new 10-year central window that partially overlaps the window of the previous step. This procedure is referred to as ‘moving time window adjustment’. The CDFt method is also used for the bias adjustment of total precipitation and 10m wind speed (without moving windows).
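The sequence of windows in the moving time window adjustment can be sketched as below. The 20-year window and the 10-year central part come from the text above; the step by which the window advances is not stated in this document, so the `step` parameter (and its default of 10 years) is a hypothetical choice, as are the function name and the absence of any special handling of the first and last years.

```python
def moving_windows(start_year, end_year, window=20, keep=10, step=10):
    """Yield (window_start, window_end, keep_start, keep_end), years inclusive.

    `step` is an assumption: the source does not state by how many years the
    20-year window is moved forward at each iteration.
    """
    margin = (window - keep) // 2
    first = start_year
    while first + window - 1 <= end_year:
        keep_start = first + margin
        yield (first, first + window - 1, keep_start, keep_start + keep - 1)
        first += step


windows = list(moving_windows(2015, 2065))
```

With these hypothetical settings, each 20-year window shares ten years with the previous one, and the years outside any central part would need separate treatment.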

Delta Adjustment: This computationally less expensive method is used for variables that do not exhibit a strong climate-change-related trend, such as solar radiation (GHI). Despite wind speed and precipitation (WS10 and TP) not exhibiting a strong climate change trend, their correction is also based on the CDFt method. This is because the mean factors in the Delta method could potentially lead to negative (and therefore unphysical) values. For these variables, given the lack of a strong climatic trend, the CDFt considers a ‘static’ 20-year time series.
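The Delta adjustment can be illustrated with the sketch below. It assumes an additive correction equal to the difference between the calibration-period means of the reference and of the model; the document does not specify whether the correction is additive or multiplicative, so this form, the function name and the example numbers are assumptions.

```python
from statistics import mean


def delta_adjust(model_target, model_calibration, reference_calibration):
    """Shift a model series by the calibration-period mean bias (additive form)."""
    delta = mean(reference_calibration) - mean(model_calibration)
    return [value + delta for value in model_target]


# Hypothetical numbers: the model overestimates the reference by 2.5 on average.
adjusted = delta_adjust([1.0, 5.0], [4.0, 6.0], [2.0, 3.0])
```

In this example the first adjusted value is negative, which illustrates why a constant correction is not suitable for variables such as wind speed and precipitation.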

Climate indicators

Table 3.5 lists the climate indicators for the projection stream. The final domain and spatial resolution, as well as the final temporal resolution, are obtained through preprocessing as described in Section 3.3 and Section 3.4, respectively. The bias adjustment has been applied using the procedures detailed in Section 3.5. The computation of TAW, wind speed at 10 m and at 100 m, and spatial aggregation follows the same methods described for the historical stream (see Sections 2.4, 2.1, 2.3, and 2.5, respectively). It is important to note that all variables are bias-adjusted directly except for TAW and WS100, because the former is derived from the bias-adjusted TA and the latter from the bias-adjusted WS10.

Table 3.5: Climate indicators provided in the PECDv4.1 for the projection stream. Files provided at BIAS spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format.

Variable	Period	Source	Models	Scenario	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Units
2m temperature (TA)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	K (gridded) °C (aggregated)
Population-weighted temperature (TAW)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	SZON	°C
Total precipitation (TP)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m
Surface solar radiation downwards (GHI)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	W m^-2
10m wind speed (WS10)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1
100m wind speed (WS100)	2015-2065	CMIP6 projections	CMR5, ECE3, MEHR	SP245, SP370	PECD/0.25° x 0.25°	hourly	BIAS, NUT0, NUT2, SZON, SZOF, PEON, PEOF	m s^-1

Energy data

The same data illustrated in Section 2.7 are also used for the projection stream.

Energy Conversion models

Wind Power conversion model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.1.

The simulated locations and wind technologies depend on the type of run. An overview of the runs is given in Table 3.6.

Table 3.6: Wind run types for projection stream.

Run type	Climate projection simulated years	WPP locations	WPP technology	Losses
Existing	2015-2065	All years with 2020 WPP locations (based on WindPowerNet data)	Existing WPP parameters based on WindPowerNet data (always 2020 fleet), applied in the generic power curve model	Wakes as part of the generic power curve. And 10 % for other losses (incl. unavailability), applied as a simple multiplication by 0.9
Future wind technologies	2015-2065	The best 10-50 % locations of the unmasked points within each PECD region (in terms of mean wind speed in the bias-adjusted ERA5 data, based on ERA5 grid).	Onshore wind: 3 hub heights and 3 turbine types, so in total 9 wind technologies. A plant of 50 MW with ten 5 MW turbines modelled for each technology. Offshore wind: 1 hub height and 2 turbine types, so in total 2 wind technologies. A plant of 500 MW with 28 18 MW turbines modelled for each technology.	Wakes as part of power curves. And 5 % for other losses (incl. unavailability), applied as a simple multiplication by 0.95

Photovoltaic Solar Power conversion model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.2.

Concentrated Solar Power conversion model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.3.

Hydro Power conversion model

The climate data used as input are listed in Table 3.4, and the procedure is the same as described in Section 2.9.4.

Energy indicators

The energy indicators are the same as described in Section 2.10 for the historical stream, but are computed from the climate indicators listed in Table 3.5.

In Table 3.7, the energy variables contained in this database are summarized. Table 3.7 provides detailed information for each variable, including the type, the time period covered, the source of the input data, the domain, the temporal resolution, the spatial aggregation (according to Table 2.1), and, where applicable, the different technologies used to compute the final time series.

Table 3.7: Energy indicators provided in the PECDv4.1 for the projection stream. Files provided at ORIG spatial aggregation are gridded (NetCDF format), while all the other levels of aggregation are provided in CSV format.

Variable	Type	Period	Source	Domain/ spatial resolution	Temporal resolution	Spatial aggregation	Technology	Units
Wind power onshore (WON)	Capacity factor	2015 - 2065	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEON	Onshore Existing technologies, Onshore SP199_HH100, Onshore SP199_HH150, Onshore SP199_HH200, Onshore SP277_HH100, Onshore SP277_HH150, Onshore SP277_HH200, Onshore SP335_HH100, Onshore SP335_HH150, Onshore SP335_HH200	MW/MW
Wind power offshore (WOF)	Capacity factor	2015 - 2065	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEOF	Offshore Existing technologies, Offshore SP316_HH155, Offshore SP370_HH155	MW/MW
Solar generation (SPV)	Capacity factor	2015 - 2065	CMIP6 projection	PECD/0.25° x 0.25°	hourly	ORIG, NUT0, NUT2, SZON, PEON	---	MW/MWp
Concentrated solar generation (CSP)	Capacity factor	2015 - 2065	CMIP6 projection	PECD/0.25° x 0.25°	hourly	PEON	storage_0_hours_preDispatch, storage_0_hours_storageDispatched, storage_7p5_hours_preDispatch, storage_7p5_hours_storageDispatched	MW/MW
Hydropower reservoirs generation energy (HRG)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO**	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower reservoirs inflow energy (HRI)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river generation energy (HRO)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river inflow energy (HRR)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage generation energy (HPO)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower run-of-river with pondage inflow energy (HPI)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh
Hydropower open-loop pumped storage inflow energy (HOL)	Energy	2015 - 2065	CMIP6 projection ENTSO-E TP* TSO** PECDv3.1***	PECD/0.25° x 0.25°	weekly	SZON	---	MWh

*Energy data from ENTSO-E Transparency Platform

**Energy data from Transmission System Operators specific for each country

**Inflow data from ENTSO-E PECDv3.1

Appendix

Filenames convention

This section explains the filename convention of the PECD datasets. Table 4.1 details the structure of the filenames and the possible values of each field. The last column indicates the corresponding option in the CDS download form, where users can customise their selection. "Not applicable" means that the user cannot modify the field: the data are downloaded with fixed characteristics.

Table 4.1: Filename convention used in the PECDv4.1.

Position in the filename	Possible substrings for each position in the filename	Description	Option in the CDS download form
0	H (historical), P (projection)	Data streams	Stream
1	ERA5 (ERA5 reanalysis), CMI6 (CMIP6 Projection)	Model	Origin (Reanalysis or Climate models)
2	ECMW (ECMWF), CMCC (Centro Euro-Mediterraneo sui Cambiamenti Climatici), ECEC (European community Earth System Model), MPI- (Max Planck Institute)	Model	Origin (Reanalysis or Climate models)
3	T639 (ERA5 data), CMR5 (CMCC-CM2-SR5 r1i1p1f1), ECE3 (EC-Earth3 r1i1p1f1), MEHR (MPI-ESM1-2-HR r1i1p1f1)	Model	Origin (Reanalysis or Climate models)
4	TA- (2m temperature), TAW (Population-weighted temperature), TP- (Total precipitation), GHI (Surface solar radiation downwards), WS- (10m wind speed and 100m wind speed)	Variable	Variable (Climate)
4	SPV (Solar generation capacity factor), CSP (Concentrated solar generation capacity factor), WON (Wind power onshore capacity factor), WOF (Wind power offshore capacity factor), HOL (Hydropower open-loop pumped storage inflow energy), HPI (Hydropower run-of-river with pondage inflow energy), HPO (Hydropower run-of-river with pondage generation energy), HRG (Hydropower reservoirs generation energy), HRI (Hydropower reservoirs inflow energy), HRO (Hydropower run-of-river generation energy), HRR (Hydropower run-of-river inflow energy)	Variable	Variable (Energy)
5	0000m, 0002m, 0010m, 0100m	Level (meters above sea level)	Not applicable
6	Pecd (ENTSO-E PECD domain)	Region	Not applicable
7	025d (0.25°), NUT0 (NUTS 0), NUT2 (NUTS 2), PEOF (Pan-European Offshore Zones), PEON (Pan-European Onshore Zones), SZOF (Offshore Bidding Zones), SZON (Onshore Bidding Zones)	Spatial resolution	Gridded Regional aggregated timeseries
8	SYYYYMMDDhhmm (starting year, month, day, hour, minute)	Start date	Year Month
9	EYYYYMMDDhhmm (ending year, month, day, hour, minute)	End date	Year Month
10	ACC (accumulated), INS (Instantaneous), CFR (Capacity factor), NRG (Energy)	Type	Not applicable
11	MAP (gridded data), TIM (time series)		Not applicable
12	01h (1 hour), 01d (1 day), 07d (7 days)	Temporal resolution	Not applicable
13	NA-	Lead time	Not applicable
14	noc (no correction), cdf (cumulative distribution function), mbc (mean bias correction)	Bias adjustment method	Not applicable
15	NA-, org (original data), avg (mean)	Statistics	Not applicable
16	NA, 20 (Offshore wind turbine: Existing technologies), 21 (Offshore wind turbine: SP316 HH155), 22 (Offshore wind turbine: SP370 HH155), 30 (Onshore wind turbine: Existing technologies), 31 (Onshore wind turbine: SP199 HH100), 32 (Onshore wind turbine: SP199 HH150), 33 (Onshore wind turbine: SP199 HH200), 34 (Onshore wind turbine: SP277 HH100), 35 (Onshore wind turbine: SP277 HH150), 36 (Onshore wind turbine: SP277 HH200), 37 (Onshore wind turbine: SP335 HH100), 38 (Onshore wind turbine: SP335 HH150), 39 (Onshore wind turbine: SP335 HH200), 40 (Concentrated solar power: Pre-dispatch, no storage), 41 (Concentrated solar power: Dispatched, no storage), 42 (Concentrated solar power: Pre-dispatch, 7.5 hours of storage), 43 (Concentrated solar power: Dispatched, 7.5 hours of storage)	Technological specification	Technological specification (Offshore wind turbine, Onshore wind turbine, Concentrated solar power)
17	NA---, SP245 (SSP2-4.5)	Emission scenario	Emissions
18	NA---	Energy scenario	Not applicable
19	NA---, StRnF (Statistical model/Random Forests), PhM01 (Physical Model/method1), PhM02 (Physical Model/method2), PhM03 (Physical Model/method3)	Transfer function	Not applicable
20	PECD4.1	Version of PECD database	Not applicable
21	fv1	File version	Not applicable
22	.nc (NetCDF), .csv (comma-separated values)	File formats	Not applicable

Example of filename: H_ERA5_ECMW_T639_TP-_0000m_Pecd_025d_S198501010000_E198501310000_ACC_MAP_01d_NA-_noc_org_NA_NA---_NA---_NA---_PECD4.1_fv1.nc

This NetCDF file (.nc) contains historical data (H) from the ERA5 reanalysis (ERA5 and T639) produced by ECMWF (ECMW). The variable is total precipitation (TP-) at 0 m height (0000m), and the coverage is the PECD domain (Pecd) at 0.25° spatial resolution (025d). Data span from 01/01/1985 at 00:00 UTC (S198501010000) to 31/01/1985 at 00:00 UTC (E198501310000). The data are accumulated (ACC), gridded (MAP), with a daily temporal resolution (01d). The lead time is not available (NA-), the data are not bias-adjusted (noc) and they are original (org). The technological specification, emission scenario, energy scenario and transfer function are not applicable (NA_NA---_NA---_NA---). The PECD version is 4.1 (PECD4.1) and the file version is fv1.
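Because every field occupies a fixed position, a filename can be decoded programmatically. The following is a minimal Python sketch, not part of the PECD tooling. It assumes that fields are separated by underscores, that no field contains an underscore itself, and that the extension follows the last dot. The function name `parse_pecd_filename` and the dictionary keys are hypothetical labels derived from the "Description" column of Table 4.1.

```python
from datetime import datetime

# Hypothetical labels for positions 0-21 of Table 4.1 (position 22 is the extension).
FIELD_NAMES = [
    "stream", "origin", "institution", "model", "variable", "level",
    "region", "spatial_resolution", "start", "end", "type", "structure",
    "temporal_resolution", "lead_time", "bias_adjustment", "statistics",
    "technology", "emission_scenario", "energy_scenario",
    "transfer_function", "pecd_version", "file_version",
]


def parse_pecd_filename(filename):
    """Split a PECD filename into a dictionary of named fields."""
    # The version field (e.g. PECD4.1) contains a dot, so split on the last dot only.
    stem, extension = filename.rsplit(".", 1)
    parts = stem.split("_")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, found {len(parts)}"
        )
    fields = dict(zip(FIELD_NAMES, parts))
    fields["file_format"] = extension
    # Start and end dates carry a one-letter prefix (S or E) before YYYYMMDDhhmm.
    for key, prefix in (("start", "S"), ("end", "E")):
        value = fields[key]
        if not value.startswith(prefix):
            raise ValueError(f"{key} field should begin with {prefix!r}")
        fields[key] = datetime.strptime(value[1:], "%Y%m%d%H%M")
    return fields


# The example filename from this section, written with ASCII hyphens.
example = (
    "H_ERA5_ECMW_T639_TP-_0000m_Pecd_025d_S198501010000_E198501310000_"
    "ACC_MAP_01d_NA-_noc_org_NA_NA---_NA---_NA---_PECD4.1_fv1.nc"
)
info = parse_pecd_filename(example)
print(info["variable"], info["spatial_resolution"], info["start"].date())
```

The field-count check makes a malformed name fail with an explicit error rather than a silently shifted mapping.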

Metadata

The header of the time series CSV files will contain the following metadata descriptors. An example of an air temperature variable is presented below, provided as a CSV file with the filename:

# General

## Title

### Air Temperature

## Abstract

### ERA5

## Date

### 2022-11-28

## Date type

### Publication: Date identifies when the data was issued

## Unit

### C

## URL

### https://cds.climate.copernicus.eu/

## Data format

### CSV

## Keywords

### ERA5: reanalysis : Copernicus : C3S : C3S Energy : ICS

## Point of contact

### Individual name

#### Alberto Troccoli

### Electronic mail address

#### info@inclimateservice.com

### Organisation name

#### Inside Climate Service

### Role

#### Owner: Party that owns the resource

# Usage

## Access constraints

### Intellectual property rights: The IP of these data belongs to the EU Copernicus programme

## Use constraints

### Creative Commons

## Citation(s)

### Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horanyi, A.; Munoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. Roy. Meteor. Soc. 2020, 146, 1999-2049, doi:10.1002/qj.3803

## Temporal extent

## Begin date

### 1980-01-01-0000

## End date

### 2021-12-31-2300

## Temporal resolution

### 1 hour

## Geographic bounding box

### westBoundLongitude -31.00

### eastBoundLongitude 45.00

### southBoundLatitude 18.00

### northBoundLatitude 75.00

## Spatial resolution

### NUTS0

# Lineage Statement

## Original Data Source

## Statement

### The original data sources are ECMWF ERA5 Reanalysis (available at: https://cds.climate.copernicus.eu)

#
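When reading such a CSV file, the metadata header has to be separated from the data rows. The sketch below is one possible approach using only the Python standard library. It assumes that every metadata line begins with one or more `#` characters, as in the example above, and that data rows never do. The function name `split_metadata_header` and the sample data rows are hypothetical.

```python
def split_metadata_header(lines):
    """Separate '#'-prefixed metadata lines from the data rows.

    Returns (header, data), where header is a list of (depth, text) tuples
    and depth is the number of leading '#' characters.
    """
    header, data = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            depth = len(line) - len(line.lstrip("#"))
            header.append((depth, line[depth:].strip()))
        else:
            data.append(line)
    return header, data


sample = [
    "# General",
    "## Title",
    "### Air Temperature",
    "## Unit",
    "### C",
    "Date,IT",                # hypothetical column header
    "1980-01-01 00:00,5.2",   # hypothetical data row
]
header, data = split_metadata_header(sample)
print(header[:3])
print(data)
```

The depth value preserves the nesting of the descriptors, so a caller can rebuild the section, key and value hierarchy if needed.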


References

Beyer, H G, Heilscher, G, Bofinger, S, “A robust model for the MPP performance of different types of PV-modules applied for the performance check of grid connected systems”, EuroSun 2004 conference, pp. 3064-3071, Germany, June 2004.

Blanc, P., Wald, L., “The SG2 algorithm for a fast and accurate computation of the position of the Sun for multi-decadal time period”, Solar Energy, vol. 86, pp. 3072-3083, 2012.
(https://doi.org/10.1016/j.solener.2012.07.018)

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016.

Gueymard, C. A., Lara-Fanego, V., Sengupta, M., & Xie, Y., “Surface albedo and reflectance: Review of definitions, angular and spectral effects, and intercomparison of major data sources in support of advanced solar irradiance modeling over the Americas”, Solar Energy, vol. 182, pp. 194–212, 2019. (https://doi.org/10.1016/j.solener.2019.02.040)

Ho, L.T.T.; Dubus, L.; De Felice, M.; Troccoli, 2020. A. Reconstruction of Multidecadal Country-Aggregated Hydro Power Generation in Europe Based on a Random Forest Model. Energies 2020, 13, 1786. https://doi.org/10.3390/en13071786

Jourdier, B.: Evaluation of ERA5, MERRA-2, COSMO-REA6, NEWA and AROME to simulate wind power production over France, Adv. Sci. Res., 17, 63–77, https://doi.org/10.5194/asr-17-63-2020, 2020.

Klucher, T. M., “Evaluation of models to predict insolation on tilted surfaces”, Solar Energy, vol. 23, pp. 111–114, 1979. (https://doi.org/10.1016/0038-092X(79)90110-5)

Koivisto M., K. Plakas, E. R. Hurtado Ellmann, N. Davis, P. Sørensen, “Application of microscale wind and detailed wind power plant data in large-scale wind generation simulations”, Electric Power Systems Research, vol. 190, 106638, January 2021(https://doi.org/10.1016/j.epsr.2020.106638)

Liu, B., & Jordan, R., “Daily insolation on surfaces tilted towards equator”, ASHRAE Transactions, 10, pp. 526–541, 1961.

Martin, N., & Ruiz, J. M., “Calculation of the PV modules angular losses under field conditions by means of an analytical model”, Solar Energy Materials and Solar Cells, vol. 70, pp. 25–38, 2001. (https://doi.org/10.1016/S0927-0248(00)00408-6)

Martin, N., & Ruiz, J. M., Corrigendum to “Calculation of the PV modules angular losses under field conditions by means of an analytical model”, Solar Energy Materials and Solar Cells, vol. 110, pp. 154, 2013.

Michelangeli, P.-A., Vrac, M., and Loukos, H. (2009). Probabilistic downscaling approaches: Application to wind cumulative distribution functions, Geophys. Res. Lett., 36, L11708, doi:10.1029/2009GL038401.

Mortensen N. G., “Wind resource assessment using the WAsP software”, WindEurope DTU report, 2018 (https://backend.orbit.dtu.dk/ws/portalfiles/portal/164389714/Wind_resource_assessment_using_the_WAsP_software_DTU_Wind_Energy_E_0174_.pdf).

Murcia Leon J. P., M. J. Koivisto, P. Sørensen, P. Magnant, “Power Fluctuations In High Installation Density Offshore Wind Fleets”, Wind Energy Science, vol. 6, pp. 461–476, 2021. (https://doi.org/10.5194/wes-6-461-2021).

Murcia J. P., M. J. Koivisto, G. Luzia, B. T. Olsen, A. N. Hahmann, P. E. Sørensen, M. Als, “Validation of European-scale simulated wind speed and wind generation time series”, Applied Energy, vol. 305, 117794, January 2022 (https://doi.org/10.1016/j.apenergy.2021.117794).

Navarro-Racines, C., Tarapues, J., Thornton, P. et al. (2020). High-resolution and bias-corrected CMIP5 projections for climate change impact assessments. Sci Data 7, 7. https://doi.org/10.1038/s41597-019-0343-8

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J., 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 12, pp.2825-2830.

pvlib, python v0.9.3 documentation, Klucher irradiance transposition function. Retrieved November 30, 2022, from https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.irradiance.klucher.html, n.d.

pvlib, python v0.9.3 documentation, ground diffuse irradiance function. Retrieved November 30, 2022, from https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.irradiance.klucher.html, n.d.

PyPi sg2 package entry. Retrieved November 30, 2022, from https://pypi.org/project/sg2/, n.d.

Ross, R.G., “Interface design considerations for terrestrial solar cell modules,” in Photovoltaic Specialists Conference Record, pp. 801-806, United States of America, 1976.

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A.: An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data, Adv. Sci. Res., 15, 51–62, https://doi.org/10.5194/asr-15-51-2018, 2018.

Schmidt, H., Sauer, D. U., ‘‘Practical modeling and estimation of inverter efficiencies,’’ 9th Internationales Sonnenforum, pp. 550–557, Germany, 1994.

Skartveit, A., & Olseth, J. A., “A model for the diffuse fraction of hourly global radiation”, Solar Energy, vol. 38, pp. 271-274, 1987. (https://doi.org/10.1016/0038-092X(87)90049-1)

Swisher P., J. P. Murcia Leon, J. Gea-Bermúdez, M. Koivisto, H. Madsen, M. Münster, “Competitiveness of a low specific power, low cut-out wind speed wind turbine in North and Central Europe towards 2050”, Applied Energy, vol. 306, part B, 118043, January 2022 (https://doi.org/10.1016/j.apenergy.2021.118043).

Temps, R. C., & Coulson, K. L., “Solar radiation incident upon slopes of different orientations”, Solar Energy, vol. 19, pp. 179–184 (https://doi.org/10.1016/0038-092X(77)90056-1).

Williams, S. R., Betts, T. R., Helf, T., Gottschalg, R., Beyer, H. G., Infield, D. G., “Modelling long-term module performance based on realistic reporting conditions with consideration to spectral effects”, 3rd World Conference on Photovoltaic Energy Conversion, vol. 2, pp. 1908-1911, 2003.

This document has been produced in the context of the Copernicus Climate Change Service (C3S).

The activities leading to these results have been contracted by the European Centre for Medium-Range Weather Forecasts, operator of C3S on behalf of the European Union (Delegation Agreement signed on 11/11/2014 and Contribution Agreement signed on 22/07/2021). All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose.

The users thereof use the information at their sole risk and liability. For the avoidance of all doubt, the European Commission and the European Centre for Medium-Range Weather Forecasts have no liability in respect of this document, which is merely representing the author's view.
