Degradation and soiling example
This Jupyter notebook illustrates the RdTools trend analysis workflow using the functional API. In addition, the notebook demonstrates the effects of changes in the workflow. For a consistent experience, we recommend installing the specific versions of packages used to develop this notebook. This can be achieved in your environment by running pip install -r requirements.txt followed by pip install -r docs/notebook_requirements.txt from the base directory. (RdTools must also be separately installed.) These environments and examples are tested with Python 3.12.
The calculations consist of several steps illustrated here:
Import and preliminary calculations
Normalize data using a performance metric
Filter data to reduce error
Aggregate data
Post-aggregation filter
Analyze aggregated data to estimate the degradation rate
Analyze aggregated data to estimate the soiling loss
Earlier versions of this notebook (RdTools<3.0) included a clear sky workflow in order to check the results for bias from sensor drift. With the wide availability of satellite data, we now recommend repeating the analysis with satellite data to double check the sensor-based result. This is illustrated using the object-oriented API in TrendAnalysis_example_NSRDB.ipynb
This notebook works with data from the NREL PVDAQ [4] NREL x-Si #1 system. Note that because this system does not experience significant soiling, the dataset contains a synthesized soiling signal for use in the soiling section of the example. This notebook automatically downloads and locally caches the dataset used in this example. The data can also be found on the DuraMAT Datahub (https://datahub.duramat.org/dataset/pvdaq-time-series-with-soiling-signal).
[1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pvlib
import rdtools
%matplotlib inline
[2]:
#Update the style of plots
import matplotlib
matplotlib.rcParams.update({'font.size': 12,
                            'figure.figsize': [4.5, 3],
                            'lines.markeredgewidth': 0,
                            'lines.markersize': 2
                            })
# Register time series plotting in pandas > 1.0
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
[3]:
# Set the random seed for numpy to ensure consistent results
np.random.seed(0)
0: Import and preliminary calculations
This section prepares the data necessary for an rdtools calculation. The first step of the rdtools workflow is normalization, which requires a time series of energy yield, a time series of cell temperature, and a time series of irradiance, along with some metadata (see Step 1: Normalize).
The following section loads the data, adjusts units where needed, and renames the critical columns. The ambient temperature sensor data source is converted into estimated cell temperature. This dataset already has plane-of-array irradiance data, so no transposition is necessary.
A common challenge is handling datasets with and without daylight saving time. Make sure to specify a pytz timezone that does or does not include daylight saving time, as appropriate for your dataset.
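As a quick check of the timezone choice, the sketch below compares the UTC offsets of a fixed-offset zone and a zone that observes daylight saving time, once in winter and once in summer. It uses the standard-library zoneinfo module rather than pytz, and America/Denver is an assumed stand-in for a local zone with daylight saving time; neither is part of the original workflow.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# 'Etc/GMT+7' is a fixed UTC-07:00 zone (note the inverted POSIX sign).
# 'America/Denver' is an assumed example of a zone that observes DST.
zones = {
    "Etc/GMT+7": ZoneInfo("Etc/GMT+7"),
    "America/Denver": ZoneInfo("America/Denver"),
}

winter = datetime(2015, 1, 15, 12, 0)
summer = datetime(2015, 7, 15, 12, 0)

# Map each zone name to its (winter, summer) UTC offsets
offsets = {
    name: (winter.replace(tzinfo=tz).utcoffset(),
           summer.replace(tzinfo=tz).utcoffset())
    for name, tz in zones.items()
}

for name, (w, s) in offsets.items():
    print(name, "winter:", w, "summer:", s)
```

If the data logger never shifts its clock, the fixed-offset zone is the appropriate choice; if the timestamps follow local civil time, a zone that observes daylight saving time is needed.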
The steps of this section may change depending on your data source or the system being considered. Transposition of irradiance and modeling of cell temperature are generally outside the scope of rdtools. A variety of tools for these calculations are available in pvlib.
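To make the cell temperature step less opaque, here is a minimal scalar sketch of the Sandia (SAPM) cell temperature relation that this notebook delegates to pvlib. The function name sapm_cell_sketch is hypothetical, and the coefficients a, b, and deltaT are assumed to correspond to pvlib's 'open_rack_glass_polymer' parameter set; check them against pvlib before relying on them.

```python
import math

def sapm_cell_sketch(poa, temp_air, wind_speed, a=-3.56, b=-0.075, deltaT=3.0):
    """Estimate cell temperature (deg C) from POA irradiance (W/m^2),
    ambient temperature (deg C), and wind speed (m/s).

    Coefficients are assumed values for an open-rack glass/polymer module.
    """
    # Back-of-module temperature rises with irradiance, damped by wind
    temp_module = poa * math.exp(a + b * wind_speed) + temp_air
    # The cell runs slightly hotter than the module back surface
    return temp_module + (poa / 1000.0) * deltaT

tcell_sunny = sapm_cell_sketch(poa=1000.0, temp_air=20.0, wind_speed=1.0)
tcell_dark = sapm_cell_sketch(poa=0.0, temp_air=20.0, wind_speed=1.0)
print(round(tcell_sunny, 1), tcell_dark)
```

With no irradiance the estimate collapses to the ambient temperature, which is a convenient sanity check on sensor data.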
[4]:
# Import the example data
file_url = ('https://datahub.duramat.org/dataset/'
'a49bb656-7b36-437a-8089-1870a40c2a7d/'
'resource/d2c3fcf4-4f5f-47ad-8743-fc29'
'f1356835/download/pvdaq_system_4_2010'
'-2016_subset_soil_signal.csv')
cache_file = 'PVDAQ_system_4_2010-2016_subset_soilsignal.pickle'
try:
    df = pd.read_pickle(cache_file)
except FileNotFoundError:
    df = pd.read_csv(file_url, index_col=0, parse_dates=True)
    df.to_pickle(cache_file)
df = df.rename(columns = {
    'ambient_temp': 'Tamb',
    'poa_irradiance': 'poa',
})
# Specify the Metadata
meta = {"latitude": 39.7406,
        "longitude": -105.1774,
        "timezone": 'Etc/GMT+7',
        "gamma_pdc": -0.005,
        "azimuth": 180,
        "tilt": 40,
        "power_dc_rated": 1000.0,
        "temp_model_params":
        pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_polymer']}
df.index = df.index.tz_localize(meta['timezone'])
# There is some missing data, but we can infer the frequency from
# the first several data points
freq = pd.infer_freq(df.index[:10])
# Then set the frequency of the dataframe.
# It is recommended not to up- or downsample at this step
# but rather to use interpolate to regularize the time series
# to its dominant or underlying frequency. Interpolate is not
# generally recommended for downsampling in this application.
df = rdtools.interpolate(df, freq)
# Calculate cell temperature
df['Tcell'] = pvlib.temperature.sapm_cell(df.poa, df.Tamb,
                                          df.wind_speed, **meta['temp_model_params'])
# plot the AC power time series
fig, ax = plt.subplots(figsize=(4,3))
ax.plot(df.index, df.ac_power, 'o', alpha=0.01)
ax.set_ylim(0,1500)
fig.autofmt_xdate()
ax.set_ylabel('AC Power (W)')
plt.show()
This example dataset includes a synthetic soiling signal that can be applied to the PV power data to illustrate the soiling loss and detection capabilities of RdTools. AC power is multiplied by the soiling signal to create the synthetic ‘power’ channel.
[5]:
fig, ax = plt.subplots(figsize=(4,3))
ax.plot(df.index, df.soiling, 'o', alpha=0.01)
#ax.set_ylim(0,1500)
fig.autofmt_xdate()
ax.set_ylabel('soiling signal')
plt.show()
df['power'] = df['ac_power'] * df['soiling']
1: Normalize
Data normalization is achieved with rdtools.normalize_with_expected_power(). This function can be used to normalize to any modeled or expected power. Note that realized PV output can be given as energy, rather than power, by using an optional keyword argument.
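The idea behind normalization can be sketched for a single time step: model the expected DC power with the PVWatts relation, then divide the measured value by it. This is only an illustration of the concept, not the RdTools implementation, which works on energy over each interval. The function name pvwatts_dc_sketch and the sample values are hypothetical.

```python
import math

def pvwatts_dc_sketch(poa, temp_cell, pdc0, gamma_pdc, temp_ref=25.0):
    """PVWatts DC power (W): scale the rating by irradiance and
    apply a linear temperature correction."""
    return (poa / 1000.0) * pdc0 * (1.0 + gamma_pdc * (temp_cell - temp_ref))

# Hypothetical single observation
expected = pvwatts_dc_sketch(poa=800.0, temp_cell=45.0,
                             pdc0=1000.0, gamma_pdc=-0.005)
measured = 684.0

# Normalized output: measured relative to expected
normalized_point = measured / expected
print(expected, normalized_point)
```

A normalized value near 1 means the system produced about what the model expected for those conditions; a slow downward drift in this ratio over years is what the degradation analysis looks for.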
[6]:
# Calculate the expected power with a simple PVWatts DC model
modeled_power = pvlib.pvsystem.pvwatts_dc(df['poa'], df['Tcell'], meta['power_dc_rated'],
                                          meta['gamma_pdc'], 25.0)
# Calculate the normalization, the function also returns the relevant insolation for
# each point in the normalized PV energy timeseries
normalized, insolation = rdtools.normalize_with_expected_power(df['power'],
                                                               modeled_power,
                                                               df['poa'])
df['normalized'] = normalized
df['insolation'] = insolation
# Plot the normalized power time series
fig, ax = plt.subplots()
ax.plot(normalized.index, normalized, 'o', alpha = 0.05)
ax.set_ylim(0,2)
fig.autofmt_xdate()
ax.set_ylabel('Normalized energy')
plt.show()
2: Filter
Data filtering is used to exclude data points that represent invalid data, create bias in the analysis, or introduce significant noise.
It can also be useful to remove outages and outliers. Sometimes outages appear as low but non-zero yield. Automatic functions for outage detection are not yet included in rdtools. However, this example does filter out data points where the normalized energy is less than 1%. System-specific filters should be implemented by the analyst if needed.
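The filtering pattern itself is plain boolean masking in pandas: build one mask per criterion, combine them with &, and index the frame with the result. The sketch below uses made-up values and assumed thresholds (normalized energy above 0.01, irradiance between 200 and 1200 W/m^2, cell temperature between -50 and 110 °C); these are illustrative and should not be read as the exact RdTools defaults.

```python
import pandas as pd

# Toy data; column names mirror the notebook, values are made up
toy = pd.DataFrame({
    "normalized": [0.95, 0.005, 1.02, 0.98],
    "poa":        [800.0, 600.0, 150.0, 1000.0],
    "Tcell":      [45.0, 40.0, 20.0, 120.0],
})

# One boolean mask per criterion (thresholds are assumptions)
normalized_ok = toy["normalized"] > 0.01
poa_ok = (toy["poa"] > 200) & (toy["poa"] < 1200)
tcell_ok = (toy["Tcell"] > -50) & (toy["Tcell"] < 110)

# A row is kept only if it passes every filter
keep = normalized_ok & poa_ok & tcell_ok
toy_filtered = toy[keep]
print(toy_filtered)
```

A system-specific filter, for example a mask that excludes a known outage window, can be added as one more boolean Series combined in the same way.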
[7]:
# Calculate a collection of boolean masks that can be used
# to filter the time series
normalized_mask = rdtools.normalized_filter(df['normalized'])
poa_mask = rdtools.poa_filter(df['poa'])
tcell_mask = rdtools.tcell_filter(df['Tcell'])
# Note: This clipping mask may be disabled when you are sure the system is not
# experiencing clipping due to high DC/AC ratio
clip_mask = rdtools.clip_filter(df['power'])
# filter the time series and keep only the columns needed for the
# remaining steps
filtered = df[normalized_mask & poa_mask & tcell_mask & clip_mask]
filtered = filtered[['insolation', 'normalized']]
fig, ax = plt.subplots()
ax.plot(filtered.index, filtered.normalized, 'o', alpha = 0.05)
ax.set_ylim(0,2)
fig.autofmt_xdate()
ax.set_ylabel('Normalized energy')
plt.show()
Filter visualization example: different clipping filters
RdTools provides functions to visualize and tune filters for different applications. In this example, we take a subset of the data, apply an artificial clipping signal, and visualize the results of three different clipping filter methods.
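For intuition about what a clipping filter does, here is a minimal sketch of a quantile-style mask: it finds a high quantile of the power signal and keeps only points safely below it. The function name quantile_clip_mask and its quantile and margin defaults are hypothetical, and this is a simplified stand-in rather than the RdTools implementation of any of the three methods.

```python
import pandas as pd

def quantile_clip_mask(power_ac, quantile=0.98, margin=0.99):
    """Return True for points that are below the clipping plateau.

    The plateau is estimated as a high quantile of the signal; points
    at or above `margin` times that level are flagged (False).
    """
    threshold = power_ac.quantile(quantile) * margin
    return power_ac < threshold

# Synthetic ramp from 0 to 990 W, artificially clipped at 800 W
raw = pd.Series(range(0, 1000, 10), dtype=float)
clipped = raw.clip(upper=800.0)

mask = quantile_clip_mask(clipped)
print(mask.sum(), "of", len(mask), "points kept")
```

A mask like this only works when enough of the data sits on the plateau for the quantile to land on it, which is one reason it helps to inspect the result visually before trusting it.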
[8]:
# Apply an artificial clipping signal to a subset of the data
example_subset = df.iloc[0:15000].copy()
example_subset.loc[example_subset['ac_power']>800,'ac_power']=800
# Generate clipping masks with each of the available methods
clip_mask_quantile = rdtools.clip_filter(example_subset['ac_power'], 'quantile')
clip_mask_xgboost = rdtools.clip_filter(example_subset['ac_power'], 'xgboost')
clip_mask_logic = rdtools.clip_filter(example_subset['ac_power'], 'logic')
[9]:
# NBVAL_IGNORE_OUTPUT
rdtools.tune_filter_plot(example_subset['ac_power'], clip_mask_quantile)