Degradation and soiling example with clearsky workflow¶

This jupyter notebook is intended to the RdTools analysis workflow. In addition, the notebook demonstrates the effects of changes in the workflow. For a consistent experience, we recommend installing the specific versions of packages used to develop this notebook. This can be achieved in your environment by running pip install -r requirements.txt followed by pip install -r docs/notebook_requirements.txt from the base directory. (RdTools must also be separately installed.) These environments and examples are tested with Python 3.7.

The calculations consist of several steps illustrated here:

Import and preliminary calculations
Normalize data using a performance metric
Filter data that creates bias
Aggregate data
Analyze aggregated data to estimate the degradation rate
Analyze aggregated data to estimate the soiling loss

After demonstrating these steps using sensor data, a modified version of the workflow is illustrated using modeled clear sky irradiance and temperature. The results from the two methods are compared at the end.

This notebook works with data from the NREL PVDAQ [4] NREL x-Si #1 system. Note that because this system does not experience significant soiling, the dataset contains a synthesized soiling signal for use in the soiling section of the example. This notebook automatically downloads and locally caches the dataset used in this example. The data can also be found on the DuraMAT Datahub (https://datahub.duramat.org/dataset/pvdaq-time-series-with-soiling-signal).

[1]:

from datetime import timedelta
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pvlib
import rdtools
%matplotlib inline

[2]:

#Update the style of plots
import matplotlib
matplotlib.rcParams.update({'font.size': 12,
                           'figure.figsize': [4.5, 3],
                           'lines.markeredgewidth': 0,
                           'lines.markersize': 2
                           })
# Register time series plotting in pandas > 1.0
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

[3]:

# Set the random seed for numpy to ensure consistent results
np.random.seed(0)

0: Import and preliminary calculations¶

This section prepares the data necessary for an rdtools calculation. The first step of the rdtools workflow is normalization, which requires a time series of energy yield, a time series of cell temperature, and a time series of irradiance, along with some metadata (see Step 1: Normalize)

The following section loads the data, adjusts units where needed, and renames the critical columns. The ambient temperature sensor data source is converted into estimated cell temperature. This dataset already has plane-of-array irradiance data, so no transposition is necessary.

A common challenge is handling datasets with and without daylight savings time. Make sure to specify a pytz timezone that does or does not include daylight savings time as appropriate for your dataset.

The steps of this section may change depending on your data source or the system being considered. Transposition of irradiance and modeling of cell temperature are generally outside the scope of rdtools. A variety of tools for these calculations are available in pvlib.

[4]:

# Import the example data
file_url = ('https://datahub.duramat.org/dataset/a49bb656-7b36-'
            '437a-8089-1870a40c2a7d/resource/5059bc22-640d-4dd4'
            '-b7b1-1e71da15be24/download/pvdaq_system_4_2010-2016'
            '_subset_soilsignal.csv')
cache_file = 'PVDAQ_system_4_2010-2016_subset_soilsignal.pickle'

try:
    df = pd.read_pickle(cache_file)
except FileNotFoundError:
    df = pd.read_csv(file_url, index_col=0, parse_dates=True)
    df.to_pickle(cache_file)

df = df.rename(columns = {
    'ac_power':'power_ac',
    'wind_speed': 'wind_speed',
    'ambient_temp': 'Tamb',
    'poa_irradiance': 'poa',
})

# Specify the Metadata
meta = {"latitude": 39.7406,
        "longitude": -105.1774,
        "timezone": 'Etc/GMT+7',
        "gamma_pdc": -0.005,
        "azimuth": 180,
        "tilt": 40,
        "power_dc_rated": 1000.0,
        "temp_model_params":
        pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_polymer']}

df.index = df.index.tz_localize(meta['timezone'])

loc = pvlib.location.Location(meta['latitude'], meta['longitude'], tz = meta['timezone'])
sun = loc.get_solarposition(df.index)

# There is some missing data, but we can infer the frequency from
# the first several data points
freq = pd.infer_freq(df.index[:10])

# Then set the frequency of the dataframe.
# It is recommended not to up- or downsample at this step
# but rather to use interpolate to regularize the time series
# to its dominant or underlying frequency. Interpolate is not
# generally recommended for downsampling in this application.
df = rdtools.interpolate(df, freq)

# Calculate cell temperature
df['Tcell'] = pvlib.temperature.sapm_cell(df.poa, df.Tamb,
                                          df.wind_speed, **meta['temp_model_params'])

# plot the AC power time series
fig, ax = plt.subplots(figsize=(4,3))
ax.plot(df.index, df.power_ac, 'o', alpha=0.01)
ax.set_ylim(0,1500)
fig.autofmt_xdate()
ax.set_ylabel('AC Power (W)');

1: Normalize¶

Data normalization is achieved with rdtools.normalize_with_expected_power(). This function can be used to normalize to any modeled or expected power. Note that realized PV output can be given as energy, rather than power, by using an optional key word argument.

[5]:

# Calculate the expected power with a simple PVWatts DC model
modeled_power = pvlib.pvsystem.pvwatts_dc(df['poa'], df['Tcell'], meta['power_dc_rated'],
                                          meta['gamma_pdc'], 25.0 )

# Calculate the normalization, the function also returns the relevant insolation for
# each point in the normalized PV energy timeseries
normalized, insolation = rdtools.normalize_with_expected_power(df['power_ac'],
                                                               modeled_power,
                                                               df['poa'])

df['normalized'] = normalized
df['insolation'] = insolation

# Plot the normalized power time series
fig, ax = plt.subplots()
ax.plot(normalized.index, normalized, 'o', alpha = 0.05)
ax.set_ylim(0,2)
fig.autofmt_xdate()
ax.set_ylabel('Normalized energy');

2: Filter¶

Data filtering is used to exclude data points that represent invalid data, create bias in the analysis, or introduce significant noise.

It can also be useful to remove outages and outliers. Sometimes outages appear as low but non-zero yield. Automatic functions for outage detection are not yet included in rdtools. However, this example does filter out data points where the normalized energy is less than 1%. System-specific filters should be implemented by the analyst if needed.

[6]:

# Calculate a collection of boolean masks that can be used
# to filter the time series
normalized_mask = rdtools.normalized_filter(df['normalized'])
poa_mask = rdtools.poa_filter(df['poa'])
tcell_mask = rdtools.tcell_filter(df['Tcell'])
# Note: This clipping mask may be disabled when you are sure the system is not
# experiencing clipping due to high DC/AC ratio
clip_mask = rdtools.clip_filter(df['power_ac'])

# filter the time series and keep only the columns needed for the
# remaining steps
filtered = df[normalized_mask & poa_mask & tcell_mask & clip_mask]
filtered = filtered[['insolation', 'normalized']]

fig, ax = plt.subplots()
ax.plot(filtered.index, filtered.normalized, 'o', alpha = 0.05)
ax.set_ylim(0,2)
fig.autofmt_xdate()
ax.set_ylabel('Normalized energy');

3: Aggregate¶

Data is aggregated with an irradiance weighted average. This can be useful, for example with daily aggregation, to reduce the impact of high-error data points in the morning and evening.

[7]:

daily = rdtools.aggregation_insol(filtered.normalized, filtered.insolation,
                                  frequency = 'D')

fig, ax = plt.subplots()
ax.plot(daily.index, daily, 'o', alpha = 0.1)
ax.set_ylim(0,2)
fig.autofmt_xdate()
ax.set_ylabel('Normalized energy');

4: Degradation calculation¶

Data is then analyzed to estimate the degradation rate representing the PV system behavior. The results are visualized and statistics are reported, including the 68.2% confidence interval, and the P95 exceedance value.

[8]:

# Calculate the degradation rate using the YoY method
yoy_rd, yoy_ci, yoy_info = rdtools.degradation_year_on_year(daily, confidence_level=68.2)
# Note the default confidence_level of 68.2 is appropriate if you would like to
# report a confidence interval analogous to the standard deviation of a normal
# distribution. The size of the confidence interval is adjustable by setting the
# confidence_level variable.

# Visualize the results

degradation_fig = rdtools.degradation_summary_plots(
    yoy_rd, yoy_ci, yoy_info, daily,
    summary_title='Sensor-based degradation results',
    scatter_ymin=0.5, scatter_ymax=1.1,
    hist_xmin=-30, hist_xmax=45, bins=100
)

In addition to the confidence interval, the year-on-year method yields an exceedance value (e.g. P95), the degradation rate that was exceeded (slower degradation) with a given probability level. The probability level is set via the exceedance_prob keyword in degradation_year_on_year.

[9]:

print('The P95 exceedance level is %.2f%%/yr' % yoy_info['exceedance_level'])

The P95 exceedance level is -0.63%/yr

5: Soiling calculations¶

This section illustrates how the aggregated data can be used to estimate soiling losses using the stochastic rate and recovery (SRR) method.¹ Since our example system doesn’t experience much soiling, we apply an artificially generated soiling signal, just for the sake of example.

¹ M. G. Deceglie, L. Micheli and M. Muller, “Quantifying Soiling Loss Directly From PV Yield,” IEEE Journal of Photovoltaics, vol. 8, no. 2, pp. 547-551, March 2018. doi: 10.1109/JPHOTOV.2017.2784682

[10]:

# Apply artificial soiling signal for example
# be sure to remove this for applications on real data,
# and proceed with analysis on `daily` instead of `soiled_daily`

soiling = df['soiling'].resample('D').mean()
soiled_daily = soiling*daily

[11]:

# Calculate the daily insolation, required for the SRR calculation
daily_insolation = filtered['insolation'].resample('D').sum()

# Perform the SRR calculation
from rdtools.soiling import soiling_srr
cl = 68.2
sr, sr_ci, soiling_info = soiling_srr(soiled_daily, daily_insolation,
                                      confidence_level=cl)

/Users/mdecegli/Documents/GitHub/rdtools/rdtools/soiling.py:15: UserWarning: The soiling module is currently experimental. The API, results, and default behaviors may change in future releases (including MINOR and PATCH releases) as the code matures.
  'The soiling module is currently experimental. The API, results, '

[12]:

print('The P50 insolation-weighted soiling ratio is %0.3f'%sr)

The P50 insolation-weighted soiling ratio is 0.945

[13]:

print('The %0.1f confidence interval for the insolation-weighted'
      ' soiling ratio is %0.3f–%0.3f'%(cl, sr_ci[0], sr_ci[1]))

The 68.2 confidence interval for the insolation-weighted soiling ratio is 0.939–0.951

[14]:

# Plot Monte Carlo realizations of soiling profiles
fig = rdtools.plotting.soiling_monte_carlo_plot(soiling_info, soiled_daily, profiles=200);

/Users/mdecegli/Documents/GitHub/rdtools/rdtools/plotting.py:151: UserWarning: The soiling module is currently experimental. The API, results, and default behaviors may change in future releases (including MINOR and PATCH releases) as the code matures.
  'The soiling module is currently experimental. The API, results, '

[15]:

# Plot the slopes for "valid" soiling intervals identified,
# assuming perfect cleaning events
fig = rdtools.plotting.soiling_interval_plot(soiling_info, soiled_daily);

/Users/mdecegli/Documents/GitHub/rdtools/rdtools/plotting.py:211: UserWarning: The soiling module is currently experimental. The API, results, and default behaviors may change in future releases (including MINOR and PATCH releases) as the code matures.
  'The soiling module is currently experimental. The API, results, '

[16]:

# View the first several rows of the soiling interval summary table
soiling_summary = soiling_info['soiling_interval_summary']
soiling_summary.head()

[16]:

	start	end	soiling_rate	soiling_rate_low	soiling_rate_high	inferred_start_loss	inferred_end_loss	length	valid
0	2010-02-25 00:00:00-07:00	2010-03-06 00:00:00-07:00	0.000000	0.000000	0.000000	0.685379	0.863517	9	False
1	2010-03-07 00:00:00-07:00	2010-03-11 00:00:00-07:00	0.000000	0.000000	0.000000	1.053439	1.003025	4	False
2	2010-03-12 00:00:00-07:00	2010-04-08 00:00:00-07:00	-0.002505	-0.005069	0.000000	1.058785	0.991144	27	True
3	2010-04-09 00:00:00-07:00	2010-04-11 00:00:00-07:00	0.000000	0.000000	0.000000	1.044975	1.044975	2	False
4	2010-04-12 00:00:00-07:00	2010-06-15 00:00:00-07:00	-0.000594	-0.000997	-0.000174	1.011211	0.973207	64	True

[17]:

# View a histogram of the valid soiling rates found for the data set
fig = rdtools.plotting.soiling_rate_histogram(soiling_info, bins=50)

/Users/mdecegli/Documents/GitHub/rdtools/rdtools/plotting.py:251: UserWarning: The soiling module is currently experimental. The API, results, and default behaviors may change in future releases (including MINOR and PATCH releases) as the code matures.
  'The soiling module is currently experimental. The API, results, '

These plots show generally good results from the SRR method. In this example, we have slightly overestimated the soiling loss because we used the default behavior of the method key word argument in rdtools.soiling_srr(), which does not assume that every cleaning is perfect but the example artificial soiling signal did include perfect cleaning. We encourage you to adjust the options of rdtools.soiling_srr() for your application.

[18]:

# Calculate and view a monthly soiling rate summary
from rdtools.soiling import monthly_soiling_rates
monthly_soiling_rates(soiling_info['soiling_interval_summary'],
                      confidence_level=cl)

[18]:

	month	soiling_rate_median	soiling_rate_low	soiling_rate_high	interval_count
0	1	-0.000942	-0.001954	-0.000692	6
1	2	-0.001794	-0.006180	-0.000752	7
2	3	-0.001096	-0.002230	-0.000394	10
3	4	-0.000924	-0.001899	-0.000122	9
4	5	-0.000305	-0.000733	-0.000086	7
5	6	-0.000331	-0.000777	-0.000091	8
6	7	-0.000404	-0.001342	-0.000140	8
7	8	-0.000674	-0.001779	-0.000182	7
8	9	-0.000856	-0.001572	-0.000191	8
9	10	-0.000881	-0.001413	-0.000203	8
10	11	-0.000920	-0.001894	-0.000229	8
11	12	-0.000947	-0.002455	-0.000691	6

[19]:

# Calculate and view annual insolation-weighted soiling ratios and their confidence
# intervals based on the Monte Carlo simulation. Note that these losses include the
# assumptions of the cleaning assumptions associated with the method parameter
# of rdtools.soiling_srr(). For anything but 'perfect_clean', each year's soiling
# ratio may be impacted by prior years' soiling profiles. The default behavior of
# rdtools.soiling_srr uses method='half_norm_clean'

from rdtools.soiling import annual_soiling_ratios
annual_soiling_ratios(soiling_info['stochastic_soiling_profiles'],
                      daily_insolation,
                      confidence_level=cl)

[19]:

	year	soiling_ratio_median	soiling_ratio_low	soiling_ratio_high
0	2010	0.961769	0.950512	0.969079
1	2011	0.944563	0.937086	0.950570
2	2012	0.939465	0.931211	0.945439
3	2013	0.954355	0.944595	0.961878
4	2014	0.949834	0.929179	0.965085
5	2015	0.950557	0.921117	0.966028
6	2016	0.937150	0.925213	0.944815

Clear sky workflow¶

The clear sky workflow is useful in that it avoids problems due to drift or recalibration of ground-based sensors. We use pvlib to model the clear sky irradiance. This is renormalized to align it with ground-based measurements. Finally we use rdtools.get_clearsky_tamb() to model the ambient temperature on clear sky days. This modeled ambient temperature is used to model cell temperature with pvlib. If high quality ambient temperature data is available, that can be used instead of the modeled ambient; we proceed with the modeled ambient temperature here for illustrative purposes.

In this example, note that we have omitted wind data in the cell temperature calculations for illustrative purposes. Wind data can also be included when the data source is trusted for improved results

We generally recommend that the clear sky workflow be used as a check on the sensor workflow. It tends to be more sensitive than the sensor workflow, and thus we don’t recommend it as a stand-alone analysis.

Note that the calculations below rely on some objects from the steps above

Clear Sky 0: Preliminary Calculations¶

[20]:

# Calculate the clear sky POA irradiance
clearsky = loc.get_clearsky(df.index, solar_position=sun)

cs_sky = pvlib.irradiance.isotropic(meta['tilt'], clearsky.dhi)
cs_beam = pvlib.irradiance.beam_component(meta['tilt'], meta['azimuth'],
                                          sun.zenith, sun.azimuth, clearsky.dni)
df['clearsky_poa'] = cs_beam + cs_sky

# Renormalize the clear sky POA irradiance
df['clearsky_poa'] = rdtools.irradiance_rescale(df.poa, df.clearsky_poa,
                                                method='iterative')

# Calculate the clearsky temperature
df['clearsky_Tamb'] = rdtools.get_clearsky_tamb(df.index, meta['latitude'],
                                                meta['longitude'])
df['clearsky_Tcell'] = pvlib.temperature.sapm_cell(df.clearsky_poa, df.clearsky_Tamb,
                                                   0, **meta['temp_model_params'])

Clear Sky 1: Normalize¶

Normalize as in step 1 above, but this time using clearsky modeled irradiance and cell temperature

[21]:

# Calculate the expected power with a simple PVWatts DC model
clearsky_modeled_power = pvlib.pvsystem.pvwatts_dc(df['clearsky_poa'],
                                                   df['clearsky_Tcell'],
                                                   meta['power_dc_rated'], meta['gamma_pdc'], 25.0 )

# Calculate the normalization, the function also returns the relevant insolation for
# each point in the normalized PV energy timeseries
clearsky_normalized, clearsky_insolation = rdtools.normalize_with_expected_power(
    df['power_ac'],
    clearsky_modeled_power,
    df['clearsky_poa']
)

df['clearsky_normalized'] = clearsky_normalized
df['clearsky_insolation'] = clearsky_insolation

Clear Sky 2: Filter¶

Filter as in step 2 above, but with the addition of a clear sky index (csi) filter so we consider only points well modeled by the clear sky irradiance model.

[22]:

# Perform clearsky filter
cs_normalized_mask = rdtools.normalized_filter(df['clearsky_normalized'])
cs_poa_mask = rdtools.poa_filter(df['clearsky_poa'])
cs_tcell_mask = rdtools.tcell_filter(df['clearsky_Tcell'])

csi_mask = rdtools.csi_filter(df.insolation, df.clearsky_insolation)

clearsky_filtered = df[cs_normalized_mask & cs_poa_mask & cs_tcell_mask &
                       clip_mask & csi_mask]
clearsky_filtered = clearsky_filtered[['clearsky_insolation', 'clearsky_normalized']]

Clear Sky 3: Aggregate¶

Aggregate the clear sky version of of the filtered data

[23]:

clearsky_daily = rdtools.aggregation_insol(clearsky_filtered.clearsky_normalized,
                                           clearsky_filtered.clearsky_insolation)

Clear Sky 4: Degradation Calculation¶

Estimate the degradation rate and compare to the results obtained with sensors. In this case, we see that the degradation rate estimated with the clearsky methodology is not far off from the sensor-based estimate.

[24]:

# Calculate the degradation rate using the YoY method
cs_yoy_rd, cs_yoy_ci, cs_yoy_info = rdtools.degradation_year_on_year(
    clearsky_daily,
    confidence_level=68.2
)

# Note the default confidence_level of 68.2 is appropriate if you would like to
# report a confidence interval analogous to the standard deviation of a normal
# distribution. The size of the confidence interval is adjustable by setting the
# confidence_level variable.

# Visualize the results
clearsky_fig = rdtools.degradation_summary_plots(
    cs_yoy_rd, cs_yoy_ci, cs_yoy_info, clearsky_daily,
    summary_title='Clear-sky-based degradation results',
    scatter_ymin=0.5, scatter_ymax=1.1,
    hist_xmin=-30, hist_xmax=45, plot_color='orangered',
    bins=100);

print('The P95 exceedance level with the clear sky analysis is %.2f%%/yr' %
      cs_yoy_info['exceedance_level'])

The P95 exceedance level with the clear sky analysis is -0.81%/yr

[25]:

# Compare to previous sensor results
degradation_fig

[25]: