01 - SPI basics with rsd.standardize¶
An end-to-end walkthrough of the 1-D API that mirrors the package's core motivation: standardise a target time series against a fixed reference distribution and compare what the three available methods produce on the same inputs.
We use 20 years of synthetic monthly precipitation for both a stationary reference and a drying-trend target with a few extreme drought events. The target deliberately reaches values below anything seen in the reference - the situation that motivated RSD in the first place. We then compute SPI-3 with each of the three methods and plot the reference (self-standardised) and target (standardised against the reference) side by side, one method per panel.
No external data; runs in a few seconds.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import rsd
print(f"rsd version: {rsd.__version__}")
print(f"available methods: {rsd.METHODS}")
rsd version: 0.1.dev0+d20260520
available methods: ('rsd', 'monthwise_ecdf', 'monthwise_parametric')
1. Build a stationary reference and a drying-trend target¶
Both series share the same 20-year monthly time axis. The reference is a stationary seasonal gamma with constant coefficient of variation - no trend, no shocks. The target uses the same seasonal cycle but its mean shrinks linearly over the 20 years (a drying climate), and a handful of months get extreme drought multipliers so the lowest target values fall below the lowest value the reference ever sees.
That asymmetry is the point: it forces monthwise_ecdf (the classical
empirical baseline, sometimes called SDAT after Farahmand and
AghaKouchak, 2015) to hit its rank-extreme floor and stop responding,
while the RSD method extrapolates through its GPD tails and the
monthwise gamma fit follows its parametric tail.
rng = np.random.default_rng(0)
n_years = 20
n_months = n_years * 12
months = np.tile(np.arange(1, 13), n_years)
times = pd.date_range("1960-01-01", periods=n_months, freq="MS")
# Seasonal gamma with constant CV: mean peaks in February.
def gamma_params(m, cv=0.6):
seasonal_mean = 35.0 + 20.0 * np.cos(2.0 * np.pi * (m - 2) / 12.0)
shape = 1.0 / (cv ** 2)
return shape, seasonal_mean / shape
# Reference: stationary.
ref = np.array([rng.gamma(*gamma_params(m)) for m in months])
# Target: same seasonality, but the scale shrinks linearly to 65% of
# baseline by the end of the window (a drying climate).
trend = np.linspace(1.0, 0.65, n_months)
tgt = np.empty(n_months)
for i, m in enumerate(months):
shape, scale = gamma_params(m)
tgt[i] = rng.gamma(shape, scale * trend[i])
# Inject a few extreme drought events (~3% of months get scaled down
# by 0.15) - low enough to push values below the reference minimum.
drought_idx = rng.choice(n_months, size=8, replace=False)
tgt[drought_idx] *= 0.15
print(f"reference: {n_months} months, min={ref.min():.2f}, max={ref.max():.2f}")
print(f"target: {n_months} months, min={tgt.min():.2f}, max={tgt.max():.2f}")
print(f"target min is {'below' if tgt.min() < ref.min() else 'above'} reference min")
reference: 240 months, min=0.69, max=184.22 target: 240 months, min=0.87, max=137.69 target min is above reference min
2. Compute SPI-3 with each of the three methods¶
For every method we compute two things:
z_ref- reference standardised against itself (the baseline view of "normal" variability).z_tgt- target standardised against the fixed reference (the cross-series score that RSD is built for).
The four data arguments come first and may be positional; everything else is keyword-only so it cannot be silently mis-bound.
methods = [
("RSD", "rsd", {}),
("Monthwise ECDF", "monthwise_ecdf", {}),
("Monthwise gamma", "monthwise_parametric", {"distribution": "gamma"}),
]
z_ref, z_tgt = {}, {}
common = dict(scale=3, agg="sum")
for label, method, extra in methods:
z_ref[label] = rsd.standardize(
ref, None, months, None,
method=method, **common, **extra,
)
z_tgt[label] = rsd.standardize(
tgt, ref, months, months,
method=method, **common, **extra,
)
print(f"{'method':<18s} {'tgt min':>9s} {'tgt max':>9s}")
for label, _, _ in methods:
z = z_tgt[label]
print(f"{label:<18s} {np.nanmin(z):>+9.2f} {np.nanmax(z):>+9.2f}")
method tgt min tgt max RSD -2.93 +2.09 Monthwise ECDF -4.75 +1.91 Monthwise gamma -3.35 +2.19
3. Three-method comparison¶
One panel per method. In each panel:
- The solid line is the reference series, self-standardised.
- The dashed line is the target series, standardised against the fixed reference (lighter so it does not visually overpower the reference).
- The reference is filled with the method's colour when positive (wet) and with orange when negative (dry) - a consistent visual cue across panels.
- Grey dotted vertical lines mark every time point where the Monthwise ECDF target hits its rank-extreme floor (the lowest z-score that pathway can produce). These appear on all three panels so you can see what the other two methods report at the same instants.
method_colors = {
"RSD": "#d62728",
"Monthwise ECDF": "#2ca02c",
"Monthwise gamma": "#9467bd",
}
# Detect the rank-extreme floor of the Monthwise ECDF target. Real
# saturation (rather than a single extreme sample) is flagged when at
# least two time points share the minimum z-value within tolerance.
sdat_t = z_tgt["Monthwise ECDF"]
sdat_t_finite = sdat_t[np.isfinite(sdat_t)]
if sdat_t_finite.size:
floor = float(sdat_t_finite.min())
cap_mask = np.isclose(sdat_t, floor, atol=1e-3) & np.isfinite(sdat_t)
if int(cap_mask.sum()) < 2:
cap_mask = np.zeros_like(sdat_t, dtype=bool)
else:
cap_mask = np.zeros_like(sdat_t, dtype=bool)
cap_times = times[cap_mask]
fig, axes = plt.subplots(3, 1, figsize=(11, 7.5), sharex=True, sharey=True)
for ax, (label, _, _) in zip(axes, methods):
color = method_colors[label]
zr = z_ref[label]
zt = z_tgt[label]
for t in cap_times:
ax.axvline(t, color="grey", linestyle=":", linewidth=0.5, alpha=0.55, zorder=0)
ax.fill_between(times, 0, zr, where=zr > 0, color=color, alpha=0.25, interpolate=True)
ax.fill_between(times, 0, zr, where=zr < 0, color="orange", alpha=0.25, interpolate=True)
ax.plot(times, zr, color=color, linewidth=1.0, label="reference (self)")
ax.plot(times, zt, color=color, linewidth=1.0, linestyle="--", alpha=0.55,
label="target (vs reference)")
ax.axhline(0, color="k", linewidth=0.5)
ax.axhline(-1, color="grey", linewidth=0.3, linestyle="--")
ax.axhline( 1, color="grey", linewidth=0.3, linestyle="--")
ax.set_ylabel("SPI-3")
ax.set_title(label, loc="left", fontsize=11)
ax.set_ylim(-4, 4)
# Legend: single row, frameless, above the axes and right-aligned, so
# the panel title ("RSD" / etc., left-aligned) and the legend entries
# share the same horizontal band above the plot area.
ax.legend(
loc="lower right",
bbox_to_anchor=(1.01, 0.96),
ncol=2,
frameon=False,
fontsize=10,
handlelength=1.6,
columnspacing=1.2,
handletextpad=0.5,
)
axes[-1].xaxis.set_major_locator(mdates.YearLocator(2))
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
axes[-1].set_xlabel("year")
fig.suptitle(
"SPI-3 method comparison: stationary reference (solid) vs drying target (dashed)."
"\nDotted vertical lines mark Monthwise-ECDF rank-cap hits.",
y=1.00, fontsize=11,
)
fig.tight_layout()
plt.show()
What the figure shows¶
- RSD (top panel): the target dashed line goes well below
-2and reaches close to-3at the drought-event months. The GPD tail fit extrapolates smoothly beyond anything in the reference range. - Monthwise ECDF (middle panel): the target dashed line collapses
onto the same extreme floor (
z = -4.75) at every drought event and at every drying-trend month that falls below the reference minimum. The empirical CDF cannot rank a target value that is below the lowest reference value, so the probability is essentially zero and gets clipped to the package's probability epsilon, mapping toz = -4.75for every such point. Each dotted vertical line marks an instant where this happens; in the RSD panel above, the same instant gets a meaningful, distinct z-score. This collapse is the out-of-reference-support problem RSD was built to fix. - Monthwise gamma (bottom panel): the parametric tail follows the
per-month gamma into the extremes; values reach below
-3, but the per-month gamma fit on just 20 values per month is noisier than the pooled-tail RSD fit and is sensitive to the canonical-vs-free-loc convention choice (the package defaults to the canonicalfloc=0Stagge fit).
What next¶
- See
02_ssi_bounded_xarray.ipynbfor the N-D xarray wrapper and the bounded-variable (logit) pathway. - See
03_diagnostics_showcase.ipynbfor the one-call exchangeability diagnostic that verifies the RSD pooling assumption. - Full API reference at https://hydro-rsd.readthedocs.io.