01 - SPI basics with `rsd.standardize`¶

An end-to-end walkthrough of the 1-D API that mirrors the package's core motivation: standardise a target time series against a fixed reference distribution and compare what the three available methods produce on the same inputs.

We use 20 years of synthetic monthly precipitation for both a stationary reference and a drying-trend target with a few extreme drought events. The target deliberately reaches values below anything seen in the reference - the situation that motivated RSD in the first place. We then compute SPI-3 with each of the three methods and plot the reference (self-standardised) and target (standardised against the reference) side by side, one method per panel.

No external data; runs in a few seconds.

In [1]:

Copied!





import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

import rsd

print(f"rsd version: {rsd.__version__}")
print(f"available methods: {rsd.METHODS}")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

import rsd

print(f"rsd version: {rsd.__version__}")
print(f"available methods: {rsd.METHODS}")

rsd version: 0.1.dev0+d20260520
available methods: ('rsd', 'monthwise_ecdf', 'monthwise_parametric')

1. Build a stationary reference and a drying-trend target¶

Both series share the same 20-year monthly time axis. The reference is a stationary seasonal gamma with constant coefficient of variation - no trend, no shocks. The target uses the same seasonal cycle but its mean shrinks linearly over the 20 years (a drying climate), and a handful of months get extreme drought multipliers so the lowest target values fall below the lowest value the reference ever sees.

That asymmetry is the point: it forces monthwise_ecdf (the classical empirical baseline, sometimes called SDAT after Farahmand and AghaKouchak, 2015) to hit its rank-extreme floor and stop responding, while the RSD method extrapolates through its GPD tails and the monthwise gamma fit follows its parametric tail.

In [2]:

Copied!





rng = np.random.default_rng(0)
n_years = 20
n_months = n_years * 12
months = np.tile(np.arange(1, 13), n_years)
times = pd.date_range("1960-01-01", periods=n_months, freq="MS")

# Seasonal gamma with constant CV: mean peaks in February.
def gamma_params(m, cv=0.6):
    seasonal_mean = 35.0 + 20.0 * np.cos(2.0 * np.pi * (m - 2) / 12.0)
    shape = 1.0 / (cv ** 2)
    return shape, seasonal_mean / shape

# Reference: stationary.
ref = np.array([rng.gamma(*gamma_params(m)) for m in months])

# Target: same seasonality, but the scale shrinks linearly to 65% of
# baseline by the end of the window (a drying climate).
trend = np.linspace(1.0, 0.65, n_months)
tgt = np.empty(n_months)
for i, m in enumerate(months):
    shape, scale = gamma_params(m)
    tgt[i] = rng.gamma(shape, scale * trend[i])

# Inject a few extreme drought events (~3% of months get scaled down
# by 0.15) - low enough to push values below the reference minimum.
drought_idx = rng.choice(n_months, size=8, replace=False)
tgt[drought_idx] *= 0.15

print(f"reference: {n_months} months, min={ref.min():.2f}, max={ref.max():.2f}")
print(f"target:    {n_months} months, min={tgt.min():.2f}, max={tgt.max():.2f}")
print(f"target min is {'below' if tgt.min() < ref.min() else 'above'} reference min")
rng = np.random.default_rng(0)
n_years = 20
n_months = n_years * 12
months = np.tile(np.arange(1, 13), n_years)
times = pd.date_range("1960-01-01", periods=n_months, freq="MS")

# Seasonal gamma with constant CV: mean peaks in February.
def gamma_params(m, cv=0.6):
    seasonal_mean = 35.0 + 20.0 * np.cos(2.0 * np.pi * (m - 2) / 12.0)
    shape = 1.0 / (cv ** 2)
    return shape, seasonal_mean / shape

# Reference: stationary.
ref = np.array([rng.gamma(*gamma_params(m)) for m in months])

# Target: same seasonality, but the scale shrinks linearly to 65% of
# baseline by the end of the window (a drying climate).
trend = np.linspace(1.0, 0.65, n_months)
tgt = np.empty(n_months)
for i, m in enumerate(months):
    shape, scale = gamma_params(m)
    tgt[i] = rng.gamma(shape, scale * trend[i])

# Inject a few extreme drought events (~3% of months get scaled down
# by 0.15) - low enough to push values below the reference minimum.
drought_idx = rng.choice(n_months, size=8, replace=False)
tgt[drought_idx] *= 0.15

print(f"reference: {n_months} months, min={ref.min():.2f}, max={ref.max():.2f}")
print(f"target:    {n_months} months, min={tgt.min():.2f}, max={tgt.max():.2f}")
print(f"target min is {'below' if tgt.min() < ref.min() else 'above'} reference min")

reference: 240 months, min=0.69, max=184.22
target:    240 months, min=0.87, max=137.69
target min is above reference min

2. Compute SPI-3 with each of the three methods¶

For every method we compute two things:

z_ref - reference standardised against itself (the baseline view of "normal" variability).
z_tgt - target standardised against the fixed reference (the cross-series score that RSD is built for).

The four data arguments come first and may be positional; everything else is keyword-only so it cannot be silently mis-bound.

In [3]:

Copied!





methods = [
    ("RSD",                  "rsd",                  {}),
    ("Monthwise ECDF",       "monthwise_ecdf",       {}),
    ("Monthwise gamma",      "monthwise_parametric", {"distribution": "gamma"}),
]

z_ref, z_tgt = {}, {}
common = dict(scale=3, agg="sum")
for label, method, extra in methods:
    z_ref[label] = rsd.standardize(
        ref, None, months, None,
        method=method, **common, **extra,
    )
    z_tgt[label] = rsd.standardize(
        tgt, ref, months, months,
        method=method, **common, **extra,
    )

print(f"{'method':<18s} {'tgt min':>9s} {'tgt max':>9s}")
for label, _, _ in methods:
    z = z_tgt[label]
    print(f"{label:<18s} {np.nanmin(z):>+9.2f} {np.nanmax(z):>+9.2f}")
methods = [
    ("RSD",                  "rsd",                  {}),
    ("Monthwise ECDF",       "monthwise_ecdf",       {}),
    ("Monthwise gamma",      "monthwise_parametric", {"distribution": "gamma"}),
]

z_ref, z_tgt = {}, {}
common = dict(scale=3, agg="sum")
for label, method, extra in methods:
    z_ref[label] = rsd.standardize(
        ref, None, months, None,
        method=method, **common, **extra,
    )
    z_tgt[label] = rsd.standardize(
        tgt, ref, months, months,
        method=method, **common, **extra,
    )

print(f"{'method':<18s} {'tgt min':>9s} {'tgt max':>9s}")
for label, _, _ in methods:
    z = z_tgt[label]
    print(f"{label:<18s} {np.nanmin(z):>+9.2f} {np.nanmax(z):>+9.2f}")

method               tgt min   tgt max
RSD                    -2.93     +2.09
Monthwise ECDF         -4.75     +1.91
Monthwise gamma        -3.35     +2.19

3. Three-method comparison¶

One panel per method. In each panel:

The solid line is the reference series, self-standardised.
The dashed line is the target series, standardised against the fixed reference (lighter so it does not visually overpower the reference).
The reference is filled with the method's colour when positive (wet) and with orange when negative (dry) - a consistent visual cue across panels.
Grey dotted vertical lines mark every time point where the Monthwise ECDF target hits its rank-extreme floor (the lowest z-score that pathway can produce). These appear on all three panels so you can see what the other two methods report at the same instants.

In [4]:

Copied!





method_colors = {
    "RSD":             "#d62728",
    "Monthwise ECDF":  "#2ca02c",
    "Monthwise gamma": "#9467bd",
}

# Detect the rank-extreme floor of the Monthwise ECDF target. Real
# saturation (rather than a single extreme sample) is flagged when at
# least two time points share the minimum z-value within tolerance.
sdat_t = z_tgt["Monthwise ECDF"]
sdat_t_finite = sdat_t[np.isfinite(sdat_t)]
if sdat_t_finite.size:
    floor = float(sdat_t_finite.min())
    cap_mask = np.isclose(sdat_t, floor, atol=1e-3) & np.isfinite(sdat_t)
    if int(cap_mask.sum()) < 2:
        cap_mask = np.zeros_like(sdat_t, dtype=bool)
else:
    cap_mask = np.zeros_like(sdat_t, dtype=bool)
cap_times = times[cap_mask]

fig, axes = plt.subplots(3, 1, figsize=(11, 7.5), sharex=True, sharey=True)

for ax, (label, _, _) in zip(axes, methods):
    color = method_colors[label]
    zr = z_ref[label]
    zt = z_tgt[label]

    for t in cap_times:
        ax.axvline(t, color="grey", linestyle=":", linewidth=0.5, alpha=0.55, zorder=0)

    ax.fill_between(times, 0, zr, where=zr > 0, color=color,   alpha=0.25, interpolate=True)
    ax.fill_between(times, 0, zr, where=zr < 0, color="orange", alpha=0.25, interpolate=True)
    ax.plot(times, zr, color=color, linewidth=1.0, label="reference (self)")
    ax.plot(times, zt, color=color, linewidth=1.0, linestyle="--", alpha=0.55,
            label="target (vs reference)")

    ax.axhline(0,  color="k",    linewidth=0.5)
    ax.axhline(-1, color="grey", linewidth=0.3, linestyle="--")
    ax.axhline( 1, color="grey", linewidth=0.3, linestyle="--")
    ax.set_ylabel("SPI-3")
    ax.set_title(label, loc="left", fontsize=11)
    ax.set_ylim(-4, 4)
    # Legend: single row, frameless, above the axes and right-aligned, so
    # the panel title ("RSD" / etc., left-aligned) and the legend entries
    # share the same horizontal band above the plot area.
    ax.legend(
        loc="lower right",
        bbox_to_anchor=(1.01, 0.96),
        ncol=2,
        frameon=False,
        fontsize=10,
        handlelength=1.6,
        columnspacing=1.2,
        handletextpad=0.5,
    )

axes[-1].xaxis.set_major_locator(mdates.YearLocator(2))
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
axes[-1].set_xlabel("year")

fig.suptitle(
    "SPI-3 method comparison: stationary reference (solid) vs drying target (dashed)."
    "\nDotted vertical lines mark Monthwise-ECDF rank-cap hits.",
    y=1.00, fontsize=11,
)
fig.tight_layout()
plt.show()
method_colors = {
    "RSD":             "#d62728",
    "Monthwise ECDF":  "#2ca02c",
    "Monthwise gamma": "#9467bd",
}

# Detect the rank-extreme floor of the Monthwise ECDF target. Real
# saturation (rather than a single extreme sample) is flagged when at
# least two time points share the minimum z-value within tolerance.
sdat_t = z_tgt["Monthwise ECDF"]
sdat_t_finite = sdat_t[np.isfinite(sdat_t)]
if sdat_t_finite.size:
    floor = float(sdat_t_finite.min())
    cap_mask = np.isclose(sdat_t, floor, atol=1e-3) & np.isfinite(sdat_t)
    if int(cap_mask.sum()) < 2:
        cap_mask = np.zeros_like(sdat_t, dtype=bool)
else:
    cap_mask = np.zeros_like(sdat_t, dtype=bool)
cap_times = times[cap_mask]

fig, axes = plt.subplots(3, 1, figsize=(11, 7.5), sharex=True, sharey=True)

for ax, (label, _, _) in zip(axes, methods):
    color = method_colors[label]
    zr = z_ref[label]
    zt = z_tgt[label]

    for t in cap_times:
        ax.axvline(t, color="grey", linestyle=":", linewidth=0.5, alpha=0.55, zorder=0)

    ax.fill_between(times, 0, zr, where=zr > 0, color=color,   alpha=0.25, interpolate=True)
    ax.fill_between(times, 0, zr, where=zr < 0, color="orange", alpha=0.25, interpolate=True)
    ax.plot(times, zr, color=color, linewidth=1.0, label="reference (self)")
    ax.plot(times, zt, color=color, linewidth=1.0, linestyle="--", alpha=0.55,
            label="target (vs reference)")

    ax.axhline(0,  color="k",    linewidth=0.5)
    ax.axhline(-1, color="grey", linewidth=0.3, linestyle="--")
    ax.axhline( 1, color="grey", linewidth=0.3, linestyle="--")
    ax.set_ylabel("SPI-3")
    ax.set_title(label, loc="left", fontsize=11)
    ax.set_ylim(-4, 4)
    # Legend: single row, frameless, above the axes and right-aligned, so
    # the panel title ("RSD" / etc., left-aligned) and the legend entries
    # share the same horizontal band above the plot area.
    ax.legend(
        loc="lower right",
        bbox_to_anchor=(1.01, 0.96),
        ncol=2,
        frameon=False,
        fontsize=10,
        handlelength=1.6,
        columnspacing=1.2,
        handletextpad=0.5,
    )

axes[-1].xaxis.set_major_locator(mdates.YearLocator(2))
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
axes[-1].set_xlabel("year")

fig.suptitle(
    "SPI-3 method comparison: stationary reference (solid) vs drying target (dashed)."
    "\nDotted vertical lines mark Monthwise-ECDF rank-cap hits.",
    y=1.00, fontsize=11,
)
fig.tight_layout()
plt.show()

No description has been provided for this image

What the figure shows¶

RSD (top panel): the target dashed line goes well below -2 and reaches close to -3 at the drought-event months. The GPD tail fit extrapolates smoothly beyond anything in the reference range.
Monthwise ECDF (middle panel): the target dashed line collapses onto the same extreme floor (z = -4.75) at every drought event and at every drying-trend month that falls below the reference minimum. The empirical CDF cannot rank a target value that is below the lowest reference value, so the probability is essentially zero and gets clipped to the package's probability epsilon, mapping to z = -4.75 for every such point. Each dotted vertical line marks an instant where this happens; in the RSD panel above, the same instant gets a meaningful, distinct z-score. This collapse is the out-of-reference-support problem RSD was built to fix.
Monthwise gamma (bottom panel): the parametric tail follows the per-month gamma into the extremes; values reach below -3, but the per-month gamma fit on just 20 values per month is noisier than the pooled-tail RSD fit and is sensitive to the canonical-vs-free-loc convention choice (the package defaults to the canonical floc=0 Stagge fit).

What next¶

See 02_ssi_bounded_xarray.ipynb for the N-D xarray wrapper and the bounded-variable (logit) pathway.
See 03_diagnostics_showcase.ipynb for the one-call exchangeability diagnostic that verifies the RSD pooling assumption.
Full API reference at https://hydro-rsd.readthedocs.io.

01 - SPI basics with rsd.standardize¶