03 - Diagnostics showcase: rsd.diagnose¶
Why this matters¶
The RSD method pools deseasonalised residuals from all twelve calendar months into a single sample so that there are enough exceedances to fit GPD tails. Pooling is only valid if, after removing per-month location and scale, the residual distributions are exchangeable across months - i.e., the higher moments (shape, skewness, tail behaviour) left in the residuals do not depend on the calendar month. Exchangeability is an assumption of the method that is worth checking on each new variable or pixel: appreciable departures from it can introduce bias in the pooled tail fit for the affected months.
rsd.diagnose(values, months, name, ...) is the one-call diagnostic
that tests this assumption on a reference series. A single call:
- Prints a configuration + extracted seasonal-parameters overview.
- Renders before / after deseasonalisation KDE panels.
- Prints an Anderson-Darling omnibus test plus per-month Kolmogorov-Smirnov leave-one-out tests of the deseasonalised residuals.
The two cases below are minimal synthetic illustrations: a well-behaved precipitation-like series where the test passes, and a deliberately constructed bounded series where it fails. The second case also demonstrates an optional logit pre-transform that can sometimes recover exchangeability for bounded variables; this is a helper, not a core part of the RSD pipeline.
import numpy as np
import rsd
print(f"rsd version: {rsd.__version__}")
rsd version: 0.1.dev0+d20260520
Case 1 - well-behaved series¶
Synthetic monthly precipitation-like values: a smooth sinusoidal
seasonal cycle drives a gamma distribution with constant coefficient
of variation. The seasonal signal is real but exchangeability holds by
construction (the gamma shape is the same across months; only the
scale moves with the season), so deseasonalisation produces residuals
that are exchangeable across months and diagnose confirms it.
rng = np.random.default_rng(0)
n_years = 50
months = np.tile(np.arange(1, 13), n_years)
# Sinusoidal monthly mean (peak in February, trough in August - a
# temperate-like winter-precipitation regime), constant coefficient of
# variation so the gamma shape parameter is month-independent.
# Residuals are exchangeable by construction.
def gamma_params(m, cv=0.6):
seasonal_mean = 35.0 + 20.0 * np.cos(2.0 * np.pi * (m - 2) / 12.0)
shape = 1.0 / (cv ** 2)
scale = seasonal_mean / shape
return shape, scale
values = np.array([rng.gamma(*gamma_params(m)) for m in months])
result_clean = rsd.diagnose(
values, months, name="synthetic precipitation (well-behaved)",
index="SPI", scale=3, agg="sum",
)
==============================================================================
Exchangeability summary
synthetic precipitation (well-behaved)
==============================================================================
Configuration:
Index : SPI-3
Aggregation : sum
Tail quantile : 0.10 (10.0% exceedance threshold)
Pooled n : 598
Monthly n : [49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
Seasonal parameters (extracted, baseline):
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Location 128.2 137.5 131.9 125.1 116.3 91.5 67.7 49.8 46.2 52.3 83.4 107.6
Spread 74.4 69.7 68.4 58.5 44.3 44.4 28.8 19.7 24.6 26.1 31.2 64.4
Statistical tests:
Anderson-Darling k-sample (omnibus):
Statistic = -1.8404, p-value = 0.2500
--> Cannot reject H0: all months from same distribution (at 5%)
KS test: month vs. pooled (leave-one-out)
Month n D-stat p-value Reject 5%
----------------------------------------------
Jan 49 0.1224 0.4707 .
Feb 49 0.0887 0.8376 .
Mar 50 0.0814 0.8951 .
Apr 50 0.0883 0.8336 .
May 50 0.1012 0.6951 .
Jun 50 0.0750 0.9403 .
Jul 50 0.0761 0.9333 .
Aug 50 0.1034 0.6685 .
Sep 50 0.0842 0.8723 .
Oct 50 0.0647 0.9835 .
Nov 50 0.0943 0.7709 .
Dec 50 0.0915 0.8029 .
Summary: 0/12 months reject at 5% (mean-D 0.0893, max-D 0.1224)
Legend: '*' = reject H0 at the given level, '.' = no reject.
Interpretation. The seasonal-parameters panel shows a smooth sinusoidal location and a proportionally varying spread (constant CV). The Anderson-Darling omnibus does not reject the null of exchangeability. Per-month KS tests are mostly non-significant. The 12 monthly KDE curves separate cleanly before deseasonalisation and collapse onto a single pooled distribution after. This is what "pooling is safe" looks like.
Case 2 - exchangeability failure on a bounded variable¶
A typical failure mode of the exchangeability assumption is a variable that is bounded by physics and reaches the bound in some seasons. For example, in a tropical pixel where soil moisture sits very close to saturation for several wet months, those months' residuals are compressed against the upper wall: their IQR is much smaller than dry months', and their shape is left-skewed in a way that pooling cannot absorb. The per-month higher moments differ from the bulk, and the test rejects.
We construct a synthetic soil-moisture series in [0, sat] whose wet
months genuinely saturate (target mean approaches 99 % of sat, with
a high beta-distribution concentration that collapses the spread
against the upper bound) and whose dry months are mid-range and
well-spread. A small fraction of occasional drought draws in the wet
season add a heavy left tail. The baseline diagnose call below is
expected to flag at least one wet-side month.
After that we re-run the same diagnostic with the optional
bounds=(L, U) argument, which applies a logit pre-transform before
deseasonalisation. The pre-transform is a separate helper layered on
top of the core RSD pipeline - it is not part of the method
itself. Whether it helps depends on the variable: for bounded
quantities like soil moisture it often does; for unbounded variables
it is not relevant. Adding auto_bounds=True additionally evaluates
a heuristic data-driven bound (rsd.estimate_bounds) as a robustness
check; physical bounds are always preferred when known.
rng = np.random.default_rng(1)
n_years = 50
months = np.tile(np.arange(1, 13), n_years)
sat = 0.4 # m^3/m^3, soil-moisture saturation
# Two synthesis ingredients combine to create a realistic saturating
# wet-season profile:
#
# (1) A Gaussian-shaped seasonal phase narrowly peaked at February.
# In wet months the Beta target mean climbs to 0.99 * sat and the
# concentration soars to 300, so the bulk of the distribution
# collapses against the saturation wall (IQR shrinks from ~ 11 %
# of sat in dry months to < 1 % in February). Dry months are
# Beta(~1.8, ~4.2): mean ~ 0.30 * sat, broad spread.
#
# (2) Occasional 'drought' draws (~ 8 % of wet-month samples) from a
# uniform across the lower range, producing the rare large
# excursions away from saturation that give wet-month
# distributions a heavy left tail. This is the feature that
# breaks exchangeability for the baseline pathway: after
# IQR-based deseasonalisation the wet months acquire a
# pronounced left skew that the rest of the year does not share.
def beta_params(m, sigma=2.0):
phase = np.exp(-((m - 2) / sigma) ** 2)
target_mean = 0.30 + 0.69 * phase
concentration = 6.0 + 294.0 * phase
return target_mean * concentration, (1 - target_mean) * concentration
values = np.empty(len(months))
for i, m in enumerate(months):
phase = np.exp(-((m - 2) / 2.0) ** 2)
if rng.random() < 0.08 * phase:
values[i] = sat * rng.uniform(0.05, 0.55)
else:
a, b = beta_params(m)
values[i] = sat * rng.beta(a, b)
result_bounded = rsd.diagnose(
values, months, name="synthetic soil moisture (saturating wet season)",
index="SSI", scale=3, agg="mean",
bounds=(0.0, sat),
auto_bounds=True,
)
==============================================================================
Exchangeability summary
synthetic soil moisture (saturating wet season)
==============================================================================
Configuration:
Index : SSI-3
Aggregation : mean
Tail quantile : 0.10 (10.0% exceedance threshold)
Pooled n : 598
Monthly n : [49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
Methods : baseline, physical-bounds (0, 0.4), auto-bounds (-0.0003869, 0.4068)
Seasonal parameters (extracted, baseline):
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Location 0.2 0.3 0.4 0.3 0.2 0.2 0.1 0.1 0.1 0.1 0.1 0.1
Spread 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.1 0.1 0.0
Statistical tests:
Anderson-Darling k-sample (omnibus):
baseline : statistic = +1.1899 p = 0.1181 --> Cannot reject (at 5%)
physical-bounds : statistic = -0.6337 p = 0.2500 --> Cannot reject (at 5%)
auto-bounds : statistic = -0.1965 p = 0.2500 --> Cannot reject (at 5%)
KS test: month vs. pooled (leave-one-out)
Month n | baseline | physical-bounds | auto-bounds
| D-stat p-val R 5% | D-stat p-val R 5% | D-stat p-val R 5%
------------+----------------------+----------------------+---------------------
Jan 49 | 0.1033 0.6810 . | 0.0635 0.9875 . | 0.0620 0.9904 .
Feb 49 | 0.0949 0.7750 . | 0.1482 0.2487 . | 0.0885 0.8391 .
Mar 50 | 0.2177 0.0218 * | 0.1399 0.2998 . | 0.1672 0.1366 .
Apr 50 | 0.1599 0.1706 . | 0.1272 0.4110 . | 0.1399 0.3002 .
May 50 | 0.0942 0.7727 . | 0.0685 0.9712 . | 0.0741 0.9451 .
Jun 50 | 0.0953 0.7614 . | 0.0915 0.8021 . | 0.0769 0.9284 .
Jul 50 | 0.0712 0.9606 . | 0.0566 0.9966 . | 0.0648 0.9831 .
Aug 50 | 0.0949 0.7664 . | 0.1077 0.6221 . | 0.1040 0.6639 .
Sep 50 | 0.1290 0.3942 . | 0.0926 0.7897 . | 0.0999 0.7094 .
Oct 50 | 0.1515 0.2189 . | 0.1788 0.0934 . | 0.1770 0.0993 .
Nov 50 | 0.1179 0.5062 . | 0.0931 0.7860 . | 0.0967 0.7464 .
Dec 50 | 0.1064 0.6356 . | 0.0809 0.8997 . | 0.0681 0.9730 .
Summary:
baseline : 1/12 reject at 5%, mean-D 0.1197, max-D 0.2177
physical-bounds : 0/12 reject at 5%, mean-D 0.1040, max-D 0.1788
auto-bounds : 0/12 reject at 5%, mean-D 0.1016, max-D 0.1770
Legend: '*' = reject H0 at the given level, '.' = no reject.
Interpretation. The seasonal-parameters panel makes the
compression visible: the location climbs to ~99 % of sat at the
February peak and the half-IQR ribbon visibly collapses to a thin
sliver against the wall (much narrower than in the broad
mid-summer months). The multi-method KS table then shows the same
story numerically:
- baseline rejects exchangeability for one or two wet-side months: the heavy left tail from drought draws, combined with the near-zero IQR, makes those months' residual CDFs visibly different from the pooled CDF.
- physical-bounds maps
[0, sat]into unbounded logit space before deseasonalisation; the compression at the saturation wall vanishes, the heavy left tail spreads out, and the same months stop rejecting (0/12). - auto-bounds lands on bounds that closely match the physical
ones because the data fills almost the entire
[0, sat]range, and recovers the same correction without the user supplying the physical bounds.
Prefer physical bounds whenever you know them: they encode an externally validated constraint instead of a heuristic estimate.
For multi-method calls, result_bounded['report'] is a dict keyed by
pathway label:
print("pathways evaluated:", list(result_bounded["report"].keys()))
print("bounds used by each pathway:", result_bounded["bounds_used"])
pathways evaluated: ['baseline', 'physical-bounds', 'auto-bounds']
bounds used by each pathway: {'physical': (0.0, 0.4), 'auto': (-0.0003868770832829759, 0.40681585307651824)}
What diagnose returns¶
The return value is a dict with the following keys (see the docstring for the full list):
overview- the printed header as a dict (name, pooled n, scale, agg, tail_quantile, methods evaluated).report- a flat dict (single-method) or a{label: dict}mapping (multi-method) of the Anderson-Darling and per-month KS test results.bounds_used- the bounds that were actually applied per pathway, including the auto-bound value ifauto_bounds=True.summary_fig,bounds_fig- the matplotlib Figure objects (or None whenquiet=True).
Use quiet=True for batch / CI runs - figures and reports still
compute, nothing prints to stdout.
Further reading¶
01_spi_basic.ipynb- the 1-Drsd.standardizeAPI, three methods on the same input.02_ssi_bounded_xarray.ipynb- the N-D xarray wrapper with the bounded-variable pathway.- Full API reference at https://hydro-rsd.readthedocs.io.