Quick start¶
Two API surfaces, same vocabulary:
| You have | Use |
|---|---|
| A 1-D NumPy time series | rsd.standardize |
An N-D xarray DataArray with a time coordinate |
rsd.standardize_xr |
In RSD vocabulary, the reference defines what "normal" looks like (a
baseline period or dataset), and the target is the series you score
against that normal. Output z is in standard-normal units: z ≈ 0
is climatology, |z| > 2 is extreme.
1-D¶
import numpy as np
import rsd
rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50) # 600 months
months_tgt = np.tile(np.arange(1, 13), 100) # 1200 months
ref = rng.gamma(shape=2, scale=5, size=600)
tgt = rng.gamma(shape=2, scale=5, size=1200)
z = rsd.standardize(
target=tgt,
reference=ref,
months_target=months_tgt,
months_reference=months_ref,
scale=3, # 3-month accumulation (SPI-3)
)
z is a float32 NumPy array the same length as target. The first
scale - 1 positions are NaN (the rolling window is not yet full).
The four data arguments (target, reference, months_target,
months_reference) may be passed positionally; every configuration
argument after them (method, scale, agg, tail_quantile, ...) is
keyword-only, so a positional mix-up like
rsd.standardize(tgt, ref, m_tgt, m_ref, "rsd", 3) raises a clear
TypeError instead of silently mis-binding the month arrays.
N-D (xarray + optional dask)¶
import rsd
z = rsd.standardize_xr(
target=target_da, # xr.DataArray with a "time" dimension
reference=ref_da, # xr.DataArray with a "time" dimension
method="rsd", # or "monthwise_ecdf" / "monthwise_parametric"
scale=3,
parallel=True, # use dask for per-chunk parallelism
)
Months are extracted automatically from the time coordinate, so you do
not pass months_target/months_reference here. The wrapper applies
the chosen method independently to every spatial pixel along the time
dimension via xr.apply_ufunc. Pre-chunk the input spatially (e.g.
target_da.chunk({"lat": 60, "lon": 60, "time": -1})) and pass
parallel=True for grids larger than a few thousand pixels.
Bounded variables (soil moisture)¶
For variables with two-sided physical bounds (soil moisture is the canonical case; relative humidity is another), pass per-timestep bounds to engage the logit pre-transform:
z = rsd.standardize_xr(
target=target_da,
bounds=(0.0, sm_sat_da), # per-timestep bounds
scale=3,
agg="mean", # or "sum" - both work cleanly
)
The logit transform is applied before accumulation, so the choice of
agg ("sum" or "mean") is a pure scientific decision. Each bound may be
a scalar or a DataArray broadcastable against the input grid; in the
example above, sm_sat_da would be a per-pixel saturation DataArray
(e.g. xr.full_like(target_da.isel(time=0), 0.4) for a uniform field).
Compare against baselines¶
Two classical methods are bundled for direct comparison on the same input:
z_rsd = rsd.standardize(..., method="rsd") # hybrid empirical + GPD tails
z_ecdf = rsd.standardize(..., method="monthwise_ecdf") # classical SPI ECDF
z_gamma = rsd.standardize(..., method="monthwise_parametric",
distribution="gamma") # canonical Stagge SPI
The parametric path defaults to floc=0 (the canonical Stagge SPI
convention) for distributions with non-negative support. Pass floc=None
to recover scipy's default free-location 3-parameter fit.
Next¶
- Read the method overview to understand why RSD looks the way it does and when to prefer it over the baselines.
- Walk through the example notebooks for end-to-end runnable workflows.
- Look up any function in the API reference.