Quick start¶

Two API surfaces, same vocabulary:

You have	Use
A 1-D NumPy time series	`rsd.standardize`
An N-D xarray DataArray with a `time` coordinate	`rsd.standardize_xr`

In RSD vocabulary, the reference defines what "normal" looks like (a baseline period or dataset), and the target is the series you score against that normal. Output z is in standard-normal units: z ≈ 0 is climatology, |z| > 2 is extreme.

1-D¶

import numpy as np
import rsd

rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50)    # 600 months
months_tgt = np.tile(np.arange(1, 13), 100)   # 1200 months
ref = rng.gamma(shape=2, scale=5, size=600)
tgt = rng.gamma(shape=2, scale=5, size=1200)

z = rsd.standardize(
    target=tgt,
    reference=ref,
    months_target=months_tgt,
    months_reference=months_ref,
    scale=3,                                   # 3-month accumulation (SPI-3)
)

z is a float32 NumPy array the same length as target. The first scale - 1 positions are NaN (the rolling window is not yet full).

The four data arguments (target, reference, months_target, months_reference) may be passed positionally; every configuration argument after them (method, scale, agg, tail_quantile, ...) is keyword-only, so a positional mix-up like rsd.standardize(tgt, ref, m_tgt, m_ref, "rsd", 3) raises a clear TypeError instead of silently mis-binding the month arrays.

N-D (xarray + optional dask)¶

import rsd

z = rsd.standardize_xr(
    target=target_da,       # xr.DataArray with a "time" dimension
    reference=ref_da,       # xr.DataArray with a "time" dimension
    method="rsd",           # or "monthwise_ecdf" / "monthwise_parametric"
    scale=3,
    parallel=True,          # use dask for per-chunk parallelism
)

Months are extracted automatically from the time coordinate, so you do not pass months_target/months_reference here. The wrapper applies the chosen method independently to every spatial pixel along the time dimension via xr.apply_ufunc. Pre-chunk the input spatially (e.g. target_da.chunk({"lat": 60, "lon": 60, "time": -1})) and pass parallel=True for grids larger than a few thousand pixels.

Bounded variables (soil moisture)¶

For variables with two-sided physical bounds (soil moisture is the canonical case; relative humidity is another), pass per-timestep bounds to engage the logit pre-transform:

z = rsd.standardize_xr(
    target=target_da,
    bounds=(0.0, sm_sat_da),    # per-timestep bounds
    scale=3,
    agg="mean",                 # or "sum" - both work cleanly
)

The logit transform is applied before accumulation, so the choice of agg ("sum" or "mean") is a pure scientific decision. Each bound may be a scalar or a DataArray broadcastable against the input grid; in the example above, sm_sat_da would be a per-pixel saturation DataArray (e.g. xr.full_like(target_da.isel(time=0), 0.4) for a uniform field).

Compare against baselines¶

Two classical methods are bundled for direct comparison on the same input:

z_rsd   = rsd.standardize(..., method="rsd")                            # hybrid empirical + GPD tails
z_ecdf  = rsd.standardize(..., method="monthwise_ecdf")                 # classical SPI ECDF
z_gamma = rsd.standardize(..., method="monthwise_parametric",
                          distribution="gamma")                          # canonical Stagge SPI

The parametric path defaults to floc=0 (the canonical Stagge SPI convention) for distributions with non-negative support. Pass floc=None to recover scipy's default free-location 3-parameter fit.

Next¶

Read the method overview to understand why RSD looks the way it does and when to prefer it over the baselines.
Walk through the example notebooks for end-to-end runnable workflows.
Look up any function in the API reference.