Skip to content

Quick start

Two API surfaces, same vocabulary:

You have Use
A 1-D NumPy time series rsd.standardize
An N-D xarray DataArray with a time coordinate rsd.standardize_xr

In RSD vocabulary, the reference defines what "normal" looks like (a baseline period or dataset), and the target is the series you score against that normal. Output z is in standard-normal units: z ≈ 0 is climatology, |z| > 2 is extreme.

1-D

import numpy as np
import rsd

rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50)    # 600 months
months_tgt = np.tile(np.arange(1, 13), 100)   # 1200 months
ref = rng.gamma(shape=2, scale=5, size=600)
tgt = rng.gamma(shape=2, scale=5, size=1200)

z = rsd.standardize(
    target=tgt,
    reference=ref,
    months_target=months_tgt,
    months_reference=months_ref,
    scale=3,                                   # 3-month accumulation (SPI-3)
)

z is a float32 NumPy array the same length as target. The first scale - 1 positions are NaN (the rolling window is not yet full).

The four data arguments (target, reference, months_target, months_reference) may be passed positionally; every configuration argument after them (method, scale, agg, tail_quantile, ...) is keyword-only, so a positional mix-up like rsd.standardize(tgt, ref, m_tgt, m_ref, "rsd", 3) raises a clear TypeError instead of silently mis-binding the month arrays.

N-D (xarray + optional dask)

import rsd

z = rsd.standardize_xr(
    target=target_da,       # xr.DataArray with a "time" dimension
    reference=ref_da,       # xr.DataArray with a "time" dimension
    method="rsd",           # or "monthwise_ecdf" / "monthwise_parametric"
    scale=3,
    parallel=True,          # use dask for per-chunk parallelism
)

Months are extracted automatically from the time coordinate, so you do not pass months_target/months_reference here. The wrapper applies the chosen method independently to every spatial pixel along the time dimension via xr.apply_ufunc. Pre-chunk the input spatially (e.g. target_da.chunk({"lat": 60, "lon": 60, "time": -1})) and pass parallel=True for grids larger than a few thousand pixels.

Bounded variables (soil moisture)

For variables with two-sided physical bounds (soil moisture is the canonical case; relative humidity is another), pass per-timestep bounds to engage the logit pre-transform:

z = rsd.standardize_xr(
    target=target_da,
    bounds=(0.0, sm_sat_da),    # per-timestep bounds
    scale=3,
    agg="mean",                 # or "sum" - both work cleanly
)

The logit transform is applied before accumulation, so the choice of agg ("sum" or "mean") is a pure scientific decision. Each bound may be a scalar or a DataArray broadcastable against the input grid; in the example above, sm_sat_da would be a per-pixel saturation DataArray (e.g. xr.full_like(target_da.isel(time=0), 0.4) for a uniform field).

Compare against baselines

Two classical methods are bundled for direct comparison on the same input:

z_rsd   = rsd.standardize(..., method="rsd")                            # hybrid empirical + GPD tails
z_ecdf  = rsd.standardize(..., method="monthwise_ecdf")                 # classical SPI ECDF
z_gamma = rsd.standardize(..., method="monthwise_parametric",
                          distribution="gamma")                          # canonical Stagge SPI

The parametric path defaults to floc=0 (the canonical Stagge SPI convention) for distributions with non-negative support. Pass floc=None to recover scipy's default free-location 3-parameter fit.

Next

  • Read the method overview to understand why RSD looks the way it does and when to prefer it over the baselines.
  • Walk through the example notebooks for end-to-end runnable workflows.
  • Look up any function in the API reference.