Skip to content

API reference

Every entry on this page is generated directly from the docstrings in src/rsd/. Edits to the source docstrings update this page on the next docs build.

The package exposes a flat top-level surface (rsd.standardize, rsd.standardize_xr, rsd.diagnose, rsd.estimate_bounds, rsd.logit_transform, rsd.inverse_logit_transform, rsd.METHODS, rsd.RSDFitWarning, rsd.RSDDegenerateWarning). The signatures below are documented at their source modules; importing them from the top-level package gives identical behaviour.

Core entry points

rsd.standardize

Compute a standardized hydroclimate index.

This is the main entry point for the RSD package. It supports three methods that share a common interface, making it easy to compare approaches on the same data.

The four data arguments (target, reference, months_target, months_reference) may be passed positionally; all configuration arguments after them are keyword-only to prevent silent positional mix-ups like standardize(tgt, ref, m_tgt, m_ref, "rsd", 3) which would otherwise assign month arrays to method and scale.

PARAMETER DESCRIPTION
target

The time series to standardize.

TYPE: 1-D float array

reference

The reference series used to build the CDF. If None, target is used as its own reference (self-standardization, intended for testing/single-series workflows only - it defeats the cross-series comparability that motivates the package).

TYPE: 1-D float array or None DEFAULT: None

months_target

Calendar month (1-12) for each element of target. Required.

TYPE: 1-D int array DEFAULT: None

months_reference

Calendar month (1-12) for each element of reference. Required when reference is provided separately. If None and reference is None, uses months_target.

TYPE: 1-D int array or None DEFAULT: None

method

Standardization method. Keyword-only.

TYPE: ('rsd', 'monthwise_ecdf', 'monthwise_parametric') DEFAULT: "rsd"

scale

Temporal accumulation window in time steps (default 3). Keyword-only.

TYPE: int DEFAULT: DEFAULT_SCALE

agg

Aggregation function within the window (default "sum"). Keyword-only.

TYPE: ('sum', 'mean') DEFAULT: "sum"

tail_quantile

(RSD only) Fraction of distribution in each tail (default 0.10).

TYPE: float DEFAULT: DEFAULT_TAIL_QUANTILE

min_tail_size

(RSD only) Minimum exceedances for GPD fitting (default 20).

TYPE: int DEFAULT: DEFAULT_MIN_TAIL_SIZE

loc

(RSD only) Location estimator for deseasonalization.

TYPE: ('median', 'mean') DEFAULT: "median"

scale_method

(RSD only) Scale estimator for deseasonalization.

TYPE: ('iqr', 'std') DEFAULT: "iqr"

distribution

(Parametric only) Name of a scipy.stats continuous distribution.

TYPE: str DEFAULT: DEFAULT_DISTRIBUTION

prob_zero

(Parametric only) Handle zero-inflation for distributions with support at or above 0.

TYPE: bool DEFAULT: False

floc

(Parametric only) Fixed location parameter for distributions with support at or above 0 (gamma, lognorm, expon, chi2, weibull_min). Default 0 matches the canonical Stagge SPI convention. Pass None to recover scipy's default 3-parameter free-location maximum likelihood fit.

TYPE: float or None DEFAULT: 0

plotting_position

Plotting position formula name or (a, b) constants.

TYPE: str or (float, float) DEFAULT: DEFAULT_PLOTTING_POSITION

bounds

(RSD only) Physical per-timestep bounds for a logit pre-transform. When provided, y = log((x - lower) / (upper - x)) is applied to each value of target and reference before accumulation, so the rest of the pipeline (accumulation, deseasonalization, pooling, tail fitting) operates in unbounded logit space. Either lower or upper (or both) may be arrays broadcastable against the input. None (default) skips the transform and operates in the original units. Only valid when method="rsd".

TYPE: (lower, upper) or None DEFAULT: None

RETURNS DESCRIPTION
ndarray(float32)

Standardized z-scores. Same length as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary.

Notes

Self-standardization (reference=None) is for testing and single-series workflows only. The motivating problem RSD was built to solve is cross-series comparison under a fixed reference distribution (factual vs counterfactual, historical vs future, model A vs model B). Using target as its own reference reduces the method to a self-standardized index and removes the cross-series differences the method is designed to preserve. Production calls should always provide an explicit reference.

Warnings. When the RSD method is selected and a tail fit fails or falls back, an :class:rsd.RSDFitWarning is emitted; when a calendar month in the reference is degenerate (zero spread), an :class:rsd.RSDDegenerateWarning is emitted and the affected positions become NaN downstream.

rsd.standardize_xr

Compute a standardized hydroclimate index over an N-D DataArray.

This wraps the 1-D kernels from :mod:rsd._kernels using xr.apply_ufunc with vectorize=True, applying the chosen method independently to each spatial pixel along the time dimension.

The two data arguments (target, reference) may be passed positionally; all configuration arguments after them are keyword-only.

PARAMETER DESCRIPTION
target

The DataArray to standardize. Must have a time dimension.

TYPE: DataArray

reference

Reference DataArray for building the CDF. If None, target is used as its own reference (self-standardization, intended for testing/single-series workflows only - it defeats the cross-series comparability that motivates the package).

TYPE: DataArray or None DEFAULT: None

method

Standardization method. Keyword-only.

TYPE: ('rsd', 'monthwise_ecdf', 'monthwise_parametric') DEFAULT: "rsd"

scale

Temporal accumulation window in time steps (default 3). Keyword-only.

TYPE: int DEFAULT: DEFAULT_SCALE

agg

Aggregation function within the window (default "sum"). Keyword-only.

TYPE: ('sum', 'mean') DEFAULT: "sum"

time_dim

Name of the time dimension (default "time"). Keyword-only.

TYPE: str DEFAULT: 'time'

tail_quantile

(RSD only) Fraction of distribution in each tail.

TYPE: float DEFAULT: DEFAULT_TAIL_QUANTILE

min_tail_size

(RSD only) Minimum exceedances for GPD fitting.

TYPE: int DEFAULT: DEFAULT_MIN_TAIL_SIZE

loc

(RSD only) Location estimator for deseasonalization.

TYPE: ('median', 'mean') DEFAULT: "median"

scale_method

(RSD only) Scale estimator for deseasonalization.

TYPE: ('iqr', 'std') DEFAULT: "iqr"

distribution

(Parametric only) Name of a scipy.stats continuous distribution.

TYPE: str DEFAULT: DEFAULT_DISTRIBUTION

prob_zero

(Parametric only) Handle zero-inflation.

TYPE: bool DEFAULT: False

floc

(Parametric only) Fixed location parameter for distributions with support at or above 0. Default 0 matches the canonical Stagge SPI convention. None recovers scipy's free-location 3-parameter fit.

TYPE: float or None DEFAULT: 0

plotting_position

Plotting position formula name or (a, b) constants.

TYPE: str or (float, float) DEFAULT: DEFAULT_PLOTTING_POSITION

parallel

If True, use dask parallelized scheduling. If False (default), run sequentially.

TYPE: bool DEFAULT: False

chunk_size

Spatial chunk size for auto-chunking when parallel=True and input is not already chunked (default 50).

TYPE: int DEFAULT: DEFAULT_CHUNK_SIZE

bounds

(RSD only) Physical per-timestep bounds for a logit pre-transform. When provided, y = log((x - lower) / (upper - x)) is applied to each value of target and reference before accumulation, so the rest of the pipeline runs in unbounded logit space and the choice of agg ('sum' or 'mean') is a pure scientific decision (both are well-defined in logit space). Each bound may be a scalar or a DataArray broadcastable against the input. Typical use: bounds=(0, sm_sat_da) for soil moisture, where sm_sat_da is the per-pixel saturation value. None (default) skips the transform. Only valid when method='rsd'.

TYPE: (lower, upper) or None DEFAULT: None

RETURNS DESCRIPTION
DataArray(float32)

Standardized z-scores with the same dimensions as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary.

Notes

Self-standardization (reference=None) is for testing and single-series workflows only. RSD's motivating use case is cross-series comparison under a fixed reference distribution (factual vs counterfactual, historical vs future, model A vs model B). Using target as its own reference reduces the method to a self-standardized index and removes the cross-series differences the method is designed to preserve. Production calls should always provide an explicit reference DataArray.

Warnings. As in :func:rsd.standardize, tail-fit failures emit :class:rsd.RSDFitWarning and degenerate-month references emit :class:rsd.RSDDegenerateWarning. On large grids these can flood the warning stream; suppress with warnings.filterwarnings("ignore", category=rsd.RSDFitWarning) if you trust the fallbacks.

Performance

The wrapper uses xr.apply_ufunc(..., vectorize=True), which runs the 1-D kernel in a Python-level loop over pixels per chunk. Throughput is therefore bound by the Python overhead per pixel multiplied by the number of pixels per chunk; expect minutes (not seconds) on a global 0.5° grid. For large grids:

  • Use parallel=True so dask runs the per-chunk loops in parallel across CPU cores.
  • Pre-chunk the input spatially (e.g., target.chunk({"lat": 60, "lon": 60, "time": -1})) so the time dimension stays contiguous within each chunk; the wrapper enforces this with :func:_rechunk_time_full but starting from sensible spatial chunks avoids extra rechunking.
  • For an end-to-end SSI-3 on a ~50k-land-pixel grid over a 20-year window, expect on the order of 10-15 minutes on a modern laptop with parallel=True.

Diagnostics

rsd.diagnose

One-call exchangeability diagnostic for an RSD reference series.

By default :func:diagnose prints an overview block (configuration + per-month seasonal parameters), renders the summary figure (location and scale on top, before / after deseasonalization KDEs below), and prints the Anderson-Darling omnibus + per-month Kolmogorov-Smirnov leave-one-out report.

Passing bounds=(L, U) adds a physical-bounds pathway that applies a logit pre-transform before accumulation; passing auto_bounds=True adds an auto-bounds pathway with bounds derived from :func:estimate_bounds. Either case produces a second figure (bounds_fig) that places baseline next to the bounded variants, and the report becomes multi-method.

The summary figure always shows the baseline (unbounded) location and scale and the baseline before / after KDEs. Logit-space location and scale are intentionally omitted from that figure: they do not have a direct physical interpretation. The bounded comparison lives in the separate bounds_fig.

PARAMETER DESCRIPTION
values

Reference time series (raw monthly values).

TYPE: 1-D float array

months

Calendar month (1 to 12) for each element of values.

TYPE: 1-D int array

name

Label for the series being diagnosed. Printed inside the overview header and used as the second line of figure suptitles. Required.

TYPE: str

index

Optional drought-index identifier (e.g. "SSI", "SPI", "SDI"). When provided the configuration block and figure suptitles show "{index}-{scale}" (e.g. "SSI-3").

TYPE: str or None DEFAULT: None

scale

Temporal accumulation window. Set to 1 if values are already accumulated.

TYPE: int DEFAULT: DEFAULT_SCALE

agg

Accumulation aggregator.

TYPE: ('sum', 'mean') DEFAULT: "sum"

loc

Location estimator for deseasonalization.

TYPE: ('median', 'mean') DEFAULT: "median"

scale_method

Scale estimator for deseasonalization.

TYPE: ('iqr', 'std') DEFAULT: "iqr"

bounds

Physical logit bounds. When supplied, the diagnostic adds a physical-bounds pathway alongside baseline and produces the bounded comparison figure.

TYPE: (lower, upper) or None DEFAULT: None

auto_bounds

If True, also add an auto-bounds pathway with bounds computed by :func:estimate_bounds.

TYPE: bool DEFAULT: False

tail_quantile

Exceedance threshold printed in the overview header.

TYPE: float DEFAULT: DEFAULT_TAIL_QUANTILE

alpha

Significance level for the per-month KS rejection flag.

TYPE: float DEFAULT: 0.05

quiet

Suppress all output: no overview or report is printed, figures are closed before returning, and the corresponding return-dict slots are set to None. The full numeric report dict is still returned. Useful in batch / CI runs that scan many pixels.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict

Keys:

  • overview: dict of the header values (name, index, pooled_n, monthly_n, scale, agg, tail_quantile, exceedance_pct, bounds_used, methods).
  • summary_fig: combined location-and-scale + before / after KDE figure. None when quiet=True.
  • bounds_fig: bounded-pathway comparison figure. None when neither bounds nor auto_bounds is requested, or when quiet=True.
  • report: single report dict (one method) or {label: report_dict} (multi-method). Each report dict carries ad_statistic, ad_pvalue, per_month, n_reject, n_months_tested, ks_d_mean, ks_d_max, pooled_n, alpha.
  • bounds_used: {"physical": (L, U) or None, "auto": (L, U) or None}.
Notes

When bounds=(L, U) is supplied with L >= min(values) or U <= max(values), :func:diagnose emits a UserWarning to flag the saturation or clipping risk; :func:rsd.transform.logit_transform may emit a separate warning of its own when it actually clips values to the open interval.

rsd.estimate_bounds

Estimate heuristic logit bounds from a sample's percentile range.

Returns (L, U) with L < min(values) and U > max(values), computed as the q-th and (100-q)-th percentiles padded by pad_frac times the inter-percentile range on each side. The returned bounds satisfy the strict inequalities required by :func:rsd.transform.logit_transform.

PARAMETER DESCRIPTION
values

Input series. NaNs are ignored.

TYPE: 1-D float array

q

Lower percentile in percent (default 0.5). Upper percentile is 100 - q.

TYPE: float DEFAULT: 0.5

pad_frac

Fractional pad added on each side, as a fraction of the inter-percentile range. Default 0.02 (2 %).

TYPE: float DEFAULT: 0.02

RETURNS DESCRIPTION
(lower, upper) : tuple of float

Heuristic logit bounds.

Notes

These are heuristic bounds derived from the observed sample, not physical bounds. Use them only when external (ancillary) physical bounds are unavailable.

Risks worth knowing:

  • Saturation amplification. If the input is already clipped against a physical limit (e.g. soil moisture against saturation), auto bounds will hug that limit and may amplify the saturation artefact rather than mitigate it. Prefer physical bounds in that case.
  • Sample dependence. Auto bounds depend on the data: two pixels with similar physics but different record lengths get different bounds. This is acceptable for diagnostics but unsuitable for production standardization, where bounds should be physically grounded and stable across samples.

To compare baseline, physical-bounds and auto-bounds pathways side by side, pass both bounds= and auto_bounds=True to :func:diagnose.

Helper transforms

rsd.logit_transform

Apply bounded logit transform: y = log((x - L) / (U - x)).

Maps the open interval (L, U) to (-∞, +∞) with position-dependent stretching - strong near the bounds, mild in the centre. This targets the compression-against-physical-bounds mechanism that causes month-dependent skewness in bounded hydroclimate variables such as soil moisture or discharge.

Values are clipped to the open interval (L + ε, U - ε) before the transform, where ε = eps_frac × (U - L). NaN values propagate as NaN.

PARAMETER DESCRIPTION
vals

Input values. Must be in (L, U) after clipping.

TYPE: array - like

lower

Physical lower bound (e.g., 0 for soil moisture or discharge).

TYPE: float

upper

Physical upper bound (e.g., saturation moisture content × depth × density).

TYPE: float

eps_frac

Fractional buffer from each bound before applying the log ratio. Default 1e-6.

TYPE: float DEFAULT: 1e-06

RETURNS DESCRIPTION
ndarray(float64)

Logit-transformed values. Same shape as vals.

Notes

This transform is only appropriate when the data genuinely approaches physical bounds. At pixels far from both bounds the transform has little effect. For drought indices the dry tail (where droughts live) should only be stretched when there is a physical lower bound nearby; stretching without physical basis distorts relative drought severity.

See inverse_logit_transform to invert if needed.

rsd.inverse_logit_transform

Invert the bounded logit transform: x = L + (U - L) / (1 + exp(-y)).

PARAMETER DESCRIPTION
y

Logit-space values (output of :func:logit_transform).

TYPE: array - like

lower

Physical lower bound used when transforming.

TYPE: float

upper

Physical upper bound used when transforming.

TYPE: float

RETURNS DESCRIPTION
ndarray(float64)

Values in the original (L, U) space.

Warning categories

The pipeline emits two custom UserWarning subclasses so silent failures become visible and filterable through the standard warnings machinery.

rsd.RSDFitWarning

A parametric fit (GPD, exponential, or scipy monthwise distribution) fell back or failed for a sample. The pipeline still returns a value using a documented fallback path (e.g., ECDF-only on GPD failure).

rsd.RSDDegenerateWarning

An input sample is degenerate (zero spread, constant series, or fully outside the declared physical bounds) and was handled by a fallback that may make downstream results undefined or NaN for the affected positions.

Method constant

rsd.METHODS is the tuple of method names accepted by the method= argument of standardize and standardize_xr:

>>> import rsd
>>> rsd.METHODS
('rsd', 'monthwise_ecdf', 'monthwise_parametric')

Stability guarantee

Everything documented on this page is covered by semantic versioning from v1.0.0 onward. Modules whose name starts with an underscore (_kernels, _validation, _warnings, ...) are private and may change between patch releases without notice.