API reference¶

Every entry on this page is generated directly from the docstrings in src/rsd/. Edits to the source docstrings update this page on the next docs build.

The package exposes a flat top-level surface (rsd.standardize, rsd.standardize_xr, rsd.diagnose, rsd.estimate_bounds, rsd.logit_transform, rsd.inverse_logit_transform, rsd.METHODS, rsd.RSDFitWarning, rsd.RSDDegenerateWarning). The signatures below are documented at their source modules; importing them from the top-level package gives identical behaviour.

Core entry points¶

`rsd.standardize`¶

Compute a standardized hydroclimate index.

This is the main entry point for the RSD package. It supports three methods that share a common interface, making it easy to compare approaches on the same data.

The four data arguments (target, reference, months_target, months_reference) may be passed positionally; all configuration arguments after them are keyword-only to prevent silent positional mix-ups like standardize(tgt, ref, m_tgt, m_ref, "rsd", 3) which would otherwise assign month arrays to method and scale.

PARAMETER	DESCRIPTION
`target`	The time series to standardize. TYPE: `1-D float array`
`reference`	The reference series used to build the CDF. If `None`, target is used as its own reference (self-standardization, intended for testing/single-series workflows only - it defeats the cross-series comparability that motivates the package). TYPE: `1-D float array or None` DEFAULT: `None`
`months_target`	Calendar month (1-12) for each element of target. Required. TYPE: `1-D int array` DEFAULT: `None`
`months_reference`	Calendar month (1-12) for each element of reference. Required when reference is provided separately. If `None` and reference is `None`, uses months_target. TYPE: `1-D int array or None` DEFAULT: `None`
`method`	Standardization method. Keyword-only. TYPE: `('rsd', 'monthwise_ecdf', 'monthwise_parametric')` DEFAULT: `"rsd"`
`scale`	Temporal accumulation window in time steps (default 3). Keyword-only. TYPE: `int` DEFAULT: `DEFAULT_SCALE`
`agg`	Aggregation function within the window (default "sum"). Keyword-only. TYPE: `('sum', 'mean')` DEFAULT: `"sum"`
`tail_quantile`	(RSD only) Fraction of distribution in each tail (default 0.10). TYPE: `float` DEFAULT: `DEFAULT_TAIL_QUANTILE`
`min_tail_size`	(RSD only) Minimum exceedances for GPD fitting (default 20). TYPE: `int` DEFAULT: `DEFAULT_MIN_TAIL_SIZE`
`loc`	(RSD only) Location estimator for deseasonalization. TYPE: `('median', 'mean')` DEFAULT: `"median"`
`scale_method`	(RSD only) Scale estimator for deseasonalization. TYPE: `('iqr', 'std')` DEFAULT: `"iqr"`
`distribution`	(Parametric only) Name of a scipy.stats continuous distribution. TYPE: `str` DEFAULT: `DEFAULT_DISTRIBUTION`
`prob_zero`	(Parametric only) Handle zero-inflation for distributions with support at or above 0. TYPE: `bool` DEFAULT: `False`
`floc`	(Parametric only) Fixed location parameter for distributions with support at or above 0 (gamma, lognorm, expon, chi2, weibull_min). Default `0` matches the canonical Stagge SPI convention. Pass `None` to recover scipy's default 3-parameter free-location maximum likelihood fit. TYPE: `float or None` DEFAULT: `0`
`plotting_position`	Plotting position formula name or (a, b) constants. TYPE: `str or (float, float)` DEFAULT: `DEFAULT_PLOTTING_POSITION`
`bounds`	(RSD only) Physical per-timestep bounds for a logit pre-transform. When provided, `y = log((x - lower) / (upper - x))` is applied to each value of target and reference before accumulation, so the rest of the pipeline (accumulation, deseasonalization, pooling, tail fitting) operates in unbounded logit space. Either lower or upper (or both) may be arrays broadcastable against the input. `None` (default) skips the transform and operates in the original units. Only valid when `method="rsd"`. TYPE: `(lower, upper) or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`ndarray(float32)`	Standardized z-scores. Same length as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary.

Notes

Self-standardization (reference=None) is for testing and single-series workflows only. The motivating problem RSD was built to solve is cross-series comparison under a fixed reference distribution (factual vs counterfactual, historical vs future, model A vs model B). Using target as its own reference reduces the method to a self-standardized index and removes the cross-series differences the method is designed to preserve. Production calls should always provide an explicit reference.

Warnings. When the RSD method is selected and a tail fit fails or falls back, an :class:rsd.RSDFitWarning is emitted; when a calendar month in the reference is degenerate (zero spread), an :class:rsd.RSDDegenerateWarning is emitted and the affected positions become NaN downstream.

`rsd.standardize_xr`¶

Compute a standardized hydroclimate index over an N-D DataArray.

This wraps the 1-D kernels from :mod:rsd._kernels using xr.apply_ufunc with vectorize=True, applying the chosen method independently to each spatial pixel along the time dimension.

The two data arguments (target, reference) may be passed positionally; all configuration arguments after them are keyword-only.

PARAMETER	DESCRIPTION
`target`	The DataArray to standardize. Must have a time dimension. TYPE: `DataArray`
`reference`	Reference DataArray for building the CDF. If `None`, target is used as its own reference (self-standardization, intended for testing/single-series workflows only - it defeats the cross-series comparability that motivates the package). TYPE: `DataArray or None` DEFAULT: `None`
`method`	Standardization method. Keyword-only. TYPE: `('rsd', 'monthwise_ecdf', 'monthwise_parametric')` DEFAULT: `"rsd"`
`scale`	Temporal accumulation window in time steps (default 3). Keyword-only. TYPE: `int` DEFAULT: `DEFAULT_SCALE`
`agg`	Aggregation function within the window (default "sum"). Keyword-only. TYPE: `('sum', 'mean')` DEFAULT: `"sum"`
`time_dim`	Name of the time dimension (default "time"). Keyword-only. TYPE: `str` DEFAULT: `'time'`
`tail_quantile`	(RSD only) Fraction of distribution in each tail. TYPE: `float` DEFAULT: `DEFAULT_TAIL_QUANTILE`
`min_tail_size`	(RSD only) Minimum exceedances for GPD fitting. TYPE: `int` DEFAULT: `DEFAULT_MIN_TAIL_SIZE`
`loc`	(RSD only) Location estimator for deseasonalization. TYPE: `('median', 'mean')` DEFAULT: `"median"`
`scale_method`	(RSD only) Scale estimator for deseasonalization. TYPE: `('iqr', 'std')` DEFAULT: `"iqr"`
`distribution`	(Parametric only) Name of a scipy.stats continuous distribution. TYPE: `str` DEFAULT: `DEFAULT_DISTRIBUTION`
`prob_zero`	(Parametric only) Handle zero-inflation. TYPE: `bool` DEFAULT: `False`
`floc`	(Parametric only) Fixed location parameter for distributions with support at or above 0. Default `0` matches the canonical Stagge SPI convention. `None` recovers scipy's free-location 3-parameter fit. TYPE: `float or None` DEFAULT: `0`
`plotting_position`	Plotting position formula name or (a, b) constants. TYPE: `str or (float, float)` DEFAULT: `DEFAULT_PLOTTING_POSITION`
`parallel`	If True, use dask parallelized scheduling. If False (default), run sequentially. TYPE: `bool` DEFAULT: `False`
`chunk_size`	Spatial chunk size for auto-chunking when parallel=True and input is not already chunked (default 50). TYPE: `int` DEFAULT: `DEFAULT_CHUNK_SIZE`
`bounds`	(RSD only) Physical per-timestep bounds for a logit pre-transform. When provided, `y = log((x - lower) / (upper - x))` is applied to each value of target and reference before accumulation, so the rest of the pipeline runs in unbounded logit space and the choice of `agg` ('sum' or 'mean') is a pure scientific decision (both are well-defined in logit space). Each bound may be a scalar or a DataArray broadcastable against the input. Typical use: `bounds=(0, sm_sat_da)` for soil moisture, where `sm_sat_da` is the per-pixel saturation value. `None` (default) skips the transform. Only valid when `method='rsd'`. TYPE: `(lower, upper) or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataArray(float32)`	Standardized z-scores with the same dimensions as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary.

Notes

Self-standardization (reference=None) is for testing and single-series workflows only. RSD's motivating use case is cross-series comparison under a fixed reference distribution (factual vs counterfactual, historical vs future, model A vs model B). Using target as its own reference reduces the method to a self-standardized index and removes the cross-series differences the method is designed to preserve. Production calls should always provide an explicit reference DataArray.

Warnings. As in :func:rsd.standardize, tail-fit failures emit :class:rsd.RSDFitWarning and degenerate-month references emit :class:rsd.RSDDegenerateWarning. On large grids these can flood the warning stream; suppress with warnings.filterwarnings("ignore", category=rsd.RSDFitWarning) if you trust the fallbacks.

Performance

The wrapper uses xr.apply_ufunc(..., vectorize=True), which runs the 1-D kernel in a Python-level loop over pixels per chunk. Throughput is therefore bound by the Python overhead per pixel multiplied by the number of pixels per chunk; expect minutes (not seconds) on a global 0.5° grid. For large grids:

Use parallel=True so dask runs the per-chunk loops in parallel across CPU cores.
Pre-chunk the input spatially (e.g., target.chunk({"lat": 60, "lon": 60, "time": -1})) so the time dimension stays contiguous within each chunk; the wrapper enforces this with :func:_rechunk_time_full but starting from sensible spatial chunks avoids extra rechunking.
For an end-to-end SSI-3 on a ~50k-land-pixel grid over a 20-year window, expect on the order of 10-15 minutes on a modern laptop with parallel=True.

Diagnostics¶

`rsd.diagnose`¶

One-call exchangeability diagnostic for an RSD reference series.

By default :func:diagnose prints an overview block (configuration + per-month seasonal parameters), renders the summary figure (location and scale on top, before / after deseasonalization KDEs below), and prints the Anderson-Darling omnibus + per-month Kolmogorov-Smirnov leave-one-out report.

Passing bounds=(L, U) adds a physical-bounds pathway that applies a logit pre-transform before accumulation; passing auto_bounds=True adds an auto-bounds pathway with bounds derived from :func:estimate_bounds. Either case produces a second figure (bounds_fig) that places baseline next to the bounded variants, and the report becomes multi-method.

The summary figure always shows the baseline (unbounded) location and scale and the baseline before / after KDEs. Logit-space location and scale are intentionally omitted from that figure: they do not have a direct physical interpretation. The bounded comparison lives in the separate bounds_fig.

PARAMETER	DESCRIPTION
`values`	Reference time series (raw monthly values). TYPE: `1-D float array`
`months`	Calendar month (1 to 12) for each element of `values`. TYPE: `1-D int array`
`name`	Label for the series being diagnosed. Printed inside the overview header and used as the second line of figure suptitles. Required. TYPE: `str`
`index`	Optional drought-index identifier (e.g. `"SSI"`, `"SPI"`, `"SDI"`). When provided the configuration block and figure suptitles show `"{index}-{scale}"` (e.g. `"SSI-3"`). TYPE: `str or None` DEFAULT: `None`
`scale`	Temporal accumulation window. Set to 1 if `values` are already accumulated. TYPE: `int` DEFAULT: `DEFAULT_SCALE`
`agg`	Accumulation aggregator. TYPE: `('sum', 'mean')` DEFAULT: `"sum"`
`loc`	Location estimator for deseasonalization. TYPE: `('median', 'mean')` DEFAULT: `"median"`
`scale_method`	Scale estimator for deseasonalization. TYPE: `('iqr', 'std')` DEFAULT: `"iqr"`
`bounds`	Physical logit bounds. When supplied, the diagnostic adds a `physical-bounds` pathway alongside `baseline` and produces the bounded comparison figure. TYPE: `(lower, upper) or None` DEFAULT: `None`
`auto_bounds`	If True, also add an `auto-bounds` pathway with bounds computed by :func:`estimate_bounds`. TYPE: `bool` DEFAULT: `False`
`tail_quantile`	Exceedance threshold printed in the overview header. TYPE: `float` DEFAULT: `DEFAULT_TAIL_QUANTILE`
`alpha`	Significance level for the per-month KS rejection flag. TYPE: `float` DEFAULT: `0.05`
`quiet`	Suppress all output: no overview or report is printed, figures are closed before returning, and the corresponding return-dict slots are set to `None`. The full numeric report dict is still returned. Useful in batch / CI runs that scan many pixels. TYPE: `bool` DEFAULT: `False`

RETURNS DESCRIPTION

dict

Keys:

overview: dict of the header values (name, index, pooled_n, monthly_n, scale, agg, tail_quantile, exceedance_pct, bounds_used, methods).
summary_fig: combined location-and-scale + before / after KDE figure. None when quiet=True.
bounds_fig: bounded-pathway comparison figure. None when neither bounds nor auto_bounds is requested, or when quiet=True.
report: single report dict (one method) or {label: report_dict} (multi-method). Each report dict carries ad_statistic, ad_pvalue, per_month, n_reject, n_months_tested, ks_d_mean, ks_d_max, pooled_n, alpha.
bounds_used: {"physical": (L, U) or None, "auto": (L, U) or None}.

Notes

When bounds=(L, U) is supplied with L >= min(values) or U <= max(values), :func:diagnose emits a UserWarning to flag the saturation or clipping risk; :func:rsd.transform.logit_transform may emit a separate warning of its own when it actually clips values to the open interval.

`rsd.estimate_bounds`¶

Estimate heuristic logit bounds from a sample's percentile range.

Returns (L, U) with L < min(values) and U > max(values), computed as the q-th and (100-q)-th percentiles padded by pad_frac times the inter-percentile range on each side. The returned bounds satisfy the strict inequalities required by :func:rsd.transform.logit_transform.

PARAMETER	DESCRIPTION
`values`	Input series. NaNs are ignored. TYPE: `1-D float array`
`q`	Lower percentile in percent (default 0.5). Upper percentile is `100 - q`. TYPE: `float` DEFAULT: `0.5`
`pad_frac`	Fractional pad added on each side, as a fraction of the inter-percentile range. Default 0.02 (2 %). TYPE: `float` DEFAULT: `0.02`

RETURNS	DESCRIPTION
`(lower, upper) : tuple of float`	Heuristic logit bounds.

Notes

These are heuristic bounds derived from the observed sample, not physical bounds. Use them only when external (ancillary) physical bounds are unavailable.

Risks worth knowing:

Saturation amplification. If the input is already clipped against a physical limit (e.g. soil moisture against saturation), auto bounds will hug that limit and may amplify the saturation artefact rather than mitigate it. Prefer physical bounds in that case.
Sample dependence. Auto bounds depend on the data: two pixels with similar physics but different record lengths get different bounds. This is acceptable for diagnostics but unsuitable for production standardization, where bounds should be physically grounded and stable across samples.

To compare baseline, physical-bounds and auto-bounds pathways side by side, pass both bounds= and auto_bounds=True to :func:diagnose.

Helper transforms¶

`rsd.logit_transform`¶

Apply bounded logit transform: y = log((x - L) / (U - x)).

Maps the open interval (L, U) to (-∞, +∞) with position-dependent stretching - strong near the bounds, mild in the centre. This targets the compression-against-physical-bounds mechanism that causes month-dependent skewness in bounded hydroclimate variables such as soil moisture or discharge.

Values are clipped to the open interval (L + ε, U - ε) before the transform, where ε = eps_frac × (U - L). NaN values propagate as NaN.

PARAMETER	DESCRIPTION
`vals`	Input values. Must be in (L, U) after clipping. TYPE: `array - like`
`lower`	Physical lower bound (e.g., 0 for soil moisture or discharge). TYPE: `float`
`upper`	Physical upper bound (e.g., saturation moisture content × depth × density). TYPE: `float`
`eps_frac`	Fractional buffer from each bound before applying the log ratio. Default 1e-6. TYPE: `float` DEFAULT: `1e-06`

RETURNS	DESCRIPTION
`ndarray(float64)`	Logit-transformed values. Same shape as vals.

Notes

This transform is only appropriate when the data genuinely approaches physical bounds. At pixels far from both bounds the transform has little effect. For drought indices the dry tail (where droughts live) should only be stretched when there is a physical lower bound nearby; stretching without physical basis distorts relative drought severity.

See inverse_logit_transform to invert if needed.

`rsd.inverse_logit_transform`¶

Invert the bounded logit transform: x = L + (U - L) / (1 + exp(-y)).

PARAMETER	DESCRIPTION
`y`	Logit-space values (output of :func:`logit_transform`). TYPE: `array - like`
`lower`	Physical lower bound used when transforming. TYPE: `float`
`upper`	Physical upper bound used when transforming. TYPE: `float`

RETURNS	DESCRIPTION
`ndarray(float64)`	Values in the original (L, U) space.

Warning categories¶

The pipeline emits two custom UserWarning subclasses so silent failures become visible and filterable through the standard warnings machinery.

`rsd.RSDFitWarning`¶

A parametric fit (GPD, exponential, or scipy monthwise distribution) fell back or failed for a sample. The pipeline still returns a value using a documented fallback path (e.g., ECDF-only on GPD failure).

`rsd.RSDDegenerateWarning`¶

An input sample is degenerate (zero spread, constant series, or fully outside the declared physical bounds) and was handled by a fallback that may make downstream results undefined or NaN for the affected positions.

Method constant¶

rsd.METHODS is the tuple of method names accepted by the method= argument of standardize and standardize_xr:

>>> import rsd
>>> rsd.METHODS
('rsd', 'monthwise_ecdf', 'monthwise_parametric')

Stability guarantee¶

Everything documented on this page is covered by semantic versioning from v1.0.0 onward. Modules whose name starts with an underscore (_kernels, _validation, _warnings, ...) are private and may change between patch releases without notice.

API reference¶

Core entry points¶

rsd.standardize¶

rsd.standardize_xr¶

Diagnostics¶

rsd.diagnose¶

rsd.estimate_bounds¶

Helper transforms¶

rsd.logit_transform¶

rsd.inverse_logit_transform¶

Warning categories¶

rsd.RSDFitWarning¶

rsd.RSDDegenerateWarning¶

Method constant¶

Stability guarantee¶

`rsd.standardize`¶

`rsd.standardize_xr`¶

`rsd.diagnose`¶

`rsd.estimate_bounds`¶

`rsd.logit_transform`¶

`rsd.inverse_logit_transform`¶

`rsd.RSDFitWarning`¶

`rsd.RSDDegenerateWarning`¶