API reference¶
Every entry on this page is generated directly from the docstrings in
src/rsd/. Edits to the source docstrings update this page on the next
docs build.
The package exposes a flat top-level surface (rsd.standardize,
rsd.standardize_xr, rsd.diagnose, rsd.estimate_bounds,
rsd.logit_transform, rsd.inverse_logit_transform, rsd.METHODS,
rsd.RSDFitWarning, rsd.RSDDegenerateWarning). The signatures below
are documented at their source modules; importing them from the
top-level package gives identical behaviour.
Core entry points¶
rsd.standardize¶
Compute a standardized hydroclimate index.
This is the main entry point for the RSD package. It supports three methods that share a common interface, making it easy to compare approaches on the same data.
The four data arguments (target, reference, months_target,
months_reference) may be passed positionally; all configuration
arguments after them are keyword-only to prevent silent positional
mix-ups like standardize(tgt, ref, m_tgt, m_ref, "rsd", 3)
which would otherwise assign month arrays to method and
scale.
| PARAMETER | DESCRIPTION |
|---|---|
target
|
The time series to standardize.
TYPE:
|
reference
|
The reference series used to build the CDF. If
TYPE:
|
months_target
|
Calendar month (1-12) for each element of target. Required.
TYPE:
|
months_reference
|
Calendar month (1-12) for each element of reference.
Required when reference is provided separately.
If
TYPE:
|
method
|
Standardization method. Keyword-only.
TYPE:
|
scale
|
Temporal accumulation window in time steps (default 3). Keyword-only.
TYPE:
|
agg
|
Aggregation function within the window (default "sum"). Keyword-only.
TYPE:
|
tail_quantile
|
(RSD only) Fraction of distribution in each tail (default 0.10).
TYPE:
|
min_tail_size
|
(RSD only) Minimum exceedances for GPD fitting (default 20).
TYPE:
|
loc
|
(RSD only) Location estimator for deseasonalization.
TYPE:
|
scale_method
|
(RSD only) Scale estimator for deseasonalization.
TYPE:
|
distribution
|
(Parametric only) Name of a scipy.stats continuous distribution.
TYPE:
|
prob_zero
|
(Parametric only) Handle zero-inflation for distributions with support at or above 0.
TYPE:
|
floc
|
(Parametric only) Fixed location parameter for distributions
with support at or above 0 (gamma, lognorm, expon, chi2,
weibull_min). Default
TYPE:
|
plotting_position
|
Plotting position formula name or (a, b) constants.
TYPE:
|
bounds
|
(RSD only) Physical per-timestep bounds for a logit pre-transform.
When provided,
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray(float32)
|
Standardized z-scores. Same length as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary. |
Notes
Self-standardization (reference=None) is for testing and
single-series workflows only. The motivating problem RSD was
built to solve is cross-series comparison under a fixed reference
distribution (factual vs counterfactual, historical vs future,
model A vs model B). Using target as its own reference reduces
the method to a self-standardized index and removes the
cross-series differences the method is designed to preserve.
Production calls should always provide an explicit reference.
Warnings. When the RSD method is selected and a tail fit
fails or falls back, an :class:rsd.RSDFitWarning is emitted;
when a calendar month in the reference is degenerate (zero
spread), an :class:rsd.RSDDegenerateWarning is emitted and
the affected positions become NaN downstream.
rsd.standardize_xr¶
Compute a standardized hydroclimate index over an N-D DataArray.
This wraps the 1-D kernels from :mod:rsd._kernels using
xr.apply_ufunc with vectorize=True, applying the chosen
method independently to each spatial pixel along the time dimension.
The two data arguments (target, reference) may be passed positionally; all configuration arguments after them are keyword-only.
| PARAMETER | DESCRIPTION |
|---|---|
target
|
The DataArray to standardize. Must have a time dimension.
TYPE:
|
reference
|
Reference DataArray for building the CDF. If
TYPE:
|
method
|
Standardization method. Keyword-only.
TYPE:
|
scale
|
Temporal accumulation window in time steps (default 3). Keyword-only.
TYPE:
|
agg
|
Aggregation function within the window (default "sum"). Keyword-only.
TYPE:
|
time_dim
|
Name of the time dimension (default "time"). Keyword-only.
TYPE:
|
tail_quantile
|
(RSD only) Fraction of distribution in each tail.
TYPE:
|
min_tail_size
|
(RSD only) Minimum exceedances for GPD fitting.
TYPE:
|
loc
|
(RSD only) Location estimator for deseasonalization.
TYPE:
|
scale_method
|
(RSD only) Scale estimator for deseasonalization.
TYPE:
|
distribution
|
(Parametric only) Name of a scipy.stats continuous distribution.
TYPE:
|
prob_zero
|
(Parametric only) Handle zero-inflation.
TYPE:
|
floc
|
(Parametric only) Fixed location parameter for distributions
with support at or above 0. Default
TYPE:
|
plotting_position
|
Plotting position formula name or (a, b) constants.
TYPE:
|
parallel
|
If True, use dask parallelized scheduling. If False (default), run sequentially.
TYPE:
|
chunk_size
|
Spatial chunk size for auto-chunking when parallel=True and input is not already chunked (default 50).
TYPE:
|
bounds
|
(RSD only) Physical per-timestep bounds for a logit pre-transform.
When provided,
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray(float32)
|
Standardized z-scores with the same dimensions as target. Internal computations run in float64 throughout the pipeline; the output is cast to float32 for memory efficiency at the storage boundary. |
Notes
Self-standardization (reference=None) is for testing and
single-series workflows only. RSD's motivating use case is
cross-series comparison under a fixed reference distribution
(factual vs counterfactual, historical vs future, model A vs
model B). Using target as its own reference reduces the method
to a self-standardized index and removes the cross-series
differences the method is designed to preserve. Production calls
should always provide an explicit reference DataArray.
Warnings. As in :func:rsd.standardize, tail-fit failures
emit :class:rsd.RSDFitWarning and degenerate-month references
emit :class:rsd.RSDDegenerateWarning. On large grids these can
flood the warning stream; suppress with
warnings.filterwarnings("ignore", category=rsd.RSDFitWarning)
if you trust the fallbacks.
Performance
The wrapper uses xr.apply_ufunc(..., vectorize=True), which
runs the 1-D kernel in a Python-level loop over pixels per chunk.
Throughput is therefore bound by the Python overhead per pixel
multiplied by the number of pixels per chunk; expect minutes
(not seconds) on a global 0.5° grid. For large grids:
- Use
parallel=Trueso dask runs the per-chunk loops in parallel across CPU cores. - Pre-chunk the input spatially (e.g.,
target.chunk({"lat": 60, "lon": 60, "time": -1})) so the time dimension stays contiguous within each chunk; the wrapper enforces this with :func:_rechunk_time_fullbut starting from sensible spatial chunks avoids extra rechunking. - For an end-to-end SSI-3 on a ~50k-land-pixel grid over a
20-year window, expect on the order of 10-15 minutes on a
modern laptop with
parallel=True.
Diagnostics¶
rsd.diagnose¶
One-call exchangeability diagnostic for an RSD reference series.
By default :func:diagnose prints an overview block (configuration
+ per-month seasonal parameters), renders the summary figure
(location and scale on top, before / after deseasonalization KDEs
below), and prints the Anderson-Darling omnibus + per-month
Kolmogorov-Smirnov leave-one-out report.
Passing bounds=(L, U) adds a physical-bounds pathway that
applies a logit pre-transform before accumulation; passing
auto_bounds=True adds an auto-bounds pathway with bounds
derived from :func:estimate_bounds. Either case produces a second
figure (bounds_fig) that places baseline next to the bounded
variants, and the report becomes multi-method.
The summary figure always shows the baseline (unbounded) location
and scale and the baseline before / after KDEs. Logit-space
location and scale are intentionally omitted from that figure: they
do not have a direct physical interpretation. The bounded
comparison lives in the separate bounds_fig.
| PARAMETER | DESCRIPTION |
|---|---|
values
|
Reference time series (raw monthly values).
TYPE:
|
months
|
Calendar month (1 to 12) for each element of
TYPE:
|
name
|
Label for the series being diagnosed. Printed inside the overview header and used as the second line of figure suptitles. Required.
TYPE:
|
index
|
Optional drought-index identifier (e.g.
TYPE:
|
scale
|
Temporal accumulation window. Set to 1 if
TYPE:
|
agg
|
Accumulation aggregator.
TYPE:
|
loc
|
Location estimator for deseasonalization.
TYPE:
|
scale_method
|
Scale estimator for deseasonalization.
TYPE:
|
bounds
|
Physical logit bounds. When supplied, the diagnostic adds a
TYPE:
|
auto_bounds
|
If True, also add an
TYPE:
|
tail_quantile
|
Exceedance threshold printed in the overview header.
TYPE:
|
alpha
|
Significance level for the per-month KS rejection flag.
TYPE:
|
quiet
|
Suppress all output: no overview or report is printed, figures
are closed before returning, and the corresponding return-dict
slots are set to
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Keys:
|
Notes
When bounds=(L, U) is supplied with L >= min(values) or
U <= max(values), :func:diagnose emits a UserWarning to
flag the saturation or clipping risk;
:func:rsd.transform.logit_transform may emit a separate warning
of its own when it actually clips values to the open interval.
rsd.estimate_bounds¶
Estimate heuristic logit bounds from a sample's percentile range.
Returns (L, U) with L < min(values) and U > max(values),
computed as the q-th and (100-q)-th percentiles padded by
pad_frac times the inter-percentile range on each side. The
returned bounds satisfy the strict inequalities required by
:func:rsd.transform.logit_transform.
| PARAMETER | DESCRIPTION |
|---|---|
values
|
Input series. NaNs are ignored.
TYPE:
|
q
|
Lower percentile in percent (default 0.5). Upper percentile is
TYPE:
|
pad_frac
|
Fractional pad added on each side, as a fraction of the inter-percentile range. Default 0.02 (2 %).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
(lower, upper) : tuple of float
|
Heuristic logit bounds. |
Notes
These are heuristic bounds derived from the observed sample, not physical bounds. Use them only when external (ancillary) physical bounds are unavailable.
Risks worth knowing:
- Saturation amplification. If the input is already clipped against a physical limit (e.g. soil moisture against saturation), auto bounds will hug that limit and may amplify the saturation artefact rather than mitigate it. Prefer physical bounds in that case.
- Sample dependence. Auto bounds depend on the data: two pixels with similar physics but different record lengths get different bounds. This is acceptable for diagnostics but unsuitable for production standardization, where bounds should be physically grounded and stable across samples.
To compare baseline, physical-bounds and auto-bounds pathways side
by side, pass both bounds= and auto_bounds=True to
:func:diagnose.
Helper transforms¶
rsd.logit_transform¶
Apply bounded logit transform: y = log((x - L) / (U - x)).
Maps the open interval (L, U) to (-∞, +∞) with position-dependent stretching - strong near the bounds, mild in the centre. This targets the compression-against-physical-bounds mechanism that causes month-dependent skewness in bounded hydroclimate variables such as soil moisture or discharge.
Values are clipped to the open interval (L + ε, U - ε) before the transform, where ε = eps_frac × (U - L). NaN values propagate as NaN.
| PARAMETER | DESCRIPTION |
|---|---|
vals
|
Input values. Must be in (L, U) after clipping.
TYPE:
|
lower
|
Physical lower bound (e.g., 0 for soil moisture or discharge).
TYPE:
|
upper
|
Physical upper bound (e.g., saturation moisture content × depth × density).
TYPE:
|
eps_frac
|
Fractional buffer from each bound before applying the log ratio. Default 1e-6.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray(float64)
|
Logit-transformed values. Same shape as vals. |
Notes
This transform is only appropriate when the data genuinely approaches physical bounds. At pixels far from both bounds the transform has little effect. For drought indices the dry tail (where droughts live) should only be stretched when there is a physical lower bound nearby; stretching without physical basis distorts relative drought severity.
See inverse_logit_transform to invert if needed.
rsd.inverse_logit_transform¶
Invert the bounded logit transform: x = L + (U - L) / (1 + exp(-y)).
| PARAMETER | DESCRIPTION |
|---|---|
y
|
Logit-space values (output of :func:
TYPE:
|
lower
|
Physical lower bound used when transforming.
TYPE:
|
upper
|
Physical upper bound used when transforming.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray(float64)
|
Values in the original (L, U) space. |
Warning categories¶
The pipeline emits two custom UserWarning subclasses so silent
failures become visible and filterable through the standard warnings
machinery.
rsd.RSDFitWarning¶
A parametric fit (GPD, exponential, or scipy monthwise distribution) fell back or failed for a sample. The pipeline still returns a value using a documented fallback path (e.g., ECDF-only on GPD failure).
rsd.RSDDegenerateWarning¶
An input sample is degenerate (zero spread, constant series, or fully outside the declared physical bounds) and was handled by a fallback that may make downstream results undefined or NaN for the affected positions.
Method constant¶
rsd.METHODS is the tuple of method names accepted by the method=
argument of standardize and standardize_xr:
Stability guarantee¶
Everything documented on this page is covered by semantic versioning
from v1.0.0 onward. Modules whose name starts with an underscore
(_kernels, _validation, _warnings, ...) are private and may
change between patch releases without notice.