02 - SSI on an xarray grid with bounded soil moisture¶
Demonstrates the N-D entry point rsd.standardize_xr and the
bounded-variable pathway (bounds=) used for soil moisture and other
physically bounded quantities.
We build a small synthetic 5x5 grid of monthly soil moisture in
[0, sat], run standardize_xr(..., bounds=(0, sat)) so the logit
pre-transform handles the physical bound, and confirm the result is a
DataArray with the same dimensions as the input.
No external data; runs in a couple of seconds.
import numpy as np
import pandas as pd
import xarray as xr
import rsd
print(f"rsd version: {rsd.__version__}")
rsd version: 0.1.dev0+d20260520
1. Build a synthetic bounded soil-moisture grid¶
5 latitude x 5 longitude pixels, 50 years of monthly values. Soil moisture
is drawn from a beta distribution (which lives on (0, 1)) and scaled to
a saturation of sat = 0.4 m^3/m^3. Seasonal variation in the beta shape
parameter creates a realistic wet/dry cycle.
rng = np.random.default_rng(0)
n_years = 50
n_lat, n_lon = 5, 5
sat = 0.4 # m^3/m^3
times = pd.date_range("1970-01-01", periods=n_years * 12, freq="MS")
months = np.array([t.month for t in times])
data = np.empty((len(times), n_lat, n_lon))
for i in range(n_lat):
for j in range(n_lon):
data[:, i, j] = sat * np.array(
[rng.beta(2 + 0.2 * m, 5 - 0.2 * m) for m in months]
)
sm = xr.DataArray(
data,
dims=["time", "lat", "lon"],
coords={
"time": times,
"lat": np.linspace(-2, 2, n_lat),
"lon": np.linspace(-2, 2, n_lon),
},
name="soil_moisture",
attrs={"units": "m3 m-3", "long_name": "synthetic soil moisture"},
)
# Compact header summary instead of dumping the full array.
print(f"xarray.DataArray {sm.name!r} dims={sm.dims} shape={sm.shape} dtype={sm.dtype}")
print(f" coords: time({len(sm.time)}) lat({len(sm.lat)}) lon({len(sm.lon)})")
print(f" attrs: {sm.attrs}")
xarray.DataArray 'soil_moisture' dims=('time', 'lat', 'lon') shape=(600, 5, 5) dtype=float64
coords: time(600) lat(5) lon(5)
attrs: {'units': 'm3 m-3', 'long_name': 'synthetic soil moisture'}
2. Compute SSI-3 with an explicit reference¶
We split the 50-year series into a 20-year reference baseline (1970-1989) and use the full series as the target. This is the canonical reference-vs-target pattern the package is designed for (example 01 demonstrates the same pattern in 1-D): the reference defines the CDF; target values are scored against it on a fixed, shared scale.
Two parameters distinguish this from the SPI example:
bounds=(0.0, sat)activates the logit pre-transform. Each per-timestep value is mapped from the bounded interval(0, sat)into unbounded logit space before accumulation, so the rest of the pipeline (accumulation, deseasonalisation, tail fitting) sees real-valued data with no boundary effects.agg="mean"keeps soil moisture in physical units (a 3-month mean is still a soil-moisture value). Sum is more natural for fluxes such as precipitation.
Months are extracted automatically from the time coordinate; no need to
pass months_target/months_reference here.
# Use the first 20 years (1970-1989) as the reference; the full
# 50-year series is the target. The reference defines the fixed CDF;
# target values are evaluated against it.
sm_ref = sm.isel(time=slice(0, 20 * 12))
ssi = rsd.standardize_xr(
target=sm,
reference=sm_ref,
method="rsd",
scale=3,
agg="mean",
bounds=(0.0, sat),
parallel=True, # use dask for per-chunk parallelism
).compute()
# Compact header summary instead of dumping the full array.
print(f"xarray.DataArray dims={ssi.dims} shape={ssi.shape} dtype={ssi.dtype}")
print(f" attrs: {ssi.attrs}")
xarray.DataArray dims=('time', 'lat', 'lon') shape=(600, 5, 5) dtype=float32
attrs: {'method': 'rsd', 'scale': 3, 'agg': 'mean', 'reference_period': '1970-01-01 to 1989-12-01'}
3. Sanity-check¶
Output should be a float32 DataArray with the same dimensions and
coordinates as the target input. The first scale - 1 = 2 time
positions are NaN by construction (the rolling window is not yet
full). Inside the reference window the z-scores should be near
standard-normal; outside the reference window they are still
approximately N(0,1) here because the synthesis is stationary
(reference and target are drawn from the same distribution).
print(f"dtype: {ssi.dtype}")
print(f"shape: {ssi.shape} (time, lat, lon)")
print(f"first 2 time positions all NaN: {bool(ssi.isel(time=slice(0, 2)).isnull().all())}")
post_warmup = ssi.isel(time=slice(2, None))
print(f"post-warmup mean (~0): {float(post_warmup.mean()):+.3f}")
print(f"post-warmup std (~1): {float(post_warmup.std()):+.3f}")
dtype: float32 shape: (600, 5, 5) (time, lat, lon) first 2 time positions all NaN: True post-warmup mean (~0): -0.022 post-warmup std (~1): +1.060
A note on agg="sum" with bounds¶
The logit pre-transform is applied before accumulation, so both
agg="sum" and agg="mean" work cleanly with bounded inputs. The
scientific choice between the two depends on what you want to standardise:
an average soil-moisture state (mean) or an accumulated water-content
quantity (sum). Both produce valid SSI fields under the bounded path.
What next¶
- See
03_diagnostics_showcase.ipynbforrsd.diagnose, which checks the exchangeability assumption that the RSD pooling step relies on. - Full API reference at https://hydro-rsd.readthedocs.io.