cellspec.tl.normalize_spectrum

cellspec.tl.normalize_spectrum(adata, key, method='proportion', obs_column=None, normalization_vector=None, output_key=None, inplace=False)

Normalize a mutation spectrum using one of several strategies.

This function normalizes spectra to enable fair comparison across samples with different sequencing depths, mutation burdens, or other covariates.

Parameters:
  • adata (ad.AnnData) – AnnData object with computed spectrum

  • key (str) – Key for the spectrum in adata.obsm (e.g., 'somatic' for 'spectrum_somatic')

  • method (str, default 'proportion') – Normalization method:
      - 'proportion': Divide by row sum (each sample sums to 1)
      - 'obs_column': Divide by a column in adata.obs
      - 'vector': Divide by a custom normalization vector

  • obs_column (str, optional) – Column name in adata.obs to use for normalization when method='obs_column'. Examples: 'callable_sites', 'total_reads', 'coverage'

  • normalization_vector (np.ndarray or pd.Series, optional) – Custom vector to divide by when method='vector'. Must have length equal to the number of cells/samples.

  • output_key (str, optional) – Key for storing the normalized spectrum. If None, uses f'{key}_normalized'. The normalized spectrum is stored in adata.obsm[f'spectrum_{output_key}'].

  • inplace (bool, default False) – If True, replaces the original spectrum with normalized values. If False, stores normalized spectrum under output_key.
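The three methods differ only in which per-sample divisor is applied to each row. The following is a minimal sketch in plain pandas of that row-wise division, not cellspec's actual implementation; the helper name `normalize_rows`, the toy spectrum, and the `callable_sites` values are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy spectrum: 3 samples x 4 contexts (a real spectrum has 96 context columns).
spectrum = pd.DataFrame(
    [[2.0, 1.0, 1.0, 0.0], [10.0, 5.0, 3.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
    index=["s1", "s2", "s3"],
    columns=["c1", "c2", "c3", "c4"],
)


def normalize_rows(spectrum, divisor):
    """Divide each row by its divisor (hypothetical helper, for illustration)."""
    values = np.asarray(divisor, dtype=float)
    if values.shape != (len(spectrum),):
        raise ValueError("divisor must have one entry per cell/sample")
    per_row = pd.Series(values, index=spectrum.index)
    # Turn zero divisors into NaN so the affected rows come out as NaN.
    per_row = per_row.where(per_row != 0)
    return spectrum.div(per_row, axis=0)


# 'proportion': the divisor is each row's own sum.
proportion = normalize_rows(spectrum, spectrum.sum(axis=1))

# 'obs_column' / 'vector': the divisor is an external per-sample quantity.
callable_sites = pd.Series([1000.0, 2000.0, 500.0], index=spectrum.index)
rate = normalize_rows(spectrum, callable_sites)
```

In this sketch, sample `s3` has no mutations, so its row sum is zero and its proportions are NaN, while its rate per callable site is simply zero.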

Return type:

DataFrame

Returns:

pd.DataFrame – Normalized spectrum (cells × 96 contexts)

Examples

>>> import cellspec as spc
>>>
>>> # Normalize to proportions (each sample sums to 1)
>>> spc.tl.normalize_spectrum(adata, key="somatic", method="proportion", output_key="somatic_prop")
>>>
>>> # Normalize by callable sites (mutations per callable site)
>>> spc.tl.compute_callable_sites(adata, min_depth=10, max_depth=200)
>>> spc.tl.normalize_spectrum(
...     adata, key="somatic", method="obs_column", obs_column="callable_sites", output_key="somatic_rate"
... )
>>>
>>> # Normalize by total read count
>>> spc.tl.normalize_spectrum(
...     adata, key="somatic", method="obs_column", obs_column="total_reads", output_key="somatic_per_read"
... )
>>>
>>> # Normalize by custom vector
>>> coverage_vector = adata.obs["mean_coverage"].values
>>> spc.tl.normalize_spectrum(
...     adata,
...     key="somatic",
...     method="vector",
...     normalization_vector=coverage_vector,
...     output_key="somatic_per_coverage",
... )
>>>
>>> # Normalize in place (replaces original)
>>> spc.tl.normalize_spectrum(adata, key="somatic", method="proportion", inplace=True)

Notes

  • Rows whose divisor is zero are set to NaN rather than raising an error

  • Normalization metadata is stored in adata.uns

  • The original spectrum is preserved unless inplace=True
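Because zero divisors produce NaN rows, downstream comparisons may need to exclude those rows first. A small sketch of one way to do that with plain pandas, assuming the returned DataFrame looks like the hypothetical `normalized` table below (the cell and context names are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical normalized spectrum; cell_b had a zero divisor.
normalized = pd.DataFrame(
    [[0.5, 0.5], [np.nan, np.nan], [0.25, 0.75]],
    index=["cell_a", "cell_b", "cell_c"],
    columns=["ctx_1", "ctx_2"],
)

# Flag rows that are entirely NaN, then keep only the usable ones.
undefined_rows = normalized.isna().all(axis=1)
usable = normalized.loc[~undefined_rows]
```

Whether to drop such rows or fill them with zeros depends on the analysis; dropping avoids treating an undefined spectrum as a genuinely flat one.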