cellspec.tl.normalize_spectrum
- cellspec.tl.normalize_spectrum(adata, key, method='proportion', obs_column=None, normalization_vector=None, output_key=None, inplace=False)
Normalize a mutation spectrum by its row sums, by a column of adata.obs, or by a custom vector.
This function normalizes spectra to enable fair comparison across samples with different sequencing depths, mutation burdens, or other covariates.
- Parameters:
adata (ad.AnnData) – AnnData object with computed spectrum
key (str) – Key for spectrum in adata.obsm (e.g., ‘somatic’ for ‘spectrum_somatic’)
method (str, default 'proportion') – Normalization method, one of:
  - ‘proportion’: divide each row by its sum (each sample sums to 1)
  - ‘obs_column’: divide each row by its value in a column of adata.obs
  - ‘vector’: divide each row by the matching entry of a custom normalization vector
obs_column (str, optional) – Column name in adata.obs to use for normalization when method=‘obs_column’. Examples: ‘callable_sites’, ‘total_reads’, ‘coverage’.
normalization_vector (np.ndarray or pd.Series, optional) – Custom vector to divide by when method=‘vector’. Must have one entry per cell/sample (length adata.n_obs).
output_key (str, optional) – Key for storing the normalized spectrum. If None, defaults to f'{key}_normalized'. The normalized spectrum is stored in adata.obsm[f'spectrum_{output_key}'].
inplace (bool, default False) – If True, replaces the original spectrum in adata.obsm[f'spectrum_{key}'] with the normalized values. If False, stores the normalized spectrum under output_key and leaves the original unchanged.
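The three values of method all reduce to dividing each row of the cells × contexts spectrum by one number per cell. The sketch below illustrates that arithmetic with pandas, including the documented rule that a zero divisor yields a NaN row. It assumes the spectrum is a pandas DataFrame of counts with cells as rows; the helper normalize_frame and its divisor argument are hypothetical and are not part of the cellspec API or its actual implementation.

```python
import numpy as np
import pandas as pd


def normalize_frame(spectrum, method="proportion", divisor=None):
    """Hypothetical sketch of the three normalization strategies.

    spectrum: cells x contexts DataFrame of mutation counts.
    divisor: per-cell values (an obs column or a custom vector), used by
    the 'obs_column' and 'vector' methods; ignored for 'proportion'.
    """
    if method == "proportion":
        # Each row is divided by its own total, so non-empty rows sum to 1.
        denom = spectrum.sum(axis=1)
    elif method in ("obs_column", "vector"):
        if divisor is None or len(divisor) != len(spectrum):
            raise ValueError("divisor must have one entry per cell")
        # Align positionally with the spectrum's rows.
        denom = pd.Series(np.asarray(divisor, dtype=float), index=spectrum.index)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Replace zero divisors with NaN so the whole row becomes NaN.
    denom = denom.astype(float).replace(0, np.nan)
    # Divide each row by its own divisor (broadcast across the context columns).
    return spectrum.div(denom, axis=0)


counts = pd.DataFrame(
    [[2, 2, 0], [0, 0, 0]],
    index=["cell1", "cell2"],
    columns=["ctx1", "ctx2", "ctx3"],
)
props = normalize_frame(counts, "proportion")
# cell1 -> 0.5, 0.5, 0.0; cell2 has a zero row sum, so it is all NaN
rates = normalize_frame(counts, "obs_column", divisor=pd.Series([4, 0], index=counts.index))
# cell1 -> 0.5, 0.5, 0.0; cell2 has a zero divisor, so it is all NaN
```

Using DataFrame.div with axis=0 keeps the per-cell divisor aligned with rows rather than columns, which is the usual source of bugs when normalizing a cells × features matrix.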
- Return type:
DataFrame
- Returns:
pd.DataFrame – Normalized spectrum DataFrame (cells × 96 contexts)
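The 96 columns correspond to the standard single-base-substitution classification: 6 pyrimidine-centred substitution types, each with 16 combinations of 5′ and 3′ flanking bases. The sketch below enumerates them. The "A[C>A]A" label style is a common convention assumed here for illustration; cellspec's actual column names and ordering may differ.

```python
from itertools import product

# Assumption: pyrimidine-centred SBS96 labels in the "A[C>A]A" style;
# cellspec's real column labels may use another format or order.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"


def sbs96_labels():
    """Return the 96 trinucleotide context labels (6 substitutions x 16 flanks)."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, BASES)
    ]


labels = sbs96_labels()
print(len(labels), labels[0], labels[-1])  # 96 A[C>A]A T[T>G]T
```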
Examples
>>> import cellspec as spc
>>>
>>> # Normalize to proportions (each sample sums to 1)
>>> spc.tl.normalize_spectrum(adata, key="somatic", method="proportion", output_key="somatic_prop")
>>>
>>> # Normalize by callable sites (mutations per callable site)
>>> spc.tl.compute_callable_sites(adata, min_depth=10, max_depth=200)
>>> spc.tl.normalize_spectrum(
...     adata, key="somatic", method="obs_column", obs_column="callable_sites", output_key="somatic_rate"
... )
>>>
>>> # Normalize by total read count
>>> spc.tl.normalize_spectrum(
...     adata, key="somatic", method="obs_column", obs_column="total_reads", output_key="somatic_per_read"
... )
>>>
>>> # Normalize by custom vector
>>> coverage_vector = adata.obs["mean_coverage"].values
>>> spc.tl.normalize_spectrum(
...     adata,
...     key="somatic",
...     method="vector",
...     normalization_vector=coverage_vector,
...     output_key="somatic_per_coverage",
... )
>>>
>>> # Normalize in place (replaces original)
>>> spc.tl.normalize_spectrum(adata, key="somatic", method="proportion", inplace=True)
Notes
- Rows whose divisor is zero (an all-zero spectrum row, or a zero entry in the obs column or normalization vector) are set to NaN.
- Normalization metadata is stored in adata.uns.
- The original spectrum is preserved unless inplace=True.
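The key, output_key and inplace parameters together decide which adata.obsm slot receives the result. The sketch below spells out those documented naming rules with a plain dict standing in for adata.obsm, so it runs without AnnData. The helper resolve_storage_key is hypothetical, and the choice to ignore output_key when inplace=True is an assumption, since the documentation does not say how the two interact.

```python
import pandas as pd


def resolve_storage_key(key, output_key=None, inplace=False):
    """Hypothetical helper: obsm slot implied by the documented key rules."""
    if inplace:
        # Assumption: output_key is ignored when inplace=True and the
        # source slot spectrum_{key} is overwritten.
        return f"spectrum_{key}"
    if output_key is None:
        output_key = f"{key}_normalized"
    return f"spectrum_{output_key}"


# A plain dict stands in for adata.obsm in this sketch.
obsm = {"spectrum_somatic": pd.DataFrame([[2.0, 2.0]], columns=["ctx1", "ctx2"])}
source = obsm["spectrum_somatic"]
normalized = source.div(source.sum(axis=1), axis=0)  # proportions

target = resolve_storage_key("somatic", output_key="somatic_prop")
obsm[target] = normalized
print(sorted(obsm))  # ['spectrum_somatic', 'spectrum_somatic_prop']
```

With inplace=False the original counts stay in spectrum_somatic next to the new slot; only inplace=True would overwrite them.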