cellspec.tl.compute_spectrum

cellspec.tl.compute_spectrum(adata, count_strategy='presence', genotypes=None, min_alt_depth=None, min_depth=None, max_depth=None, variant_mask=None, private_key=None, key='spectrum', show_progress=True, description=None)

Compute the 96-channel mutation spectrum for each cell/sample, or for each private mutation group.

By default, the spectrum is computed per cell and stored in adata.obsm[f'spectrum_{key}']. If private_key is provided, the spectrum is computed per private mutation group instead and stored in adata.uns[f'spectrum_{key}'].

Parameters:
  • adata (ad.AnnData) – AnnData object with annotated variants

  • count_strategy (str, default 'presence') –

    Strategy for counting mutations:

    • 'presence': count any variant present in .X (binary)

    • 'genotype': count based on genotype (requires a genotype layer)

    • 'alt_depth': count sites with AD >= min_alt_depth

  • genotypes (list of int, optional) – Genotypes to count when count_strategy='genotype'. Not applicable to binary presence data, which uses .X (already filtered).

  • min_alt_depth (int, optional) – Minimum AD when count_strategy='alt_depth'

  • min_depth (int, optional) – Minimum DP for counting per cell (filters during counting)

  • max_depth (int, optional) – Maximum DP for counting per cell (filters during counting)

  • variant_mask (pd.DataFrame or np.ndarray, optional) –

    Boolean mask specifying which variants to count for each cell.

    • If DataFrame: rows are variants (matching adata.var.index), columns are cells (matching adata.obs_names)

    • If np.ndarray: shape (n_cells, n_variants) or (n_variants,)

    • If (n_variants,): same mask applied to all cells

    • True = count this variant for this cell, False = exclude

    Note: Cannot be used together with private_key.

  • private_key (str, optional) –

    Key for the private mutations DataFrame in adata.uns (e.g., 'private'). If provided, the spectrum is computed per private mutation group instead of per cell. The function looks up adata.uns[f'{private_key}_mutations'] and adata.uns[f'{private_key}_metadata']. For each group, it counts trinucleotide contexts for variants private to that group, within cells belonging to that group. The result is a (96 contexts × groups) DataFrame stored in adata.uns[f'spectrum_{key}'].

    Note: Cannot be used together with variant_mask.

  • key (str, default 'spectrum') – Key for storing spectrum (in adata.obsm if per-cell, adata.uns if per-group)

  • show_progress (bool, default True) – Show progress bar

  • description (str, optional) – Description for metadata

Return type:

DataFrame

Returns:

pd.DataFrame: the spectrum. Shape depends on mode:

  • Per-cell mode: (cells × 96 contexts), stored in adata.obsm[f'spectrum_{key}']

  • Per-group mode: (96 contexts × groups), stored in adata.uns[f'spectrum_{key}']

Examples

>>> import cellspec as spc
>>> adata = spc.pp.load_vcf("variants.vcf.gz")
>>> spc.pp.annotate_contexts(adata, "reference.fa")
>>> spc.pp.filter_to_snps(adata)
>>>
>>> # Count all variants present (default) - per cell
>>> spectrum = spc.tl.compute_spectrum(adata, key="all")
>>>
>>> # Count only sites with 10 <= DP <= 200 - per cell
>>> spectrum = spc.tl.compute_spectrum(adata, min_depth=10, max_depth=200, key="dp10")
>>>
>>> # Compute spectrum per private mutation group
>>> spc.tl.private_mutations(adata, groupby="lineage", genotypes=[3], store_key="private")
>>> spectrum = spc.tl.compute_spectrum(
...     adata, count_strategy="genotype", genotypes=[3], private_key="private", key="private"
... )
>>> # Returns (96 contexts × lineages) DataFrame in adata.uns['spectrum_private']
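
The accepted shapes of variant_mask can be illustrated with plain NumPy and pandas. This is a minimal sketch: the cell names, variant identifiers, and mask values are hypothetical toy data, and the final compute_spectrum call is shown only as a comment because it needs a real AnnData object.

```python
import numpy as np
import pandas as pd

# Hypothetical toy names; a real AnnData supplies adata.obs_names / adata.var.index.
cell_names = ["cell_a", "cell_b", "cell_c"]
variant_ids = ["chr1:100:C>T", "chr1:250:G>A", "chr2:40:T>C", "chr2:90:C>A"]

# 1-D mask, shape (n_variants,): the same variants are kept for every cell.
shared_mask = np.array([True, False, True, True])

# 2-D ndarray mask, shape (n_cells, n_variants).
per_cell_mask = np.tile(shared_mask, (len(cell_names), 1))
per_cell_mask[1, 2] = False  # additionally exclude the third variant for cell_b only

# DataFrame mask: rows are variants, columns are cells (transposed vs. the ndarray).
mask_df = pd.DataFrame(per_cell_mask.T, index=variant_ids, columns=cell_names)

# The call would then look like this (not run here):
# spectrum = spc.tl.compute_spectrum(adata, variant_mask=mask_df, key="masked")
```

Note the orientation difference: the ndarray form is (n_cells, n_variants), while the DataFrame form has variants as rows and cells as columns.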

Notes

Per-cell mode (default): The function filters counting per cell based on depth thresholds. A variant is counted for cell_i if:

  1. It passes the count_strategy filter

  2. AND cell_i has adequate depth at that site (if min_depth/max_depth is specified)

  3. AND the variant_mask is True for this cell (if variant_mask is provided)

Results are stored in:

  • adata.obsm[f'spectrum_{key}']: DataFrame (cells × 96 contexts)

  • adata.uns[f'spectrum_{key}_metadata']: dict with computation details
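
The per-cell counting rule can be sketched with NumPy. This is an illustration under stated assumptions, not the library's implementation: all arrays are hypothetical toy data, the 'presence' strategy is assumed, and three context channels stand in for the real 96.

```python
import numpy as np

# Toy inputs (all hypothetical): 2 cells x 4 variants.
present = np.array([[1, 1, 0, 1],
                    [1, 0, 1, 1]], dtype=bool)      # 'presence' strategy on .X
depth = np.array([[12, 5, 30, 250],
                  [40, 8, 15, 20]])                 # per-cell DP at each site
variant_mask = np.array([True, True, True, True])   # (n_variants,) mask, all kept
context_idx = np.array([0, 0, 1, 2])                # context channel of each variant
n_contexts = 3                                      # 96 in the real spectrum
min_depth, max_depth = 10, 200

# A variant counts for a cell only if all three conditions hold.
counted = present & (depth >= min_depth) & (depth <= max_depth) & variant_mask

# Sum counted variants into their context channel: result is (cells x contexts).
onehot = np.eye(n_contexts, dtype=int)[context_idx]  # (variants x contexts)
spectrum = counted.astype(int) @ onehot
```

In this toy case the first cell keeps only its first variant (the others fail the depth bounds or are absent), so its row has a single count in channel 0.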

Per-group mode (private_key provided): For each private mutation group, the function:

  1. Identifies variants marked as private to that group

  2. Identifies cells belonging to that group

  3. Counts trinucleotide contexts for private variants within those cells only

  4. Applies count_strategy and depth filters per cell as usual

Results are stored in:

  • adata.uns[f'spectrum_{key}']: DataFrame (96 contexts × groups)

  • adata.uns[f'spectrum_{key}_metadata']: dict with computation details
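
The per-group steps can be sketched in the same way. Everything here is hypothetical toy data, and one point is an explicit assumption: each private variant contributes once to its group if at least one cell of that group counts it. The library may aggregate differently (for example, by summing over cells).

```python
import numpy as np
import pandas as pd

# Toy inputs (all hypothetical): 4 cells x 4 variants, two lineages.
# `counted` is the per-cell result after count_strategy and depth filters.
counted = np.array([[1, 0, 1, 0],
                    [1, 0, 0, 0],
                    [0, 0, 1, 1],
                    [0, 1, 0, 1]], dtype=bool)
cell_group = np.array(["A", "A", "B", "B"])   # group of each cell
private_to = np.array(["A", "A", "B", "B"])   # group each variant is private to
contexts = np.array(["A[C>T]G", "T[C>A]A", "A[C>T]G", "T[C>A]A"])

channels = sorted(set(contexts.tolist()))     # 96 channels in the real spectrum
result = pd.DataFrame(0, index=channels, columns=["A", "B"])

for group in result.columns:
    cells = cell_group == group               # step 2: cells in this group
    variants = private_to == group            # step 1: variants private to it
    # Step 3 (assumption): a variant counts once if any group cell carries it.
    seen = counted[np.ix_(cells, variants)].any(axis=0)
    for ctx in contexts[variants][seen]:
        result.loc[ctx, group] += 1
```

The second variant is private to group A but is only observed in a group B cell, so it adds nothing to either column; the result has contexts as rows and groups as columns, matching the (96 contexts × groups) layout.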