cellspec.tl.private_mutations


cellspec.tl.private_mutations(adata, groupby=None, count_strategy='presence', genotypes=None, min_alt_depth=None, min_depth=None, max_depth=None, store_key='private', inplace=True)

Identify mutations that are private (unique) to each cell/sample or group.

Per-cell mode (groupby=None): A mutation is considered private to a cell if it is present in that cell and absent from all other cells.

Per-group mode (groupby specified): A mutation is considered private to a group if it is:

  1. Present in ALL cells/samples within that group (shared), AND
  2. Absent from ALL cells/samples in all other groups (unique to the group).
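To make the two rules concrete, here is a minimal sketch that applies them to a boolean presence table. It is not cellspec's implementation: the helper name `private_from_presence`, the variants × cells layout (chosen to match the returned DataFrame), and the assumption that "present" has already been decided by count_strategy and the depth filters are all illustrative.

```python
import pandas as pd


def private_from_presence(presence, groups=None):
    """Hypothetical sketch of the private-mutation rules (not cellspec's code).

    presence: boolean DataFrame, variants as rows, cells as columns
        (True = counted as present after count_strategy and depth filters).
    groups: optional Series mapping each cell (column) to a group label.
    Returns a boolean DataFrame of variants x cells, or variants x groups.
    """
    if groups is None:
        # Per-cell mode: present in this cell and in exactly one cell overall,
        # so it is absent from every other cell.
        unique = (presence.sum(axis=1) == 1).to_numpy()
        return pd.DataFrame(
            presence.to_numpy() & unique[:, None],
            index=presence.index,
            columns=presence.columns,
        )

    groups = groups.reindex(presence.columns)
    result = {}
    for label in pd.unique(groups):
        in_group = (groups == label).to_numpy()
        # Shared: present in ALL cells of this group.
        shared = presence.loc[:, in_group].all(axis=1)
        # Unique: absent from ALL cells outside this group.
        absent_elsewhere = ~presence.loc[:, ~in_group].any(axis=1)
        result[label] = shared & absent_elsewhere
    return pd.DataFrame(result, index=presence.index)


# Toy data: v2 is present in both c1 and c2.
presence = pd.DataFrame(
    {"c1": [True, True, False], "c2": [False, True, False], "c3": [False, False, True]},
    index=["v1", "v2", "v3"],
)
print(private_from_presence(presence))
groups = pd.Series({"c1": "A", "c2": "A", "c3": "B"})
print(private_from_presence(presence, groups))
```

In per-cell mode, v1 and v3 come out private to c1 and c3, while v2 (seen in two cells) is private to nobody. Grouping c1 and c2 as "A" flips this: v2 becomes private to A because it is shared by both members and absent from c3, whereas v1 is no longer private because c2 lacks it.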

Parameters:
  • adata (ad.AnnData) – AnnData object with mutation data

  • groupby (str, optional) – Column name in adata.obs to group by (e.g., ‘cell_type’, ‘sample_id’). If None, treats each cell/sample individually.

  • count_strategy (str, default 'presence') – Strategy for counting mutations:

      - 'presence': count any variant present in .X (any non-zero value)
      - 'genotype': count only specific genotypes (requires the genotypes parameter)
      - 'alt_depth': count sites with AD >= min_alt_depth (requires min_alt_depth)

  • genotypes (list of int, optional) – Genotypes to count when count_strategy='genotype'. E.g., [1, 3] for HET and HOM_ALT, or [3] for HOM_ALT only.

  • min_alt_depth (int, optional) – Minimum alternate-allele depth (AD) required when count_strategy='alt_depth'.

  • min_depth (int, optional) – Minimum total depth (DP) a cell must have at a site for that site to be counted in that cell.

  • max_depth (int, optional) – Maximum total depth (DP) a cell may have at a site for that site to be counted in that cell.

  • store_key (str, default 'private') – Key for storing results in adata.uns

  • inplace (bool, default True) – If True, stores results in adata.uns[f'{store_key}_mutations']. If False, only returns the DataFrame.
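The count_strategy and depth parameters together decide which entries count as "present" before the private rule is applied. The sketch below shows one plausible reading in NumPy. The helper `presence_mask`, dense arrays in AnnData's obs × var (cells × variants) orientation, AD and DP passed as separate arrays, and inclusive depth bounds are all assumptions, not documented cellspec behavior.

```python
import numpy as np


def presence_mask(X, count_strategy="presence", genotypes=None, AD=None,
                  min_alt_depth=None, DP=None, min_depth=None, max_depth=None):
    """Hypothetical sketch: which (cell, variant) entries count as present.

    X, AD and DP are dense arrays of shape (n_cells, n_variants). Where
    cellspec actually stores AD and DP is not specified; here they are
    plain arguments.
    """
    X = np.asarray(X)
    if count_strategy == "presence":
        # Any non-zero value counts (note: NaN would also count here;
        # missing-value handling is not covered by this sketch).
        mask = X != 0
    elif count_strategy == "genotype":
        if genotypes is None:
            raise ValueError("count_strategy='genotype' requires genotypes")
        mask = np.isin(X, genotypes)
    elif count_strategy == "alt_depth":
        if min_alt_depth is None or AD is None:
            raise ValueError("count_strategy='alt_depth' requires AD and min_alt_depth")
        mask = np.asarray(AD) >= min_alt_depth
    else:
        raise ValueError(f"unknown count_strategy: {count_strategy!r}")

    # Per-cell depth filters; assumption: both bounds are inclusive.
    if min_depth is not None or max_depth is not None:
        if DP is None:
            raise ValueError("depth filters require DP")
        DP = np.asarray(DP)
        if min_depth is not None:
            mask &= DP >= min_depth
        if max_depth is not None:
            mask &= DP <= max_depth
    return mask


X = np.array([[1, 3, 0], [0, 3, 1]])
print(presence_mask(X, "genotype", genotypes=[3]))
```

Because the depth filters are applied entry by entry, a variant can be present in one cell and filtered out in another, which is what "filters per cell" means for the private test that follows.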

Return type:

DataFrame

Returns:

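pd.DataFrame – Boolean DataFrame with variants as rows and cells/groups as columns. True indicates that the mutation is private to that cell/group. When inplace=True, the number of private mutations is also stored: in adata.obs[f'{store_key}_count'] in per-cell mode, or in adata.uns[f'{store_key}_counts'] in per-group mode.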

Examples

>>> import cellspec as spc
>>> # Find private mutations for each sample (any non-zero genotype)
>>> private_df = spc.tl.private_mutations(adata)
>>> # Count per sample stored in adata.obs['private_count']
>>> print(adata.obs["private_count"])
>>> # Find private mutations per cell type
>>> private_df = spc.tl.private_mutations(adata, groupby="cell_type")
>>> # Only count HOM_ALT (genotype=3) as private
>>> private_df = spc.tl.private_mutations(adata, count_strategy="genotype", genotypes=[3])
>>> # Use alternate depth with depth filtering
>>> private_df = spc.tl.private_mutations(
...     adata, count_strategy="alt_depth", min_alt_depth=3, min_depth=10, max_depth=200
... )

Notes

Per-cell mode (groupby=None): For each cell, a mutation is private if:

  1. It is present (based on count_strategy) in that cell,
  2. It passes the depth filters (if specified), and
  3. It is absent from ALL other cells.

Per-group mode (groupby specified): For each group, a mutation is private if:

  1. It is present (based on count_strategy) in ALL cells within that group,
  2. It passes the depth filters (if specified) for all cells in the group, and
  3. It is absent from ALL cells in all other groups.

In per-group mode, private mutations are therefore “group-specific shared mutations”: mutations shared by all members of a group but found in no other group.

Each row of the returned DataFrame therefore sums to 0 or 1 along axis=1 (across cells/groups), since a mutation can be private to at most one cell/group.
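This invariant can be checked directly on the returned DataFrame with pandas. The helper `summarize_private` below is hypothetical and not part of cellspec; it also sums down each column to get the number of private mutations per cell/group, which we assume corresponds to the stored counts, although the sketch does not read those back from adata.

```python
import pandas as pd


def summarize_private(private_df):
    """Hypothetical helper: check the 0-or-1 row invariant, then count.

    private_df: boolean DataFrame shaped like the return value of
    private_mutations (variants as rows, cells/groups as columns).
    """
    # How many cells/groups claim each variant as private.
    per_variant = private_df.sum(axis=1)
    if not per_variant.isin([0, 1]).all():
        raise ValueError("a variant is marked private to more than one cell/group")
    # Number of private mutations in each cell/group.
    return private_df.sum(axis=0)


private_df = pd.DataFrame(
    {"cell_1": [True, False, False], "cell_2": [False, False, True]},
    index=["v1", "v2", "v3"],
)
print(summarize_private(private_df))
```

A row summing to 2 or more would indicate a bug upstream, which is why the helper raises instead of silently returning counts.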

Results are stored in:

  • adata.uns[f'{store_key}_mutations']: Boolean DataFrame (variants × cells/groups)

  • adata.obs[f'{store_key}_count'] (per-cell mode) or adata.uns[f'{store_key}_counts'] (per-group mode): Count of private mutations per cell/group

  • adata.uns[f'{store_key}_metadata']: Parameters used