cellspec.pp.filter_to_snps

cellspec.pp.filter_to_snps(adata, chrom_prefix=None, inplace=True)

Filter to only single nucleotide variants (SNPs).

Removes indels and multi-allelic sites, keeping only sites where both reference and alternate alleles are single nucleotides.
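The rule above can be illustrated with a minimal sketch. This is not cellspec's implementation: the helper `is_snp`, the `VALID_BASES` set, and the assumption that multi-allelic ALT values are comma-separated (as in VCF) are all hypothetical, introduced only to show the idea.

```python
# Hypothetical helper, not part of cellspec: illustrates the SNP rule only.
VALID_BASES = frozenset("ACGT")  # assumption: alleles are uppercase A/C/G/T


def is_snp(ref: str, alt: str) -> bool:
    """Return True when both alleles are exactly one nucleotide."""
    # Indels have an allele longer than one base; a comma-separated ALT
    # such as "T,G" (assumed multi-allelic notation) is also longer than one
    # character, so both cases fail the length check.
    return (
        len(ref) == 1
        and len(alt) == 1
        and ref in VALID_BASES
        and alt in VALID_BASES
    )


# (ref, alt) pairs: a SNP, a deletion, an insertion, a multi-allelic site
variants = [("A", "T"), ("AT", "A"), ("A", "ATG"), ("C", "T,G")]
kept = [v for v in variants if is_snp(*v)]
print(kept)
```

Only the first pair passes the check; the deletion, the insertion, and the multi-allelic site are dropped.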

Parameters:
  • adata (ad.AnnData) – AnnData object with variants

  • chrom_prefix (str, optional) – Chromosome prefix for parsing variant IDs. If None (default), accepts any chromosome naming (e.g., ‘chr1’, ‘I’, ‘1’). Use ‘chr’ for human/mouse data if you want to be strict.

  • inplace (bool, default True) – If True, modify adata in place; if False, return a filtered copy

Return type:

AnnData | None

Returns:

Filtered AnnData if inplace=False; otherwise None.

Examples

>>> import cellspec as spc
>>> # Default: works with any chromosome naming
>>> adata = spc.pp.load_vcf("variants.vcf.gz")
>>> spc.pp.filter_to_snps(adata)
>>> print(f"Retained {adata.n_vars} SNPs")
>>>
>>> # For human/mouse data, optionally specify the 'chr' prefix
>>> spc.pp.filter_to_snps(adata, chrom_prefix="chr")

Notes

This is typically run before trinucleotide context annotation, as trinuc contexts are only defined for SNPs.

Works with any organism’s chromosome naming:

  • Human/mouse: ‘chr1-12345-A>T’

  • C. elegans: ‘I-12345-A>T’

  • Drosophila: ‘2L-12345-A>T’
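To make the ID layout concrete, here is a rough sketch of how IDs of the form chrom-pos-REF>ALT could be parsed with and without a required prefix. The function `parse_variant_id` is hypothetical and not part of cellspec. Returning None for a non-matching ID is an assumption about what "strict" means, and the sketch also assumes chromosome names contain no hyphens.

```python
import re
from typing import Optional, Tuple


def parse_variant_id(
    variant_id: str, chrom_prefix: Optional[str] = None
) -> Optional[Tuple[str, int, str, str]]:
    """Split 'chrom-pos-REF>ALT' into its parts, or return None on mismatch.

    Hypothetical helper for illustration; cellspec's own parser may differ.
    """
    # With no prefix, any hyphen-free chromosome name is accepted.
    # With a prefix, the chromosome must start with it (assumed strict mode).
    if chrom_prefix is None:
        chrom_pattern = r"[^-]+"
    else:
        chrom_pattern = re.escape(chrom_prefix) + r"[^-]+"

    pattern = rf"({chrom_pattern})-(\d+)-([^>-]+)>(.+)"
    match = re.fullmatch(pattern, variant_id)
    if match is None:
        return None
    chrom, pos, ref, alt = match.groups()
    return chrom, int(pos), ref, alt


for vid in ["chr1-12345-A>T", "I-12345-A>T", "2L-12345-A>T"]:
    print(vid, parse_variant_id(vid), parse_variant_id(vid, chrom_prefix="chr"))
```

In this sketch all three IDs parse under the default, while only the human/mouse-style ID parses when chrom_prefix="chr" is required.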