cellspec.pp.annotate_contexts

cellspec.pp.annotate_contexts(adata, fasta_path, chrom_prefix=None, show_progress=True, inplace=True)

Annotate variants with trinucleotide contexts.

Adds three columns to adata.var:
  • ‘anc’: Ancestral (reference) trinucleotide
  • ‘der’: Derived (alternate) trinucleotide
  • ‘trinuc_type’: Standardized COSMIC 96-type label (e.g., ‘ACA>AAA’)

Parameters:
  • adata (ad.AnnData) – AnnData object with variants

  • fasta_path (str) – Path to reference genome FASTA file

  • chrom_prefix (str, optional) – Chromosome prefix in variant IDs. If None (default), accepts any chromosome naming. Use ‘chr’ for human or mouse data whose chromosome names carry that prefix.

  • show_progress (bool, default True) – Show progress bar

  • inplace (bool, default True) – If True, modify adata in place; if False, return a modified copy

Return type:

AnnData | None

Returns:

ad.AnnData or None – The modified AnnData if inplace=False, otherwise None

Examples

>>> import cellspec as spc
>>> # Works with any organism
>>> adata = spc.pp.load_vcf("variants.vcf.gz")
>>> spc.pp.annotate_contexts(adata, fasta_path="reference.fa")
>>> print(adata.var["trinuc_type"].head())
>>>
>>> # Human data with 'chr' prefix
>>> spc.pp.annotate_contexts(adata, fasta_path="hg38.fa", chrom_prefix="chr")

Notes

Trinucleotide contexts are strand-standardized following COSMIC convention:
  • Always report pyrimidine (C or T) as the reference base
  • If reference is purine (A or G), reverse complement the trinucleotide
  • This ensures consistent 96-channel representation
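The strand-standardization rule above can be illustrated with a minimal sketch. The helpers `reverse_complement` and `standardize_trinuc` are hypothetical names introduced here for illustration; they are not part of cellspec and may differ from how the library implements the step. The sketch assumes uppercase-able DNA strings and the ‘ANC>DER’ label format shown in the example label above.

```python
# Illustrative sketch only: these helpers are hypothetical, not cellspec API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def standardize_trinuc(anc: str, der: str) -> str:
    """Return a pyrimidine-centred 'ANC>DER' label for one substitution.

    anc and der are the ancestral and derived trinucleotides; they must
    share both flanking bases and differ at the central base.
    """
    anc, der = anc.upper(), der.upper()
    if len(anc) != 3 or len(der) != 3:
        raise ValueError("both contexts must be exactly 3 bases long")
    if not set(anc + der) <= set("ACGT"):
        raise ValueError("contexts may only contain A, C, G, T")
    if anc[0] != der[0] or anc[2] != der[2] or anc[1] == der[1]:
        raise ValueError("contexts must differ only at the central base")
    # A purine (A or G) reference base means the change is reported on the
    # opposite strand, so both trinucleotides are reverse-complemented.
    if anc[1] in "AG":
        anc, der = reverse_complement(anc), reverse_complement(der)
    return f"{anc}>{der}"


# A G>T change in a TGT context is the same event as C>A in ACA.
label_forward = standardize_trinuc("ACA", "AAA")
label_reverse = standardize_trinuc("TGT", "TTT")
```

Under these assumptions both calls yield the same label, and the 192 possible strand-specific substitutions collapse to 96 channels (32 pyrimidine-centred contexts times 3 alternate bases).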

Works with any organism’s chromosome naming:
  • Human/mouse: ‘chr1’, ‘chr2’, etc.
  • C. elegans: ‘I’, ‘II’, ‘III’, ‘IV’, ‘V’, ‘X’
  • Drosophila: ‘2L’, ‘2R’, ‘3L’, ‘3R’, ‘X’