cellspec.pp.annotate_contexts
- cellspec.pp.annotate_contexts(adata, fasta_path, chrom_prefix=None, show_progress=True, inplace=True)
Annotate variants with trinucleotide contexts.
Adds three columns to adata.var:
- ‘anc’: Ancestral (reference) trinucleotide
- ‘der’: Derived (alternate) trinucleotide
- ‘trinuc_type’: Standardized COSMIC 96-type label (e.g., ‘ACA>AAA’)
- Parameters:
adata (ad.AnnData) – AnnData object with variants
fasta_path (str) – Path to reference genome FASTA file
chrom_prefix (str, optional) – Chromosome prefix in variant IDs. If None (default), accepts any chromosome naming. Use ‘chr’ for human/mouse if needed.
show_progress (bool, default True) – Show progress bar
inplace (bool, default True) – If True, modify adata in place; if False, return a modified copy
- Return type:
ad.AnnData or None
- Returns:
The modified AnnData if inplace=False; otherwise None, with adata modified in place
Examples
>>> import cellspec as spc
>>> # Works with any organism
>>> adata = spc.pp.load_vcf("variants.vcf.gz")
>>> spc.pp.annotate_contexts(adata, fasta_path="reference.fa")
>>> print(adata.var["trinuc_type"].head())
>>>
>>> # Human data with 'chr' prefix
>>> spc.pp.annotate_contexts(adata, fasta_path="hg38.fa", chrom_prefix="chr")
Notes
Trinucleotide contexts are strand-standardized following COSMIC convention:
- Always report pyrimidine (C or T) as the reference base
- If reference is purine (A or G), reverse complement the trinucleotide
- This ensures consistent 96-channel representation
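The strand-standardization rule above can be sketched in plain Python. This is an illustrative sketch, not cellspec's implementation: the helper names `reverse_complement` and `standardize_context` are hypothetical, and the 'ANC>DER' label format is assumed from the 'ACA>AAA' example in this page.

```python
# Hypothetical helpers for illustration; not part of the cellspec API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def standardize_context(anc: str, der: str) -> str:
    """Return an 'ANC>DER' label with a pyrimidine (C or T) reference base.

    anc: reference trinucleotide, mutated base in the middle
    der: same trinucleotide with the alternate base in the middle
    """
    anc, der = anc.upper(), der.upper()
    if len(anc) != 3 or len(der) != 3:
        raise ValueError("anc and der must both be trinucleotides")
    if set(anc + der) - set("ACGT"):
        raise ValueError("only A, C, G, T are supported in this sketch")
    if anc[0] != der[0] or anc[2] != der[2] or anc[1] == der[1]:
        raise ValueError("anc and der must differ only at the middle base")

    # Purine reference (A or G): report the event on the opposite strand.
    if anc[1] in "AG":
        anc, der = reverse_complement(anc), reverse_complement(der)
    return f"{anc}>{der}"


if __name__ == "__main__":
    print(standardize_context("ACA", "AAA"))  # pyrimidine reference, unchanged
    print(standardize_context("TGT", "TTT"))  # purine reference, flipped
```

Both flanking bases and the substitution are reverse-complemented together, so a G>T change read on one strand and the matching C>A change on the other collapse into the same channel.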
Works with any organism’s chromosome naming:
- Human/mouse: ‘chr1’, ‘chr2’, etc.
- C. elegans: ‘I’, ‘II’, ‘III’, ‘IV’, ‘V’, ‘X’
- Drosophila: ‘2L’, ‘2R’, ‘3L’, ‘3R’, ‘X’
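Before choosing a value for chrom_prefix, it can help to check which naming scheme the reference FASTA actually uses. The sketch below uses only the standard library; `fasta_contig_names` and `common_chr_prefix` are hypothetical helpers, not cellspec functions, and how cellspec itself matches variant IDs to FASTA records is not specified on this page.

```python
from typing import Iterable, List, Optional


def fasta_contig_names(path: str) -> List[str]:
    """List sequence names from an uncompressed FASTA file.

    The name is taken as the first whitespace-delimited token after '>'.
    """
    names = []
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                names.append(header.split()[0] if header else "")
    return names


def common_chr_prefix(names: Iterable[str], prefix: str = "chr") -> Optional[str]:
    """Return `prefix` if every name starts with it, otherwise None."""
    names = list(names)
    if names and all(name.startswith(prefix) for name in names):
        return prefix
    return None


if __name__ == "__main__":
    # "reference.fa" is a placeholder path, as in the Examples section.
    contigs = fasta_contig_names("reference.fa")
    print(contigs[:5], common_chr_prefix(contigs))
```

For a C. elegans or Drosophila reference named as in the list above, `common_chr_prefix` returns None, which corresponds to leaving chrom_prefix at its default.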