cellspec: Single-cell mutation spectrum analysis

`cellspec`: Single-cell mutation spectrum analysis#

A python package for analyzing variant calls from high throughput single cell genome sequencing experiments. Provides a convenient scanpy style API for loading joint calling vcf files into anndata objects, and performing downstream processing and analysis tasks, including:

coverage analysis and filtering
Annotation ancestral trinucleotide sequence context of SNPs
Computing trinucleotide mutation spectra
Visualizing mutation spectra

in development:

sequencing error / artifact correction
Mutation signature fitting and de novo signature discovery
Phylogenetic analysis
- distance based
- maximum liklihood
- Bayseian
eQTL analysis (using genome-transcriptome coassay data)

Installation#

Install the latest development version:

git clone https://github.com/harrispopgen/cellspec.git
cd cellspec
pip install -e .

Getting started#

As an homage to the semi permeable capsule technology that spurred the need for this package, I encourage the following convention when importing cellspec:

import cellspec as spc

cellspec uses the AnnData class to store joint calling data.

https://raw.githubusercontent.com/scverse/anndata/main/docs/_static/img/anndata_schema.svg

At the most basic level, an AnnData object adata stores a data matrix adata.X, annotation of observations adata.obs and variables adata.var as pd.DataFrame and unstructured annotation adata.uns as dict. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names].

In cellspec, observations are cells (or samples), and variables are bi-allelic sites. Genotype calls from the vcf file are stored in adata.X, and depth information in adata.layers. Total read depth at each site in each cell is stored in adata.layers["DP"], and alternate allele read depth is stored in adata.layers["AD"].

To load a vcf into anndata:

adata = spc.pp.load_vcf(filename)

This initial step can take a somewhat long time, especially for datasets with a lot of alleles. As such, it a good idea to save your data in .h5ad format for more convenient loading in the future:

adata.write_h5ad(filename)

Please refer to the documentation and tutorials for more instruction, and the API documentation for information on specific functionality.

cellspec: Single-cell mutation spectrum analysis

Contents

`cellspec`: Single-cell mutation spectrum analysis#

Installation#

Getting started#

Contents#

cellspec: Single-cell mutation spectrum analysis

Contents

cellspec: Single-cell mutation spectrum analysis#

Installation#

Getting started#

Contents#

`cellspec`: Single-cell mutation spectrum analysis#