cellspec.pp.filter_by_coverage
- cellspec.pp.filter_by_coverage(adata, min_depth, max_depth=None, min_fraction=None, min_cells=None, layer='DP', min_alt_depth=None, max_alt_depth=None, inplace=True)
Filter variants based on sequencing depth (coverage) across cells/samples.
Keeps only variants where a minimum number or fraction of cells/samples meet the depth threshold.
- Parameters:
adata (ad.AnnData) – AnnData object with variants (must have 'DP' and/or 'AD' layer depending on mode)
min_depth (int) – Minimum depth threshold per cell. Interpretation depends on layer:
- layer='DP': Minimum total depth (reference + alternate reads)
- layer='AD': Minimum alternate allele depth
- layer='both': Minimum total depth (use with min_alt_depth)
max_depth (int, optional) – Maximum depth threshold per cell. Interpretation depends on layer:
- layer='DP': Maximum total depth (reference + alternate reads)
- layer='AD': Maximum alternate allele depth
- layer='both': Maximum total depth (use with max_alt_depth)
If None, no upper limit is applied. Useful for filtering potential PCR duplicates, mapping artifacts, or high-coverage outliers.
min_fraction (float, optional) – Minimum fraction of cells that must meet the depth threshold (0-1). Cannot be used together with min_cells. If neither min_fraction nor min_cells is specified, the default depends on layer:
- layer='DP': defaults to 1.0 (ALL cells, for quality control)
- layer='AD' or 'both': defaults to min_cells=1 (ANY cell with variant)
min_cells (int, optional) – Minimum number of cells that must meet the depth threshold. Cannot be used together with min_fraction. Useful when you want “at least N cells” regardless of total cell count.
layer (str, default 'DP') – Which layer(s) to use for filtering:
- 'DP': Filter by total depth only (default, backward compatible)
- 'AD': Filter by alternate allele depth only
- 'both': Require both DP and AD thresholds
min_alt_depth (int, optional) – When layer='both', the minimum alternate allele depth required. If None, defaults to min_depth. Ignored when layer='DP' or layer='AD'.
max_alt_depth (int, optional) – When layer='both', the maximum alternate allele depth allowed. If None, no upper limit is applied. Ignored when layer='DP' or layer='AD'.
inplace (bool, default True) – If True, modify adata in place and return None; if False, leave adata unchanged and return a filtered copy.
- Return type:
ad.AnnData or None
- Returns:
Filtered AnnData (if inplace=False), otherwise None
Examples
>>> import cellspec as spc
>>> # Keep only variants with total depth >= 10 in ALL cells (default for DP)
>>> spc.pp.filter_by_coverage(adata, min_depth=10)
>>>
>>> # Filter by depth range: 10 <= DP <= 100 in ALL cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, max_depth=100)
>>>
>>> # Keep variants with DP >= 10 in at least 80% of cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_fraction=0.8)
>>>
>>> # Keep variants with DP >= 10 in at least 5 cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_cells=5)
>>>
>>> # Filter by alternate allele depth - keep if ANY cell has AD >= 3 (default for AD)
>>> spc.pp.filter_by_coverage(adata, min_depth=3, layer="AD")
>>>
>>> # Filter by AD range: 3 <= AD <= 50 in at least 1 cell
>>> spc.pp.filter_by_coverage(adata, min_depth=3, max_depth=50, layer="AD")
>>>
>>> # Filter by AD - require at least 2 cells with AD >= 3
>>> spc.pp.filter_by_coverage(adata, min_depth=3, min_cells=2, layer="AD")
>>>
>>> # Require both DP >= 10 AND AD >= 3 in at least 1 cell (default for 'both')
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_alt_depth=3, layer="both")
>>>
>>> # Complex filter: 10 <= DP <= 100 AND 3 <= AD <= 50 in at least 90% of cells
>>> spc.pp.filter_by_coverage(
...     adata, min_depth=10, max_depth=100, min_alt_depth=3, max_alt_depth=50, layer="both", min_fraction=0.9
... )
Notes
This function is useful for ensuring high-quality variant calls by removing sites with insufficient or excessive coverage.
Default behavior by layer:
- layer='DP': Defaults to min_fraction=1.0 (ALL cells). This is stringent and ensures every cell has adequate total depth at retained variants, which is important for quality control.
- layer='AD' or 'both': Defaults to min_cells=1 (ANY cell). This is appropriate for variant detection, as you typically want to retain variants that appear in at least one cell with sufficient alternate allele support.
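The default rules above can be summarized in a few lines of Python. This is only an illustrative sketch: `resolve_default_requirement` is a hypothetical helper, not part of cellspec, and raising `ValueError` when both options are given is an assumption (the documentation only states that they cannot be combined).

```python
def resolve_default_requirement(layer, min_fraction=None, min_cells=None):
    """Hypothetical helper: return ("fraction", value) or ("cells", value)."""
    if min_fraction is not None and min_cells is not None:
        # Assumed error type; the docs only say the two cannot be combined.
        raise ValueError("min_fraction and min_cells cannot be used together")
    if min_fraction is not None:
        return ("fraction", float(min_fraction))
    if min_cells is not None:
        return ("cells", int(min_cells))
    if layer == "DP":
        return ("fraction", 1.0)   # ALL cells, for quality control
    if layer in ("AD", "both"):
        return ("cells", 1)        # ANY cell with the variant
    raise ValueError(f"unknown layer: {layer!r}")


dp_default = resolve_default_requirement("DP")
ad_default = resolve_default_requirement("AD")
explicit = resolve_default_requirement("DP", min_cells=5)
```

An explicit `min_fraction` or `min_cells` always takes precedence over the layer-based default in this sketch.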
Using max_depth:
Setting a maximum depth threshold helps filter out problematic sites with abnormally high coverage, which may indicate:
- PCR duplicates
- Mapping artifacts (e.g., reads mapping to repetitive regions)
- Copy number variations or collapsed repeats
- Sequencing errors or technical artifacts
For example, if median coverage is ~30x, sites with >100x coverage may be problematic. Use max_depth to exclude these outliers.
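One possible way to pick a cap is to scale the median covered depth. The sketch below assumes a hypothetical helper `suggest_max_depth` and an arbitrary multiplier of 3; neither comes from cellspec, and a suitable multiplier depends on the dataset.

```python
from statistics import median


def suggest_max_depth(depths, multiplier=3.0):
    """Hypothetical helper: cap = multiplier * median of non-zero depths."""
    covered = [d for d in depths if d > 0]  # ignore entries with no reads
    if not covered:
        return None                         # nothing covered, no cap suggested
    return int(median(covered) * multiplier)


# Toy per-entry total depths; 400 stands in for a high-coverage outlier.
toy_depths = [30, 30, 30, 400, 0]
cap = suggest_max_depth(toy_depths)

# The cap could then be passed to the documented parameter, e.g.
# spc.pp.filter_by_coverage(adata, min_depth=10, max_depth=cap)
```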
Filtering modes:
- DP mode: Filters based on total sequencing depth. Useful for ensuring adequate coverage regardless of genotype.
- AD mode: Filters based on alternate allele depth only. Useful when you want to retain variants with sufficient evidence for the variant allele in at least some cells. Default behavior (min_cells=1) keeps variants present in any cell.
- both mode: Applies both filters simultaneously. Useful when you need both adequate total coverage AND sufficient alternate allele support. For example, requiring DP >= 10 and AD >= 3 ensures good coverage overall while also requiring at least 3 reads supporting the variant.
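The three modes can be pictured as boolean masks over depth matrices. The NumPy sketch below is not cellspec's implementation: `in_range` and `per_cell_pass` are hypothetical helpers, the rows-are-cells, columns-are-variants layout is assumed, and so is the reading that in 'both' mode the DP and AD conditions must hold in the same cell.

```python
import numpy as np


def in_range(values, lo, hi=None):
    """Boolean mask of entries with lo <= value (and value <= hi if given)."""
    ok = values >= lo
    if hi is not None:
        ok &= values <= hi
    return ok


def per_cell_pass(dp, ad, layer, min_depth, max_depth=None,
                  min_alt_depth=None, max_alt_depth=None):
    """Hypothetical helper: which (cell, variant) entries pass in each mode."""
    if layer == "DP":
        return in_range(dp, min_depth, max_depth)
    if layer == "AD":
        return in_range(ad, min_depth, max_depth)
    if layer == "both":
        # Documented fallback: min_alt_depth defaults to min_depth.
        alt_lo = min_depth if min_alt_depth is None else min_alt_depth
        return (in_range(dp, min_depth, max_depth)
                & in_range(ad, alt_lo, max_alt_depth))
    raise ValueError(f"unknown layer: {layer!r}")


# Toy data, assumed layout: rows are cells, columns are variants.
dp = np.array([[12, 8, 40], [15, 20, 9]])
ad = np.array([[4, 0, 2], [0, 5, 3]])

dp_mask = per_cell_pass(dp, ad, "DP", min_depth=10)
ad_mask = per_cell_pass(dp, ad, "AD", min_depth=3)
both_mask = per_cell_pass(dp, ad, "both", min_depth=10, min_alt_depth=3)
cells_passing = both_mask.sum(axis=0)  # passing cells per variant
```

Under these assumptions the third toy variant has no cell that satisfies both conditions at once, so with the documented 'both' default (min_cells=1) it would be dropped while the first two are kept.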
Choosing min_cells vs min_fraction:
- Use min_cells when you want “at least N cells” regardless of dataset size (e.g., “at least 3 cells” works the same for 10-cell or 1000-cell datasets)
- Use min_fraction when you want a proportion (e.g., “at least 10% of cells” scales with dataset size)
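The difference is easiest to see with numbers. In this sketch, `passes` is a hypothetical helper, and comparing the passing fraction with `>=` is an assumption about how the threshold is applied.

```python
def passes(n_passing, n_cells, min_cells=None, min_fraction=None):
    """Hypothetical helper: does a variant meet the cell requirement?"""
    if min_cells is not None:
        return n_passing >= min_cells            # absolute count
    return n_passing / n_cells >= min_fraction   # proportion of all cells


# A variant with 3 passing cells, in a small and a large dataset.
small_by_count = passes(3, 10, min_cells=3)
large_by_count = passes(3, 1000, min_cells=3)
small_by_fraction = passes(3, 10, min_fraction=0.1)
large_by_fraction = passes(3, 1000, min_fraction=0.1)
```

With min_cells=3 the variant is kept in both datasets, while min_fraction=0.1 keeps it only in the 10-cell dataset, because 10% of 1000 cells is 100 cells.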
Requires the ‘DP’ layer (and ‘AD’ layer if using layer=’AD’ or layer=’both’) in adata.layers, which are automatically created by spc.pp.load_vcf() when loading VCF files.
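A quick pre-flight check can catch a missing layer before filtering. The sketch below follows the sentence above literally (DP always required, AD additionally for 'AD' and 'both'); `missing_layers` is a hypothetical helper and takes any collection of layer names, for example the keys of `adata.layers`.

```python
def missing_layers(available, layer="DP"):
    """Hypothetical helper: list required layer names absent from `available`."""
    required = ["DP"]
    if layer in ("AD", "both"):
        required.append("AD")
    return [name for name in required if name not in available]


only_dp = missing_layers({"DP"}, layer="both")
complete = missing_layers({"DP", "AD"}, layer="AD")

# With a real AnnData object this could be called as
# missing_layers(adata.layers.keys(), layer="both")
```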