cellspec.pp.filter_by_coverage

cellspec.pp.filter_by_coverage(adata, min_depth, max_depth=None, min_fraction=None, min_cells=None, layer='DP', min_alt_depth=None, max_alt_depth=None, inplace=True)

Filter variants based on sequencing depth (coverage) across cells/samples.

Keeps only variants where a minimum number or fraction of cells/samples meet the depth threshold.

Parameters:
  • adata (ad.AnnData) – AnnData object with variants (must have ‘DP’ and/or ‘AD’ layer depending on mode)

  • min_depth (int) – Minimum depth threshold per cell. Interpretation depends on layer:
      - layer='DP': minimum total depth (reference + alternate reads)
      - layer='AD': minimum alternate allele depth
      - layer='both': minimum total depth (use with min_alt_depth)

  • max_depth (int, optional) – Maximum depth threshold per cell. Interpretation depends on layer:
      - layer='DP': maximum total depth (reference + alternate reads)
      - layer='AD': maximum alternate allele depth
      - layer='both': maximum total depth (use with max_alt_depth)
    If None, no upper limit is applied. Useful for filtering potential PCR duplicates, mapping artifacts, or high-coverage outliers.

  • min_fraction (float, optional) – Minimum fraction of cells that must meet the depth threshold (0-1). Cannot be used together with min_cells. If neither min_fraction nor min_cells is specified, the default depends on layer:
      - layer='DP': min_fraction=1.0 (ALL cells, for quality control)
      - layer='AD' or 'both': min_cells=1 (ANY cell with the variant)

  • min_cells (int, optional) – Minimum number of cells that must meet the depth threshold. Cannot be used together with min_fraction. Useful when you want “at least N cells” regardless of total cell count.

  • layer (str, default 'DP') – Which layer(s) to use for filtering:
      - 'DP': filter by total depth only (default, backward compatible)
      - 'AD': filter by alternate allele depth only
      - 'both': require both DP and AD thresholds

  • min_alt_depth (int, optional) – When layer='both', the minimum alternate allele depth required. If None, defaults to min_depth. Ignored when layer='DP' or layer='AD'.

  • max_alt_depth (int, optional) – When layer='both', the maximum alternate allele depth allowed. If None, no upper limit is applied. Ignored when layer='DP' or layer='AD'.

  • inplace (bool, default True) – If True, modify adata in place; if False, return a filtered copy.

Return type:

AnnData | None

Returns:

ad.AnnData or None – Filtered AnnData if inplace=False; otherwise None.

Examples

>>> import cellspec as spc
>>> # Keep only variants with total depth >= 10 in ALL cells (default for DP)
>>> spc.pp.filter_by_coverage(adata, min_depth=10)
>>>
>>> # Filter by depth range: 10 <= DP <= 100 in ALL cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, max_depth=100)
>>>
>>> # Keep variants with DP >= 10 in at least 80% of cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_fraction=0.8)
>>>
>>> # Keep variants with DP >= 10 in at least 5 cells
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_cells=5)
>>>
>>> # Filter by alternate allele depth - keep if ANY cell has AD >= 3 (default for AD)
>>> spc.pp.filter_by_coverage(adata, min_depth=3, layer="AD")
>>>
>>> # Filter by AD range: 3 <= AD <= 50 in at least 1 cell
>>> spc.pp.filter_by_coverage(adata, min_depth=3, max_depth=50, layer="AD")
>>>
>>> # Filter by AD - require at least 2 cells with AD >= 3
>>> spc.pp.filter_by_coverage(adata, min_depth=3, min_cells=2, layer="AD")
>>>
>>> # Require both DP >= 10 AND AD >= 3 in at least 1 cell (default for 'both')
>>> spc.pp.filter_by_coverage(adata, min_depth=10, min_alt_depth=3, layer="both")
>>>
>>> # Complex filter: 10 <= DP <= 100 AND 3 <= AD <= 50 in at least 90% of cells
>>> spc.pp.filter_by_coverage(
...     adata, min_depth=10, max_depth=100, min_alt_depth=3, max_alt_depth=50, layer="both", min_fraction=0.9
... )

Notes

This function is useful for ensuring high-quality variant calls by removing sites with insufficient or excessive coverage.

Default behavior by layer:

  • layer='DP': Defaults to min_fraction=1.0 (ALL cells). This is stringent and ensures every cell has adequate total depth at retained variants, which is important for quality control.

  • layer='AD' or 'both': Defaults to min_cells=1 (ANY cell). This is appropriate for variant detection, as you typically want to retain variants that appear in at least one cell with sufficient alternate allele support.
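The default rules above can be written out as a small lookup. This is only an illustrative sketch: resolve_cell_threshold is a hypothetical helper, not part of cellspec, and the ValueError for conflicting arguments is an assumption (the documentation says only that min_fraction and min_cells cannot be combined).

```python
def resolve_cell_threshold(layer, min_fraction=None, min_cells=None):
    """Return ("fraction", value) or ("cells", value) following the documented defaults.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if min_fraction is not None and min_cells is not None:
        # Assumption: the exact exception cellspec raises here is not documented.
        raise ValueError("Specify min_fraction or min_cells, not both")
    if min_fraction is not None:
        return ("fraction", float(min_fraction))
    if min_cells is not None:
        return ("cells", int(min_cells))
    # Neither given: fall back to the per-layer default described in the notes.
    if layer == "DP":
        return ("fraction", 1.0)  # ALL cells
    return ("cells", 1)           # ANY cell, for 'AD' and 'both'


print(resolve_cell_threshold("DP"))                      # ('fraction', 1.0)
print(resolve_cell_threshold("AD"))                      # ('cells', 1)
print(resolve_cell_threshold("both", min_fraction=0.9))  # ('fraction', 0.9)
```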

Using max_depth:

Setting a maximum depth threshold helps filter out problematic sites with abnormally high coverage, which may indicate:

  • PCR duplicates

  • Mapping artifacts (e.g., reads mapping to repetitive regions)

  • Copy number variations or collapsed repeats

  • Sequencing errors or technical artifacts

For example, if median coverage is ~30x, sites with >100x coverage may be problematic. Use max_depth to exclude these outliers.
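One way to sanity-check a candidate max_depth is to compare it against the typical depth in your data. The sketch below uses made-up depth values in a plain list; in practice you would pull the values from the DP layer yourself. The cutoff of 100 simply mirrors the example above and is not a recommended default.

```python
from statistics import median

# Hypothetical depth values for illustration (not real data).
depths = [28, 31, 30, 27, 35, 29, 240, 33, 26, 118]

typical_depth = median(depths)
max_depth = 100  # candidate cutoff, chosen relative to the ~30x median

# Values above the cutoff are the high-coverage outliers max_depth would exclude.
outliers = [d for d in depths if d > max_depth]

print(typical_depth)  # 30.5
print(outliers)       # [240, 118]
```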

Filtering modes:

  • DP mode: Filters based on total sequencing depth. Useful for ensuring adequate coverage regardless of genotype.

  • AD mode: Filters based on alternate allele depth only. Useful when you want to retain variants with sufficient evidence for the variant allele in at least some cells. Default behavior (min_cells=1) keeps variants present in any cell.

  • both mode: Applies both filters simultaneously. Useful when you need both adequate total coverage AND sufficient alternate allele support. For example, requiring DP >= 10 and AD >= 3 ensures good coverage overall while also requiring at least 3 reads supporting the variant.
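The three modes can be pictured as boolean masks over depth matrices. The following is a minimal NumPy sketch, not cellspec's actual code: the toy arrays, the cells-as-rows orientation, and the variable names are all assumptions made for illustration. It applies the documented defaults (all cells for DP, any cell for AD and both).

```python
import numpy as np

# Hypothetical toy data: rows are cells, columns are variants (orientation assumed).
dp = np.array([[12,  4, 30],
               [15, 11,  8],
               [10, 20, 25]])
ad = np.array([[0, 3, 5],
               [3, 0, 1],
               [0, 0, 6]])

min_total, min_alt = 10, 3

dp_ok = dp >= min_total   # per-cell total depth check
ad_ok = ad >= min_alt     # per-cell alternate allele depth check
both_ok = dp_ok & ad_ok   # a cell must pass both checks at once

keep_dp = dp_ok.all(axis=0)      # DP mode default: every cell must pass
keep_ad = ad_ok.any(axis=0)      # AD mode default: at least one cell passes
keep_both = both_ok.any(axis=0)  # both mode default: at least one cell passes both

print(keep_dp.tolist())    # [True, False, False]
print(keep_ad.tolist())    # [True, True, True]
print(keep_both.tolist())  # [True, False, True]
```

The second variant shows why 'both' is stricter than 'AD': its only cell with AD >= 3 has a total depth of 4, so no single cell satisfies both thresholds.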

Choosing min_cells vs min_fraction:

  • Use min_cells when you want “at least N cells” regardless of dataset size (e.g., “at least 3 cells” works the same for 10-cell or 1000-cell datasets)

  • Use min_fraction when you want a proportion (e.g., “at least 10% of cells” scales with dataset size)
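The difference is easy to see with a little arithmetic. In this sketch, meets_cell_requirement is a hypothetical helper written for illustration; how cellspec handles rounding at the boundary is not documented here, so the fraction is compared directly.

```python
def meets_cell_requirement(n_passing, n_cells, min_cells=None, min_fraction=None):
    """Check a variant's passing-cell count against a count or fraction threshold.

    Hypothetical helper; not part of cellspec.
    """
    if min_cells is not None:
        return n_passing >= min_cells
    return n_passing / n_cells >= min_fraction


# A variant with 3 passing cells, in a 10-cell and a 1000-cell dataset.
print(meets_cell_requirement(3, 10, min_cells=3))         # True
print(meets_cell_requirement(3, 1000, min_cells=3))       # True
print(meets_cell_requirement(3, 10, min_fraction=0.1))    # True  (30% of cells)
print(meets_cell_requirement(3, 1000, min_fraction=0.1))  # False (0.3% of cells)
```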

Requires the 'DP' layer (and 'AD' layer if using layer='AD' or layer='both') in adata.layers, which are automatically created by spc.pp.load_vcf() when loading VCF files.
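If the AnnData object did not come from spc.pp.load_vcf(), it may be worth checking for the required layers before filtering. The helper below is hypothetical (not part of cellspec) and encodes the requirement exactly as stated in the note above; it accepts any mapping, so adata.layers can be passed directly.

```python
def missing_layers(available, layer="DP"):
    """List required layer names that are absent from `available`.

    Hypothetical helper; `available` is any mapping, such as adata.layers.
    """
    required = ["DP"]
    if layer in ("AD", "both"):
        required.append("AD")
    return [name for name in required if name not in available]


# A plain dict stands in for adata.layers in this example.
print(missing_layers({"DP": None}, layer="both"))            # ['AD']
print(missing_layers({"DP": None, "AD": None}, layer="AD"))  # []
```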