Genomics & WGS

Variant Calling: WGS and Targeted Sequencing with GATK4 and DeepVariant

BioMate automatically selects the optimal variant caller for your sequencing data, runs nf-core/sarek on AWS Batch, and returns annotated, audit-ready VCF results with Gold/Silver/Bronze QC grading — no bioinformatics infrastructure required.

Caller selection

GATK4 vs DeepVariant — BioMate chooses for you

The best variant caller depends on your sequencing technology, cohort size, and research context. BioMate reads your data characteristics and routes automatically — you never need to pick manually.

GATK4 HaplotypeCaller

Illumina short-read · Population studies · VQSR
  • Gold standard for Illumina WGS and WES germline calling
  • GVCF mode for joint genotyping across large cohorts
  • Variant quality score recalibration (VQSR) for cohorts of 30 or more samples
  • Mutect2 for somatic tumor-normal calling
  • Preferred choice for clinical and population genetics

DeepVariant

PacBio HiFi · ONT · Deep learning
  • Best-in-class accuracy for PacBio HiFi and Oxford Nanopore long reads
  • Deep learning model trained on real sequencing data — no hard-coded heuristics
  • Outperforms GATK4 on non-Illumina platforms in benchmarks
  • Also available for Illumina data when preferred by protocol
  • GPU-accelerated inference on AWS Batch for fast turnaround

BioMate routes automatically

Describe your data — technology, sample count, germline or somatic — and BioMate selects the caller, sets VQSR vs. hard-filter mode based on cohort size, and confirms the configuration before running. No need to understand the tradeoffs yourself.
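
The routing described above can be pictured as a small decision function. The following is a minimal sketch, not BioMate's actual implementation: the function name, the technology labels, and the `caller_default` filter mode for DeepVariant are hypothetical, and the cutoff of 30 samples is taken from the VQSR note on this page.

```python
from dataclasses import dataclass

# Assumption: cutoff taken from the "cohorts of 30 or more" note on this page.
VQSR_MIN_COHORT = 30


@dataclass(frozen=True)
class RunConfig:
    caller: str       # "haplotypecaller", "deepvariant", or "mutect2"
    filter_mode: str  # "vqsr", "hard_filter", "filter_mutect_calls", "caller_default"


def select_config(technology: str, sample_count: int, somatic: bool = False) -> RunConfig:
    """Pick a caller and filter mode from basic data characteristics (illustrative only)."""
    tech = technology.strip().lower()
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    if somatic:
        # Somatic samples go to Mutect2 and are filtered with FilterMutectCalls.
        return RunConfig("mutect2", "filter_mutect_calls")
    if tech in {"pacbio_hifi", "ont"}:
        # Long reads go to DeepVariant; the filter mode label is a placeholder.
        return RunConfig("deepvariant", "caller_default")
    if tech == "illumina":
        # VQSR needs a large enough cohort; smaller cohorts fall back to hard filters.
        mode = "vqsr" if sample_count >= VQSR_MIN_COHORT else "hard_filter"
        return RunConfig("haplotypecaller", mode)
    raise ValueError(f"unsupported technology: {technology!r}")


print(select_config("illumina", 48))
print(select_config("ont", 3))
```

The point of the sketch is that cohort size only changes the filter mode for HaplotypeCaller runs, while platform and germline/somatic status decide the caller.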

Supported workflows

Every variant calling use case

BioMate covers germline and somatic calling, targeted panels, and structural variant detection — all within nf-core/sarek.

WGS Germline

SNP and indel calling from whole-genome sequencing. GATK4 GVCF mode for joint genotyping. ACMG/AMP variant classification with ClinVar annotation.

Somatic Variant Calling

Tumor-normal matched or tumor-only somatic SNV and indel detection with GATK4 Mutect2. Somatic CNV calling with CNVkit or GATK4 ModelSegments.

Targeted Panel

Hybrid-capture and amplicon panel sequencing. BED-file-based target restriction, higher coverage QC thresholds, and panel-specific hard filters for small cohorts.
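
BED-based target restriction has one common pitfall: BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below shows one way to test whether a variant position falls inside a panel target. The helper names are hypothetical, and a production pipeline would normally leave this to the caller's interval handling.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} with 0-based, half-open intervals."""
    targets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and header lines
        chrom, start, end = line.split("\t")[:3]
        targets[chrom].append((int(start), int(end)))
    return dict(targets)


def in_targets(targets, chrom, pos):
    """Return True if a 1-based VCF position lies inside any target interval."""
    # A BED interval [start, end) covers 1-based positions start+1 .. end.
    return any(start < pos <= end for start, end in targets.get(chrom, []))


bed = ["chr1\t99\t200\texon1", "chr2\t0\t50\texon2"]
panel = load_bed(bed)
print(in_targets(panel, "chr1", 100))  # first base of the chr1 target
```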

CNV & SV Calling

Copy number variant detection via GATK4 gCNV. Structural variants (deletions, inversions, translocations) via Manta and TIDDIT. Results visualized as circos plots.

nf-core/sarek integration

Built on the community standard for cancer genomics

nf-core/sarek is the peer-reviewed, community-maintained Nextflow pipeline for germline and somatic variant analysis — trusted by cancer genome atlases, biobanks, and clinical research programs worldwide.

BioMate orchestrates sarek on AWS Batch, handles reference genome download, supplies the correct dbSNP and Mills and 1000G gold-standard indel resources for BQSR, and streams step-by-step progress back to the UI in real time.

  1. Preprocessing: BWA-MEM2 alignment, duplicate marking (GATK MarkDuplicates), and base quality score recalibration (BQSR).
  2. Variant calling: GATK4 HaplotypeCaller or DeepVariant, selected automatically. Somatic: Mutect2, Strelka2, or Manta.
  3. Annotation & QC: VEP or ANNOVAR annotation, ACMG classification, VQSR or hard filtering, and Gold/Silver/Bronze QC grading.
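
For readers who want to see what a comparable run looks like outside BioMate, here is a minimal sketch that assembles an nf-core/sarek command line. The helper name and the file paths are placeholders; `--input`, `--outdir`, `--genome`, `--aligner`, and `--tools` are standard sarek parameters, but the exact invocation BioMate issues, including its AWS Batch configuration, is not documented here and is not reproduced.

```python
import shlex


def build_sarek_command(samplesheet, outdir, tools, genome="GATK.GRCh38"):
    """Assemble an nf-core/sarek command as an argument list (illustrative only)."""
    if not tools:
        raise ValueError("at least one tool is required")
    return [
        "nextflow", "run", "nf-core/sarek",
        "--input", samplesheet,        # CSV samplesheet describing the samples
        "--outdir", outdir,            # where results are written
        "--genome", genome,            # reference genome key
        "--aligner", "bwa-mem2",       # matches the preprocessing step above
        "--tools", ",".join(tools),    # comma-separated callers and annotators
    ]


# Germline example: HaplotypeCaller calls annotated with VEP.
cmd = build_sarek_command("samplesheet.csv", "results", ["haplotypecaller", "vep"])
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument intact when it is passed to a process launcher.
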
Pipeline outputs
  • VCF: Annotated variant calls (PASS filter)
  • TSV: Tabulated variant table with consequence & ACMG class
  • HTML: MultiQC report with coverage and Ts/Tv metrics
  • BED: CNV segment calls with log2 copy ratio
  • DOCX: Methods report with parameters & tool versions
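
As an illustration of two items in this list, the sketch below keeps only PASS records from VCF text lines and computes the transition and transversion counts behind a Ts/Tv ratio. It is a simplified example in plain Python, not the code that produces BioMate's reports. It counts biallelic and multiallelic SNV alleles only and ignores indels.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def pass_ts_tv(vcf_lines):
    """Count transitions and transversions among PASS SNV alleles in VCF lines."""
    ts = tv = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        ref, alt, flt = fields[3].upper(), fields[4].upper(), fields[6]
        if flt != "PASS":
            continue  # keep only records that passed all filters
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1 and ref in "ACGT" and allele in "ACGT":
                if (ref, allele) in TRANSITIONS:
                    ts += 1
                else:
                    tv += 1
    return ts, tv


records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",     # transition
    "chr1\t200\t.\tC\tA\t50\tPASS\t.",     # transversion
    "chr1\t300\t.\tG\tA\t10\tLowQual\t.",  # filtered out
    "chr1\t400\t.\tAT\tA\t50\tPASS\t.",    # indel, not counted
]
ts, tv = pass_ts_tv(records)
print(ts, tv, ts / tv if tv else float("nan"))
```
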
FAQ

Common questions about variant calling in BioMate

Does BioMate support both SNV and indel calling?

Yes. GATK4 HaplotypeCaller and DeepVariant both call SNPs and indels from WGS or targeted panel data. BioMate applies VQSR or hard filtering as appropriate based on cohort size and data type. Structural variants are detected by TIDDIT and Manta within the nf-core/sarek pipeline.

What reference genomes are supported?

GRCh38 (hg38), GRCh37 (hg19), and GRCm39 (mouse mm39) are supported out of the box. Custom reference genomes can be uploaded for non-model organisms or assembly versions not in the standard index. BioMate auto-downloads the relevant dbSNP and gnomAD annotation databases for the selected genome.

How does BioMate handle somatic variant calling?

BioMate routes somatic samples to GATK4 Mutect2 within nf-core/sarek, with matched tumor-normal pairs supported. Somatic VCFs are filtered with FilterMutectCalls and annotated with Funcotator or VEP. Tumor-only mode is available when no matched normal exists, with the associated caveats noted in the QC report.

What QC thresholds are applied?

BioMate applies GATK VQSR tranche thresholds (99% sensitivity for SNPs, 99% for indels) and ENCODE-standard coverage requirements (30x minimum for WGS germline, 100x for somatic). Results are graded Gold (meets all thresholds), Silver (meets minimum coverage but has borderline VQSR results), or Bronze (below recommended coverage). Every threshold links to the GATK Best Practices documentation.
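
The grading rule can be summarized in a few lines. This is a minimal sketch under stated assumptions: the function name and the boolean `vqsr_ok` input are hypothetical stand-ins for however BioMate decides that VQSR results are borderline, and only the 30x and 100x coverage minimums come from this page.

```python
# Minimum mean coverage per analysis type, as quoted on this page.
MIN_COVERAGE = {"germline": 30.0, "somatic": 100.0}


def qc_grade(mean_coverage: float, analysis: str, vqsr_ok: bool) -> str:
    """Return "Gold", "Silver", or "Bronze" for a sample (illustrative only)."""
    if analysis not in MIN_COVERAGE:
        raise ValueError(f"unknown analysis type: {analysis!r}")
    if mean_coverage < MIN_COVERAGE[analysis]:
        return "Bronze"  # below recommended coverage
    # Coverage is sufficient; the grade now depends on the VQSR outcome.
    return "Gold" if vqsr_ok else "Silver"


print(qc_grade(35.2, "germline", vqsr_ok=True))
print(qc_grade(35.2, "germline", vqsr_ok=False))
print(qc_grade(80.0, "somatic", vqsr_ok=True))
```

Coverage is checked first, so a sample below the minimum is graded Bronze regardless of its VQSR outcome.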

Get started

Run variant calling on your WGS data

Describe your sequencing data in plain English. BioMate selects GATK4 or DeepVariant, runs nf-core/sarek, and returns annotated VCF results with QC grades.
