Platform Evaluation

BioMate AI — Platform Benchmarks

Evaluation of BioMate's workflow routing, pharmacokinetic modeling, QC grading, and regulatory compliance capabilities. All numbers come from internal benchmarks run against published reference datasets.

Routing & Reliability

Workflow Routing & Reliability

Measured across all 37 biological domains using held-out queries not seen during system development. Each metric is tested with a minimum of 3 independent repetitions.

| Metric | Result | Test set | Target / notes | Status |
|---|---|---|---|---|
| Workflow routing accuracy | 94.6% | 120 test cases across all domains | Target ≥80% | PASS |
| Cross-run stability | 100% | 3 independent runs | Cohen’s Kappa = 1.0; 0% flip rate | PASS |
| License gating accuracy | 98.2% | 54 test cases | Target ≥90%; correctly blocks unlicensed workflows | PASS |
| Prerequisite recovery | 97.5% | 40 test cases | Target ≥85%; correctly recovers missing upstream steps | PASS |
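
For readers who want to reproduce the stability figures on their own routing logs, the following is a minimal sketch of how first-pick accuracy, Cohen's kappa between runs, and flip rate could be computed. It is not BioMate's benchmark harness: the workflow labels are made up, and taking the lowest pairwise kappa across the three runs is an assumption, since the page does not say how the three runs are combined into a single kappa value.

```python
from collections import Counter
from itertools import combinations


def accuracy(predicted, expected):
    """Fraction of queries routed to the expected workflow."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def cohens_kappa(run_a, run_b):
    """Cohen's kappa between the routing labels of two runs."""
    n = len(run_a)
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    # Chance agreement from each run's marginal label frequencies.
    chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if chance == 1.0:
        # Both runs emit one identical label everywhere; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - chance) / (1.0 - chance)


def flip_rate(runs):
    """Fraction of queries whose label differs in at least one run."""
    flips = sum(len(set(labels)) > 1 for labels in zip(*runs))
    return flips / len(runs[0])


# Hypothetical data: four queries, three identical runs, one routing mistake.
expected = ["rnaseq", "cryoem", "pbpk", "hts"]
runs = [["rnaseq", "cryoem", "pbpk", "pbpk"] for _ in range(3)]

# Assumption: report the lowest pairwise kappa across runs.
worst_kappa = min(cohens_kappa(a, b) for a, b in combinations(runs, 2))

print(f"accuracy={accuracy(runs[0], expected):.2f}")  # accuracy=0.75
print(f"kappa={worst_kappa:.2f}")                     # kappa=1.00
print(f"flip_rate={flip_rate(runs):.2f}")             # flip_rate=0.00
```

The example also shows that stability and accuracy are separate properties: the three runs agree perfectly (kappa of 1.0, no flips) even though one query is routed to the wrong workflow every time.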
Pharmacokinetics

PBPK Pharmacokinetic Validation

Validated against FDA first-in-human guidance datasets using 15 reference compounds. All predictions fall within the FDA-accepted 2-fold accuracy window.

| Compound | Prediction Error | FDA Standard | Result |
|---|---|---|---|
| Midazolam | 0.4% | FDA FIH 2005 | ✓ Pass |
| Lorazepam | 0.7% | FDA FIH 2005 | ✓ Pass |
| Gabapentin | 0.9% | FDA FIH 2005 | ✓ Pass |
| Metformin | 9.8% | FDA FIH 2005 | ✓ Pass |
| Theophylline | 17.5% | FDA FIH 2005 | ✓ Pass |
| Atenolol | 34.1% | FDA FIH 2005 | ✓ Pass |
| All 15 compounds | 100% pass rate | | ✓ Pass |

All predictions within FDA-accepted 2-fold accuracy window. Validated against FDA first-in-human guidance datasets.
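
To make the 2-fold window concrete, here is a minimal sketch of a fold-error check. It assumes the "Prediction Error" column is a relative error against the observed value, which the page does not define, and the function names are illustrative rather than part of BioMate.

```python
def fold_error(predicted, observed):
    """Symmetric fold error: always >= 1, where 1 is a perfect prediction."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("PK parameters must be positive")
    ratio = predicted / observed
    return max(ratio, 1.0 / ratio)


def within_fold(predicted, observed, fold=2.0):
    """True when the prediction lies between observed/fold and observed*fold."""
    return fold_error(predicted, observed) <= fold


def worst_case_fold(percent_error):
    """Largest fold error compatible with a given relative error.

    Assumes error = |predicted - observed| / observed. Under-prediction is
    the worse case, because 1 / (1 - r) exceeds 1 + r for 0 < r < 1.
    """
    rel = percent_error / 100.0
    if not 0.0 <= rel < 1.0:
        raise ValueError("relative error must be in [0, 100) percent")
    return 1.0 / (1.0 - rel)


print(within_fold(150.0, 100.0))        # True: 1.5-fold over-prediction
print(within_fold(40.0, 100.0))         # False: 2.5-fold under-prediction
print(round(worst_case_fold(34.1), 2))  # 1.52
```

Under that assumption, even the largest tabulated error (34.1% for Atenolol) corresponds to at most roughly a 1.5-fold deviation, which is inside the 2-fold window.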

Quality Control

QC Gate Coverage — 26 Validated Metrics

Every biological domain handled by BioMate has quantitative pass/fail thresholds derived from published community standards. Gates are independently thresholded at Gold, Silver, and Bronze levels.

| Domain | Gates | Standards Referenced |
|---|---|---|
| Cryo-EM | 3 | Rosenthal & Henderson 2003 |
| Cryo-ET | 3 | Rosenthal & Henderson 2003; Hagen 2017 |
| Protein structure | 3 | Jumper et al. 2021; Tunyasuvunakool 2021 |
| Cancer / somatic variants | 3 | GATK/Mutect2; Strelka2 |
| LNP formulation | 3 | USP standards |
| Population PK | 3 | nlmixr2/NONMEM guidelines |
| Drug discovery | 2 | Le Guilloux 2009; Genheden 2020 |
| High-throughput screening | 2 | Zhang 1999; Iversen 2006 |
| ADME / PK | 2 | Obach 1999; FDA FIH 2005 |
| Clinical trial design | 1 | Liu & Yuan 2015 (BOIN) |
| ICH safety (S5/S7) | 2 | ICH S5R3; ICH S7B |
| Total across 20+ domains | 26 | ENCODE, GTEx, nf-core, FDA, ICH, and domain-specific literature |

Each gate is independently thresholded at Gold (all metrics pass), Silver (minor flag), or Bronze (below threshold; triggers auto-remediation).
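
As an illustration of tiered gating, the sketch below grades a single high-throughput screening metric, the Z'-factor from Zhang 1999, into Gold, Silver, or Bronze. The `Gate` class, the cutoff values, and the plate readings are all hypothetical; the page does not publish BioMate's actual thresholds or how a gate combines several metrics.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Gate:
    """One QC gate with two cutoffs (hypothetical structure)."""
    name: str
    gold: float    # value at or above this cutoff grades Gold
    silver: float  # value at or above this cutoff (but below gold) grades Silver

    def grade(self, value: float) -> str:
        if value >= self.gold:
            return "Gold"
        if value >= self.silver:
            return "Silver"
        return "Bronze"


def z_prime(positive, negative):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = 3.0 * (stdev(positive) + stdev(negative))
    return 1.0 - spread / abs(mean(positive) - mean(negative))


# Illustrative cutoffs only, not BioMate's configured values.
gate = Gate(name="hts_z_prime", gold=0.5, silver=0.0)

# Made-up control-well readings from one plate.
positive_controls = [100.0, 102.0, 98.0, 101.0, 99.0]
negative_controls = [10.0, 12.0, 8.0, 11.0, 9.0]

score = z_prime(positive_controls, negative_controls)
tier = gate.grade(score)
needs_remediation = tier == "Bronze"

print(f"Z'={score:.3f} tier={tier} remediate={needs_remediation}")
```

Keeping the cutoffs as data on the gate, rather than hard-coding them in the grading logic, is one way to let each domain carry its own thresholds.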

Regulatory AI

Regulatory Document Analysis

Evaluated on a 100-example FDA drug label dataset. BioMate uses Claude Sonnet 4.6 for regulatory language parsing, adverse event detection, and phase-gating compliance checks.

| Metric | Score | Notes |
|---|---|---|
| Overall (macro average) | 87.1% | Claude Sonnet 4.6 on FDA drug label dataset (n=100) |
| Adversity language detection | 100% | |
| Phase gating accuracy | 100% | |
| Numeric range compliance | 100% | |
| Citation accuracy | 76% | |
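
The overall score is described as a macro average. As a reminder of what that means, the sketch below contrasts macro averaging (the unweighted mean of per-category scores) with micro averaging (pooling all examples). The category names and counts are invented for illustration and are not the categories or data behind the table above.

```python
def macro_average(per_category):
    """Unweighted mean of per-category accuracies; every category counts equally."""
    scores = [correct / total for correct, total in per_category.values()]
    return sum(scores) / len(scores)


def micro_average(per_category):
    """Pooled accuracy; larger categories weigh more."""
    correct = sum(c for c, _ in per_category.values())
    total = sum(t for _, t in per_category.values())
    return correct / total


# Hypothetical (correct, total) counts per category.
results = {
    "category_a": (40, 40),
    "category_b": (45, 50),
    "category_c": (5, 10),
}

print(f"macro={macro_average(results):.3f}")  # macro=0.800
print(f"micro={micro_average(results):.3f}")  # micro=0.900
```

With macro averaging, a weak result on a small category pulls the headline number down as much as a weak result on a large one, which is why the two averages can differ noticeably.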
Methodology

How benchmarks are run

Benchmarks are run with a minimum of 3 independent repetitions. Routing benchmarks use held-out queries not seen during system development. PBPK validation uses FDA first-in-human reference compounds. Regulatory evaluation uses a 100-example FDA drug label dataset. Results are updated as models and pipelines improve.

Routing benchmark: 120 held-out queries · 37 biological domains · 3 independent runs · Cohen’s Kappa reported
PBPK benchmark: 15 FDA reference compounds · 2-fold accuracy window · validated against published clinical PK data
QC gate validation: 26 gates · 20+ domains · thresholds derived from peer-reviewed community standards
Regulatory LLM eval: n=100 FDA drug labels · macro-averaged scoring · Claude Sonnet 4.6
FAQ

Common questions about BioMate accuracy

What routing accuracy does BioMate achieve?

In cross-domain routing benchmarks across 120 test cases spanning all 37 biological domains, BioMate achieves 94.6% first-pick routing accuracy — exceeding the 80% target. Routing stability is 100% across independent runs (Cohen’s Kappa = 1.0).

How accurate is BioMate’s PBPK modeling?

BioMate’s PBPK simulation achieves a 100% pass rate on 15 FDA-standard reference compounds. Prediction errors range from 0.4% (Midazolam) to 34.1% (Atenolol), all within the 2-fold FDA-accepted accuracy window.

How many QC metrics does BioMate apply?

BioMate applies 26 quantitative QC gates across 20+ biological domains, each with Gold/Silver/Bronze thresholds based on published community standards (ENCODE, GTEx, nf-core, FDA, ICH).

How does BioMate perform on regulatory document analysis?

On a 100-example FDA drug label evaluation dataset, BioMate’s Claude Sonnet 4.6 integration scores 87.1% overall (macro average), with 100% accuracy on adversity language detection and phase gating.
