Are public bioinformatics agent skills good enough to run an analysis?

They have improved — the better libraries now ship a genuine R/Bioconductor DESeq2 skill with citations and a troubleshooting table, and a few run continuous integration. But that CI validates the document (frontmatter fields, section headings, a valid date), not whether the code runs. No public skill library we examined executes its DESeq2 code against a real count matrix in CI, so correctness still depends on the agent assembling, versioning, and running it.

What is the difference between an agent skill and a BioMate workflow?

An agent skill is a Markdown file (SKILL.md) that an AI agent reads to learn how to do a task. A BioMate workflow is executable, container-pinned code that has been run end to end on real input with the output verified. A skill tells an agent how the analysis goes; BioMate runs it and returns the result.

Do agent skills pin tool versions and ship a runnable environment?

Generally no. A skill typically states version compatibility as guidance (for example 'pydeseq2 0.4+') rather than pinning an exact environment or shipping a per-skill container; the agent resolves Bioconductor and system libraries at runtime. BioMate workflows run in dependency-complete, version-pinned containers, so there is no version-mismatch hunt.
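To make the difference concrete, here is a minimal Python sketch of guidance versus a pin. The release numbers and the pinned version are invented for illustration, and reading '0.4+' as "any version at or above 0.4" is our assumption, not a rule any skill states.

```python
# Illustrative only: the candidate releases and the pin are hypothetical.

def parse_version(text):
    """Turn '0.4.12' into (0, 4, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def satisfies_floor(installed, floor):
    """Guidance such as '0.4+' is read here as 'any version >= 0.4' (assumption)."""
    return parse_version(installed) >= parse_version(floor)

def satisfies_pin(installed, pinned):
    """An exact pin accepts exactly one version."""
    return parse_version(installed) == parse_version(pinned)

candidates = ["0.3.5", "0.4.0", "0.4.12", "0.5.1", "1.0.0"]  # hypothetical releases

allowed_by_guidance = [v for v in candidates if satisfies_floor(v, "0.4")]
allowed_by_pin = [v for v in candidates if satisfies_pin(v, "0.4.12")]

print("guidance '0.4+' admits:", allowed_by_guidance)
print("pin '0.4.12' admits:   ", allowed_by_pin)
```

The guidance admits four of the five candidates, including a major-version jump, while the pin admits one. Which of the four an agent ends up with depends on when and where it installs.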

Bioinformatics Agent Skills vs. BioMate: An Honest Comparison

The public "agent skill" libraries for bioinformatics got genuinely better this year. A year ago the only differential-expression skill on offer was a thin wrapper around a partial Python re-implementation of DESeq2; today the strongest libraries ship a real R/Bioconductor DESeq2 skill — with citations, a troubleshooting table, and even continuous integration. We went back and read them. Here is what they give you, what they still leave to you, and the single line that separates a skill from a result.

First, credit where it's due

We last surveyed this ecosystem in mid-2026 and concluded that no public library shipped a genuine R DESeq2 skill — only a ~160-line wrapper around PyDESeq2, a partial Python re-implementation of the canonical R package, with no troubleshooting and no citations. That is no longer true, and it's worth saying plainly.

The better collections today ship a real DESeq2 skill written in actual R: hundreds of lines that call library(DESeq2), build a dataset with DESeqDataSetFromMatrix, estimate dispersions, fit the model, apply apeglm log-fold-change shrinkage, handle multi-factor and interaction designs, and close with a troubleshooting table and DOIs to the methods papers (Love et al. 2014; Zhu et al. 2019). Some libraries now offer the R and Python variants side by side and cross-link them. A few run automated checks on every skill on each commit. If you used these a year ago and walked away, they are meaningfully more rigorous now.

So this isn't a takedown. It's a question worth asking precisely because the prose got good: does a better-written skill actually get you a correct result — or just a better-written description of one?

What an "agent skill" actually is

A skill is a Markdown file — a SKILL.md — that an AI agent reads to learn how to perform a task. A good one has an overview, when-to-use guidance, prerequisites, the ordered steps, key parameters, a troubleshooting section, and references. Many ship a folder of example .R / .py / .sh scripts beside the prose.

That is a real, useful artifact. It is also, fundamentally, documentation written for an agent to read — not a pipeline that has been run. And that distinction turns out to be the whole story.

What "validated" means inside a skill library

The most reassuring development is that a few leading libraries now run continuous integration on every skill — which sounds like the gap just closed. So read what the CI actually checks.

It validates the document: that the YAML frontmatter has a name and a license, that the required section headings are present, that the category matches the directory it lives in, that the date is a valid ISO date. Useful hygiene. But it does not run the R. In every public library we examined, no skill executes its DESeq2 code against a real count matrix and checks the result. The validation is structural — it confirms the recipe is well-formatted, not that the dish comes out of the oven.
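As a rough illustration, the Python sketch below validates a toy skill file the way a structural CI job might. The field names, required headings, sample file, and the function name in its single R line are all hypothetical; none are taken from a specific library's CI, and the category-versus-directory check is left out for brevity.

```python
import re
from datetime import date

REQUIRED_FIELDS = {"name", "license", "date"}                  # assumed field names
REQUIRED_HEADINGS = {"Overview", "Steps", "Troubleshooting"}   # assumed headings

def structural_check(skill_md):
    """Return a list of structural problems; an empty list means 'CI passes'."""
    problems = []
    match = re.match(r"---\n(.*?)\n---\n", skill_md, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter"]
    # Naive 'key: value' parsing; a real check would use a YAML parser.
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    for field in sorted(REQUIRED_FIELDS - set(fields)):
        problems.append(f"missing field: {field}")
    if "date" in fields:
        try:
            date.fromisoformat(fields["date"])
        except ValueError:
            problems.append("date is not a valid ISO date")
    headings = set(re.findall(r"^## (.+)$", skill_md, re.MULTILINE))
    for heading in sorted(REQUIRED_HEADINGS - headings):
        problems.append(f"missing heading: {heading}")
    return problems

# A toy skill whose only line of R calls a function that does not exist.
SKILL = """---
name: deseq2-differential-expression
license: MIT
date: 2026-01-15
---
## Overview
Differential expression with DESeq2.
## Steps
    dds <- not_a_real_function(counts, coldata)
## Troubleshooting
None yet.
"""

print(structural_check(SKILL))                                    # prints []
print(structural_check(SKILL.replace("2026-01-15", "15/01/2026")))
```

The first call reports no problems even though the R line cannot run, because nothing in the check ever executes it. The second call shows what the check does catch: a malformed date.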

A skill can pass every CI check and still contain code that is subtly wrong: a stale argument, a function renamed two releases ago, a data-contract mismatch between step three's output and step four's input. Structure is checked. Execution is not.

We have measured exactly this failure surface. In a 20-task Bioconductor benchmark, an ungrounded model cited real functions only 71.4% of the time, so roughly three in ten of its function citations pointed at something that does not exist. Checking every function against the package's NAMESPACE lifts that to 88.2%, and yet even a workflow that passes every structural check can still fail to run, because versions drift and data contracts don't line up. (We wrote that up separately: why AI writes Bioconductor code that doesn't run.) A skill's CI sits one level below even that structural grounding: it checks the Markdown, not the code.
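A NAMESPACE check of this kind can be sketched in a few lines of Python. This is a simplified stand-in, not the benchmark's actual tooling: the export set is a small hand-copied subset of DESeq2's exports, the regex is a crude substitute for parsing R, and shrink_lfc_apeglm is a name we invented to play the hallucinated call.

```python
import re

# A hand-copied subset of names DESeq2 exports. A real check would read the
# package's NAMESPACE file instead of hard-coding them.
DESEQ2_EXPORTS = {"DESeqDataSetFromMatrix", "DESeq", "results", "lfcShrink", "resultsNames"}

# Toy R snippet; the last call uses an invented function name.
R_SNIPPET = """
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
res <- shrink_lfc_apeglm(dds, res)
"""

def called_functions(r_code):
    """Names immediately followed by '(' in the R text (crude, regex-based)."""
    return re.findall(r"\b([A-Za-z_.][A-Za-z0-9_.]*)\s*\(", r_code)

def grounding_report(r_code, exports):
    """Return (fraction of calls found in exports, list of calls not found)."""
    calls = called_functions(r_code)
    unknown = [name for name in calls if name not in exports]
    rate = (len(calls) - len(unknown)) / len(calls) if calls else 1.0
    return rate, unknown

rate, unknown = grounding_report(R_SNIPPET, DESEQ2_EXPORTS)
print(f"grounded calls: {rate:.0%}, unknown: {unknown}")
```

The 75% here describes this four-call toy only; it is not the benchmark figure. Note also what the check cannot see: a call can name a real export and still fail on a stale argument or the wrong input object.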

The five things a skill still leaves to you

A skill describes the analysis. To get a result, you — or the agent acting on the skill — still have to:

Pin the versions. "Version compatibility: pydeseq2 0.4+" tells the agent to check, not what to install. There is no per-skill locked environment.
Build the environment. No skill ships a container or a lockfile per task. You resolve Bioconductor, its annotation databases, and the system libraries yourself.
Stage the data into the exact shape the next step expects. The input contract is prose, not a validated schema.
Carry outputs across steps. Step three's object has to match step four's input — and nothing checks that it does.
Be right. Because the code was never executed against real data, correctness rests on the author and the agent — exactly the three-in-ten failure surface above.
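The data-staging and step-to-step items are the easiest to picture as code. Below is a minimal Python sketch of the kind of contract check that prose cannot enforce. The function name, sample names, and values are hypothetical; the two rules it checks (sample order matching between the count matrix and the sample table, and raw non-negative integer counts) reflect what DESeq2 expects of its input, stated here in simplified form.

```python
def check_count_contract(count_columns, sample_ids, counts):
    """Check a hypothetical hand-off: count matrix plus sample table into DESeq2."""
    problems = []
    # Sample names must match between matrix columns and sample-table rows, in order.
    if list(count_columns) != list(sample_ids):
        problems.append("count-matrix columns and sample-table rows differ in name or order")
    for gene, row in counts.items():
        if len(row) != len(count_columns):
            problems.append(f"{gene}: expected {len(count_columns)} values, got {len(row)}")
        # Raw counts only: normalised or fractional values violate the contract.
        if any((not isinstance(value, int)) or value < 0 for value in row):
            problems.append(f"{gene}: counts must be non-negative integers")
    return problems

# Hypothetical output of an upstream step.
step3_columns = ["ctrl_1", "ctrl_2", "treat_1"]
sample_table_ids = ["ctrl_1", "treat_1", "ctrl_2"]            # same samples, different order
step3_counts = {"geneA": [10, 0, 7], "geneB": [3.5, 2, 1]}    # 3.5 looks like a normalised value

problems = check_count_contract(step3_columns, sample_table_ids, step3_counts)
for problem in problems:
    print(problem)
```

Both problems are invisible in a prose description of the input. They surface only when something actually inspects the objects passed between steps.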

None of this is a criticism of the authors; it is the category. A skill is documentation for an agent. Documentation is not execution.

What BioMate does instead

BioMate is the execution layer. You describe the analysis in plain language, and BioMate routes only to workflows that have actually been run: end to end, in a dependency-complete container, on real input, with the output verified. For DESeq2 specifically, BioMate doesn't hand you a description of lfcShrink; it runs the whole pipeline (DESeqDataSetFromMatrix, dispersion estimation, the model fit, apeglm shrinkage) in a container where DESeq2 and apeglm are already resolved, on managed compute, and returns the results table, the shrunken fold changes, the diagnostic plots, and an audit trail of exactly what ran. If a dependency is missing, the workflow is fixed and re-validated before it is ever offered, so it doesn't fail in your hands.

A skill describes the analysis; a BioMate workflow is the analysis, executed. The two columns below aren't better and worse versions of the same thing; they're different categories.
	Public agent skill	BioMate workflow
What it is	A `SKILL.md`: prose + an example snippet for an agent to read	Executable workflow code (Nextflow + R), pinned to a container image
"Validation" means	CI checks the document — headings, frontmatter, date	Run end to end on real input; the output is verified
Versions	Guidance ("0.4+") — not pinned	Exact image, pinned and reproducible
Environment	You build it	Already dependency-complete
Step-to-step data contract	Described in prose	Carried and checked by the pipeline
Run it by asking	No — an agent assembles it and hopes	Yes — plain-English request → result
Provenance	None	An audit trail of exactly what ran
If it would fail	It fails in your hands, at runtime	It's fixed and re-validated before it's offered

This is grounded in the same work as our open knowledge base, BioMate-KB — 15,641 Bioconductor workflow steps, NAMESPACE-validated, linked to container images and software DOIs, and, for the validated head of the collection, confirmed by real end-to-end execution. The public bundle covering the top 200 packages is free under CC-BY-4.0; clone it and build on it: github.com/bioMate-AI/biomate-bioconductor-kb.

When a skill is the right tool

To be fair to the skill libraries: they are good for what they are. If you are an expert who wants a well-organized reference your agent can pull how-to guidance from — parameter hints, a troubleshooting table, the canonical citation — a good skill is genuinely useful, and the better libraries are worth reading. The honest line is not "skills are bad." It's that a skill and a validated, executed workflow are different categories: one tells an agent how the analysis goes; the other is the analysis, already run, ready to return a result. If what you want is a correct result without spending the afternoon resolving Bioconductor versions and chasing a data-contract mismatch, that difference is the product.

BioMate is a life-sciences AI platform that turns validated computational workflows into something you can run by asking. The skill ecosystem is getting better at telling an agent how an analysis goes. BioMate is the part that actually runs it.

The takeaway

Bioinformatics agent skills improved in 2026 — real R DESeq2, citations, even CI. But that CI validates the document, not the run; the code is never executed against real data. A skill tells an agent how an analysis goes. BioMate runs it — in a dependency-complete container, on real input, with the output verified — and returns the result.