The Agentic Omics Vision: LLMs Meet Domain-Specific AI

Introduction: The Convergence Point

In Post 13, we defined agentic AI as systems that autonomously plan, reason, use tools, and execute multi-step scientific workflows. Now we arrive at the central thesis of this entire series: Agentic Omics — the convergence of large language model (LLM) reasoning with domain-specific biological AI models like AlphaFold, ESM, scGPT, and DNABERT to create autonomous systems capable of end-to-end biological discovery.

This is not science fiction. As of early 2026, agentic systems are being deployed in operational drug discovery settings at companies like AstraZeneca, with documented implementations compressing workflows that once took months into hours while maintaining scientific traceability (Seal et al., 2025). The question is no longer if this convergence will transform biology, but how — and what architecture will get us there most reliably.

In this post, we articulate the Agentic Omics vision: what it looks like, what exists today versus what remains aspirational, and the technical challenges that must be solved to make it real.

The Architecture: LLM as Brain, Domain Models as Tools

The core insight of Agentic Omics is architectural: use LLMs for what they do well (reasoning, planning, natural language interface) and domain-specific models for what they do well (biological prediction, pattern recognition in sequences and structures).

The Orchestrator Pattern

At the heart of Agentic Omics is an LLM serving as an orchestrator — a “brain” that coordinates specialized biological tools. This is not a general-purpose chatbot answering biology questions; it is a reasoning engine that:

Receives a biological query (e.g., “Given this cancer mutation, identify potential drug targets”)
Plans a workflow (decompose into subtasks: structural impact → functional consequence → target identification → molecule design)
Calls appropriate tools (AlphaFold for structure, ESM for embeddings, scGPT for expression analysis, BLAST for homology)
Synthesizes results (integrates outputs, checks for consistency, identifies gaps)
Iterates or concludes (requests additional analysis or delivers final answer with confidence assessment)

This pattern mirrors successful agentic systems in chemistry like ChemCrow (Bran et al., 2024), which orchestrates 13 chemistry tools through an LLM interface, but adapted for the unique challenges of biological data.
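
The receive, plan, call, synthesize loop can be sketched in a few lines. This is a minimal illustration, not the design of any system cited here: the `Orchestrator` and `Step` classes, the tool names, and the fixed plan are all hypothetical, and the `plan` method stands in for what would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str       # name of a registered tool
    argument: str   # input passed to that tool


@dataclass
class Orchestrator:
    # Registry mapping tool names to callables. In a real system these would
    # wrap model endpoints; here they are hypothetical stubs.
    tools: Dict[str, Callable[[str], str]]
    trace: List[str] = field(default_factory=list)

    def plan(self, query: str) -> List[Step]:
        # Placeholder for the LLM planning call: a fixed two-step decomposition.
        return [Step("structure", query), Step("embedding", query)]

    def run(self, query: str) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for step in self.plan(query):
            if step.tool not in self.tools:
                # Record the gap instead of silently skipping it.
                self.trace.append(f"missing tool: {step.tool}")
                continue
            results[step.tool] = self.tools[step.tool](step.argument)
            self.trace.append(f"called {step.tool}")
        return results


def fake_structure_tool(query: str) -> str:
    # Stub standing in for a structure predictor.
    return f"structure for {query}"


agent = Orchestrator(tools={"structure": fake_structure_tool})
outputs = agent.run("KRAS G12C")
```

Keeping a trace of every call and every gap is what later makes the agent's reasoning inspectable by a human reviewer.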

Why This Architecture?

The orchestrator pattern addresses a fundamental limitation of both pure LLMs and pure domain models:

LLMs alone hallucinate biological facts, lack quantitative precision, and cannot perform specialized computations (e.g., protein structure prediction, variant effect scoring)
Domain models alone are narrow in scope (AlphaFold predicts structure but cannot reason about drug-likeness; scGPT annotates cell types but cannot design molecules) and lack natural language interfaces

By combining them, we get the reasoning flexibility of LLMs with the quantitative accuracy of domain models. Recent work on modularity in agentic drug discovery systems supports this approach, reporting that tool-calling agents significantly outperform code-generating agents for chemistry and biology workflows, with Claude-3.7-Sonnet and GPT-4o showing the strongest performance in orchestrating biological tools (Seal et al., 2025).

Concrete Workflow Examples

To make the Agentic Omics vision concrete, let us walk through three representative workflows that illustrate the architecture in action.

Workflow 1: Cancer Mutation to Drug Target

Input: A somatic mutation identified in a tumor sample (e.g., KRAS G12C)

Agent workflow:

Structural impact assessment: Call AlphaFold 3 to predict the mutant protein structure and compare it with wild-type. AlphaFold 3 can model protein complexes with DNA, RNA, and ligands, enabling assessment of how the mutation affects binding interfaces (Abramson et al., Nature 2024). Structure predictors are often insensitive to single point mutations, so the agent should treat small predicted differences with caution.
Functional consequence prediction: Use ESM-3 to generate embeddings for both wild-type and mutant sequences, then compute the embedding distance as a rough proxy for functional disruption. ESM-3, trained at scales up to 98B parameters, captures evolutionary constraints that help indicate which mutations are likely pathogenic (EvolutionaryScale, 2024).
Pathway context: Query pathway databases (KEGG, Reactome) via tool calls to identify downstream effects of the mutation. The LLM synthesizes this into a narrative: “KRAS G12C locks the protein in its active GTP-bound state, driving constitutive MAPK signaling.”
Target identification: Search for druggable pockets in the mutant structure using computational docking tools. The agent identifies the switch-II pocket as a mutation-specific binding site.
Candidate molecule retrieval: Query existing inhibitor databases (ChEMBL, DrugBank) for compounds targeting this pocket. If none exist, invoke protein design tools such as RFdiffusion or ProGen2 to generate de novo protein binders.
ADMET prediction: Run predicted candidates through toxicity, solubility, and bioavailability models to filter for drug-like properties.

Output: A ranked list of candidate molecules with predicted binding affinities, structural visualizations, and ADMET profiles — all generated autonomously with the LLM providing natural language explanations at each step.

This workflow, which would take a human researcher weeks to execute manually (coordinating multiple software tools, interpreting outputs, documenting results), can be completed in hours by an agentic system.
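
The embedding-distance step in this workflow is simple arithmetic once the embeddings exist. The sketch below assumes per-residue embeddings have already been obtained from a protein language model and shows only the pooling and distance computation; the function names and the toy vectors are ours, not part of any model's API.

```python
import math
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length sequence embedding."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy per-residue embeddings (made-up numbers, not model output).
wild_type = [[1.0, 0.0], [1.0, 0.0]]
mutant = [[1.0, 0.0], [0.0, 1.0]]

distance = cosine_distance(mean_pool(wild_type), mean_pool(mutant))
```

A larger distance suggests a larger shift in the learned representation, which is only a heuristic signal of functional disruption and should be ranked alongside other evidence.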

Workflow 2: Single-Cell Multi-Omics Analysis

Input: Single-cell RNA-seq data from a disease cohort (e.g., tumor microenvironment)

Agent workflow:

Quality control: Run automated QC checks (mitochondrial gene percentage, doublet detection) and report issues to the user.
Cell type annotation: Use scGPT to annotate cell types, drawing on its pretraining on more than 33 million cells (Cui et al., Nature Methods 2024). The LLM interprets the annotations: “The tumor microenvironment shows enrichment of exhausted CD8+ T cells and M2 macrophages, consistent with an immunosuppressive phenotype.”
Differential expression: Identify genes differentially expressed between conditions using standard tools (DESeq2, Scanpy), with the LLM summarizing key findings.
Trajectory inference: Apply RNA velocity or CellRank to infer developmental trajectories. The agent identifies a differentiation path from progenitor cells to exhausted T cells.
Perturbation prediction: Use Geneformer to predict how specific perturbations (e.g., checkpoint inhibitor treatment) would alter the transcriptomic state (Theodoris et al., Nature 2023).
Integration with genomics: If matched genomic data is available, correlate mutations with expression changes to identify driver events.

Output: A comprehensive analysis report with cell type compositions, key differentially expressed genes, inferred trajectories, and treatment response predictions — all with the LLM providing biological interpretation at each step.
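
As an illustration of the quality control step, the sketch below computes the mitochondrial read fraction per cell and splits barcodes into kept and flagged. It uses plain dictionaries rather than a single-cell library so that it stays self-contained; the 20% threshold and the `qc_filter` helper are assumptions for illustration, and real pipelines tune the cutoff per tissue.

```python
from typing import Dict, List

# One cell = mapping from gene symbol to raw count.
Cell = Dict[str, int]


def mito_fraction(cell: Cell) -> float:
    """Fraction of counts from mitochondrial genes (human 'MT-' prefix)."""
    total = sum(cell.values())
    if total == 0:
        # Empty cells are left to a separate minimum-count filter (not shown).
        return 0.0
    mito = sum(n for gene, n in cell.items() if gene.startswith("MT-"))
    return mito / total


def qc_filter(cells: Dict[str, Cell], max_mito: float = 0.2) -> Dict[str, List[str]]:
    """Split cell barcodes into kept and flagged; the threshold is an assumption."""
    report: Dict[str, List[str]] = {"kept": [], "flagged": []}
    for barcode, cell in cells.items():
        key = "flagged" if mito_fraction(cell) > max_mito else "kept"
        report[key].append(barcode)
    return report


cells = {
    "cell_a": {"CD8A": 90, "MT-CO1": 10},   # 10% mitochondrial
    "cell_b": {"CD8A": 50, "MT-ND1": 50},   # 50% mitochondrial
}
report = qc_filter(cells)
```

Returning the flagged barcodes, rather than dropping them silently, gives the agent something concrete to report to the user.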

Workflow 3: Literature-Guided Hypothesis Generation

Input: A broad biological question (e.g., “What are promising targets for Alzheimer’s disease?”)

Agent workflow:

Literature search: Query PubMed, Semantic Scholar, and bioRxiv for recent papers on Alzheimer’s disease targets. Use NLP to extract key entities (genes, pathways, compounds) and relationships.
Evidence synthesis: Cluster findings by target, assess consistency across studies, and identify consensus targets versus controversial ones. The LLM generates a structured summary: “APOE4 remains the strongest genetic risk factor, but recent work implicates TREM2, CD33, and complement pathway genes in microglial dysfunction.”
Target validation check: For top candidates, query databases (OpenTargets, DisGeNET) for genetic and experimental evidence scores.
Druggability assessment: Check if targets have known ligands, structural data, or are amenable to specific modalities (small molecule, antibody, PROTAC).
Hypothesis generation: Propose specific, testable hypotheses: “TREM2 agonists may enhance microglial clearance of amyloid-beta in early-stage AD patients with APOE4 genotype.”
Experimental design suggestion: Outline experiments to test the hypothesis (e.g., “Treat APOE4 iPSC-derived microglia with TREM2 agonist, measure amyloid-beta phagocytosis via flow cytometry”).

Output: A literature-backed hypothesis with supporting evidence, identified gaps, and suggested experiments — essentially an AI-generated grant proposal introduction.
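
The evidence synthesis step, separating consensus targets from controversial ones, can be approximated with a simple vote count once findings have been extracted. The sketch below assumes an upstream extraction step has already produced (target, paper, supports) tuples; the paper identifiers are placeholders and the thresholds are arbitrary illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (target, paper_id, supports_target) tuples as an extraction step might emit.
Finding = Tuple[str, str, bool]


def classify_targets(findings: List[Finding], min_papers: int = 2,
                     consensus: float = 0.8) -> Dict[str, str]:
    """Label each target by how consistently the extracted findings support it."""
    votes: Dict[str, List[bool]] = defaultdict(list)
    for target, _paper_id, supports in findings:
        votes[target].append(supports)

    labels: Dict[str, str] = {}
    for target, support in votes.items():
        if len(support) < min_papers:
            labels[target] = "insufficient evidence"
        elif sum(support) / len(support) >= consensus:
            labels[target] = "consensus"
        else:
            labels[target] = "controversial"
    return labels


# Placeholder findings; the paper identifiers are not real references.
findings = [
    ("TREM2", "paper-1", True),
    ("TREM2", "paper-2", True),
    ("CD33", "paper-1", True),
    ("CD33", "paper-3", False),
    ("C3", "paper-4", True),
]
labels = classify_targets(findings)
```

Counting papers treats every study as equally reliable, which is rarely true, so an agent would likely weight findings by study design or evidence scores before presenting a summary.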

What Exists Today vs. What’s Aspirational

It is critical to be honest about the gap between the vision and current reality. Agentic Omics is not fully realized — but significant pieces are operational today.

Today (2026): Proven Capabilities

Tool orchestration works. Systems like ChemCrow demonstrate that LLMs can reliably call chemistry tools and synthesize results. Early drug discovery implementations at AstraZeneca show agentic systems integrated into operational pipelines (Seal et al., 2025).
Domain models are mature. AlphaFold 3, ESM-3, scGPT, and DNABERT-2 are documented and usable through released code or hosted services. Within their domains, and when their confidence scores are taken into account, these models provide reliable predictions.
LLM reasoning is sufficient for many workflows. Claude-3.7-Sonnet, GPT-4o, and similar models can plan multi-step biological workflows, interpret results, and generate coherent reports (Seal et al., 2025).
Specific workflows are automated. Literature review agents, automated protocol generation, and toxicity prediction agents are deployed in real-world settings with quantifiable time savings.

Near-Term (2026-2027): Emerging Capabilities

End-to-end target-to-molecule pipelines. Several companies are building integrated systems that go from target identification through lead optimization autonomously. Insilico Medicine’s ISM001-055 (an AI-designed drug for idiopathic pulmonary fibrosis) has shown positive Phase IIa results, demonstrating clinical viability (ScienceDirect, 2025).
Multi-agent collaboration. Systems where specialized agents (genomics agent, proteomics agent, chemistry agent) collaborate on complex problems are in development. The “AI lab meeting” concept — agents critiquing each other’s hypotheses — is being explored.
Closed-loop experimentation. Integration with robotic platforms for autonomous experiment execution is advancing, though primarily in chemistry and materials science. Biology lags due to greater experimental complexity.

Aspirational (2028+): Open Challenges

True causal reasoning. Current agents excel at correlation and pattern recognition but struggle with causal inference. Determining whether a gene causes a phenotype versus merely correlating with it remains challenging.
Cross-omics integration at scale. While individual omics AI is mature, seamlessly integrating genomics, transcriptomics, proteomics, and metabolomics in a single agentic workflow is not yet routine.
Clinical deployment. Regulatory approval for fully autonomous AI-driven clinical decision-making remains distant. Human oversight will be required for the foreseeable future.
Generalization across biological contexts. Models trained on specific cell types, organisms, or disease contexts often fail to generalize. Agentic systems need better mechanisms for recognizing when they are operating outside their training distribution.

Technical Challenges

Building Agentic Omics systems requires solving several hard technical problems.

API Orchestration and Error Handling

Biological tools fail differently than software APIs. AlphaFold may return a low-confidence prediction; BLAST may find no homologs; scGPT may be uncertain about cell type annotation. Agentic systems must:

Handle partial results gracefully
Recognize when predictions are unreliable (e.g., AlphaFold pLDDT scores < 70)
Fall back to alternative methods when primary tools fail
Communicate uncertainty to users clearly

This requires sophisticated error handling beyond standard try-catch patterns — agents need to understand why a tool failed and whether to retry, switch tools, or report the limitation.
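
One way to picture this is a fallback chain that distinguishes "the tool could not run" from "the tool ran but is not confident." The sketch below is hypothetical throughout: `ToolUnavailable`, `predict_with_fallback`, and the two stub tools are illustrative names, and the pLDDT threshold of 70 follows the rule of thumb mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Prediction:
    source: str
    mean_plddt: float   # 0-100, higher is more confident


class ToolUnavailable(Exception):
    """Raised by a tool wrapper when the underlying service cannot run."""


def predict_with_fallback(sequence: str,
                          tools: List[Callable[[str], Prediction]],
                          min_plddt: float = 70.0) -> dict:
    notes: List[str] = []
    best: Optional[Prediction] = None
    for tool in tools:
        try:
            pred = tool(sequence)
        except ToolUnavailable as exc:
            # Hard failure: note the reason and move to the next tool.
            notes.append(f"{tool.__name__} unavailable: {exc}")
            continue
        if best is None or pred.mean_plddt > best.mean_plddt:
            best = pred
        if pred.mean_plddt >= min_plddt:
            return {"prediction": pred, "reliable": True, "notes": notes}
        # Soft failure: the tool ran, but the result is below threshold.
        notes.append(f"{pred.source} low confidence ({pred.mean_plddt:.0f})")
    # Nothing met the threshold: return the best partial result, clearly flagged.
    return {"prediction": best, "reliable": False, "notes": notes}


def primary(sequence: str) -> Prediction:
    raise ToolUnavailable("GPU queue full")


def secondary(sequence: str) -> Prediction:
    return Prediction(source="secondary", mean_plddt=62.0)


result = predict_with_fallback("MTEYKLVVVG", [primary, secondary])
```

The notes list carries the reason for each failure forward, so the final report can say what was tried and why the answer is uncertain.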

Hallucination in Biological Contexts

LLM hallucination is dangerous in biology. A fabricated gene-disease association or incorrect protein function could misdirect months of experimental work. Mitigation strategies include:

Grounding: Require agents to cite specific database entries or paper DOIs for factual claims
Verification chains: Have a second agent or tool verify critical claims
Confidence scoring: Agents should report confidence levels and flag low-confidence assertions
Human-in-the-loop checkpoints: Require human approval for high-stakes decisions (e.g., selecting a clinical candidate)

Recent work on agentic systems emphasizes auditability and clear human accountability as operational requirements, not optional features (Ardigen, 2026).
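
Grounding and confidence scoring can be enforced mechanically before any claim reaches a report. The following sketch assumes each claim arrives with a list of sources and an agent-reported confidence; the `Claim` structure, the "DB:ID" accession convention, and the 0.7 threshold are assumptions for illustration. The patterns only check form, not whether the cited record exists or supports the claim.

```python
import re
from dataclasses import dataclass
from typing import List

# Loose patterns for illustration; neither checks that the record exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ACCESSION_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*:\S+$")  # "DB:ID" form (assumed)


@dataclass
class Claim:
    text: str
    sources: List[str]     # DOIs or database accessions attached by the agent
    confidence: float      # agent-reported, 0.0-1.0


def is_citable(source: str) -> bool:
    return bool(DOI_PATTERN.match(source) or ACCESSION_PATTERN.match(source))


def review_claims(claims: List[Claim], min_confidence: float = 0.7) -> List[str]:
    """Return human-readable flags for claims that need verification."""
    flags: List[str] = []
    for claim in claims:
        if not any(is_citable(s) for s in claim.sources):
            flags.append(f"ungrounded: {claim.text}")
        elif claim.confidence < min_confidence:
            flags.append(f"low confidence: {claim.text}")
    return flags


claims = [
    Claim("AlphaFold 3 models protein-ligand complexes",
          ["10.1038/s41586-024-07487-w"], 0.9),
    Claim("Gene X is associated with disease Y", [], 0.95),
    Claim("Variant Z disrupts binding", ["UniProt:P01116"], 0.4),
]
flags = review_claims(claims)
```

A check like this catches missing citations but not wrong ones, which is why verification chains and human checkpoints remain necessary.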

Computational Cost

Running AlphaFold, ESM-3, and scGPT in sequence is computationally expensive. A single AlphaFold 3 prediction can take minutes to hours on a GPU depending on input size, and ESM-3 inference for long sequences is similarly costly. Agentic workflows that call these models many times can quickly become prohibitively expensive.

Solutions include:

Caching: Store and reuse predictions for common queries
Approximate methods: Use faster approximate models for initial screening, reserve accurate models for final validation
Batching: Process multiple queries in parallel where possible
Tiered architecture: Simple queries handled by lightweight models; complex queries escalate to heavy models
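
Caching and tiered screening combine naturally: score everything with a cheap model, send only the top candidates to the expensive one, and memoize the expensive results. In the sketch below both scoring functions are toy stand-ins (the fraction of alanine residues), and the call counter exists only to make the savings visible.

```python
from functools import lru_cache
from typing import Dict, List

call_counts: Dict[str, int] = {"cheap": 0, "expensive": 0}


def cheap_score(sequence: str) -> float:
    """Stand-in for a fast approximate model (toy rule: fraction of 'A')."""
    call_counts["cheap"] += 1
    return sequence.count("A") / len(sequence)


@lru_cache(maxsize=1024)
def expensive_score(sequence: str) -> float:
    """Stand-in for a slow accurate model; results are cached per sequence."""
    call_counts["expensive"] += 1
    return sequence.count("A") / len(sequence)


def screen(candidates: List[str], top_k: int = 2) -> Dict[str, float]:
    """Rank all non-empty candidates cheaply, then rescore only the top_k."""
    ranked = sorted((c for c in candidates if c), key=cheap_score, reverse=True)
    return {seq: expensive_score(seq) for seq in ranked[:top_k]}


candidates = ["AAAA", "AAGG", "GGGG", "AGGG"]
first = screen(candidates)
second = screen(candidates)   # expensive results now come from the cache
```

After both runs the cheap model has been called eight times and the expensive one only twice. An in-memory cache like this is lost when the process exits, so a persistent store keyed on sequence and model version would be the likely next step.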

Data Heterogeneity and Integration

Biological data comes in wildly different formats: FASTQ for sequencing, PDB for structures, AnnData for single-cell, SMILES for molecules. Agentic systems must:

Parse and normalize diverse data formats
Handle missing modalities (e.g., genomics without proteomics)
Correct for batch effects across datasets
Maintain provenance (track where each piece of data came from)

This is fundamentally a data engineering challenge, but it must be solved for agentic systems to work reliably.
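
Provenance tracking and missing-modality checks can start very small. The sketch below registers each input file with its inferred modality and a content hash, then reports which required modalities are absent. The suffix mapping (including `.h5ad` for AnnData and `.smi` for SMILES) is illustrative rather than exhaustive, and the `Provenance` record is a hypothetical structure.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List

# File suffix to modality; the mapping is illustrative, not exhaustive.
MODALITY_BY_SUFFIX = {
    ".fastq": "sequencing_reads",
    ".pdb": "structure",
    ".h5ad": "single_cell",
    ".smi": "molecule",
}


@dataclass(frozen=True)
class Provenance:
    source_path: str
    modality: str
    sha256: str   # content hash, so later results can be traced to exact inputs


def register_input(path: str, content: bytes) -> Provenance:
    suffix = Path(path).suffix.lower()
    if suffix not in MODALITY_BY_SUFFIX:
        raise ValueError(f"unrecognized format: {suffix or 'no suffix'}")
    digest = hashlib.sha256(content).hexdigest()
    return Provenance(path, MODALITY_BY_SUFFIX[suffix], digest)


def missing_modalities(records: Iterable[Provenance],
                       required: Iterable[str]) -> List[str]:
    present = {r.modality for r in records}
    return sorted(set(required) - present)


record = register_input("sample.PDB", b"ATOM")
gaps = missing_modalities([record], ["structure", "single_cell"])
```

Knowing up front that the single-cell modality is missing lets the agent adjust its plan instead of failing midway through a workflow.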

The Scientist in the Loop

A critical design principle for Agentic Omics: full autonomy is neither desirable nor achievable for clinical applications. The goal is not to replace scientists but to amplify them.

Why Human Oversight Matters

Accountability. When an AI system recommends a drug candidate that fails in clinical trials, who is responsible? Clear human accountability is essential for regulatory approval and ethical deployment (Ardigen, 2026).
Creative insight. AI excels at pattern recognition and optimization within defined spaces. Humans excel at asking novel questions, recognizing unexpected connections, and making creative leaps. The best systems combine both.
Edge cases. AI systems fail on distributional shifts — new organism, novel disease mechanism, unprecedented modality. Humans recognize when they are in unfamiliar territory and adjust accordingly.
Ethical judgment. Decisions about patient selection, risk-benefit tradeoffs, and research priorities require ethical reasoning that extends beyond computational optimization.

Practical Implementation: Human-on-the-Loop

Rather than human-in-the-loop (requiring approval at every step) or full autonomy (no human involvement), Agentic Omics systems should implement human-on-the-loop:

Agents operate autonomously for routine tasks
Humans set goals, review results, and intervene when needed
Agents flag uncertain or high-stakes decisions for human review
Humans can inspect agent reasoning traces and audit decisions

This balances efficiency with oversight, allowing agents to compress routine work while keeping humans engaged for creative and ethical decisions.
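
In code, human-on-the-loop reduces to a routing rule plus an audit trail. The sketch below is one possible shape, assuming each agent decision carries a self-reported confidence and a high-stakes flag; the `Decision` and `Route` types and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO = "proceed automatically"
    REVIEW = "queue for human review"


@dataclass
class Decision:
    description: str
    confidence: float     # agent-reported, 0.0-1.0
    high_stakes: bool     # e.g. selecting a clinical candidate


def route(decision: Decision, min_confidence: float = 0.8) -> Route:
    # High-stakes decisions always go to a human, regardless of confidence.
    if decision.high_stakes or decision.confidence < min_confidence:
        return Route.REVIEW
    return Route.AUTO


decisions = [
    Decision("annotate cell types", 0.93, False),
    Decision("annotate rare cluster", 0.55, False),
    Decision("select clinical candidate", 0.97, True),
]
# The audit log records every decision, including those that ran automatically.
audit_log = [(d.description, route(d).name) for d in decisions]
```

Logging the automatic decisions as well as the escalated ones is what lets humans audit routine work after the fact.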

Case Study: ChatInvent at AstraZeneca

A concrete example of Agentic Omics in practice is ChatInvent, an agentic system integrated into AstraZeneca’s discovery pipeline (ScienceDirect, 2026). Key features:

Evolution: Started as a proof-of-concept single agent, evolved into an extensible multi-agent architecture with a graphical user interface
Capabilities: Molecular design, synthesis planning, retrosynthesis, property prediction
Impact: Compresses molecular design cycles from weeks to hours, with chemists reporting improved productivity and creativity
Architecture: LLM orchestrator calling chemistry tools (RDKit, molecular generators, property predictors) with human chemists reviewing and refining outputs

ChatInvent demonstrates that agentic systems can deliver real-world value today — not as replacements for scientists, but as productivity multipliers that handle routine computation while humans focus on strategic decisions.

Open Source vs. Proprietary: Implications for Agentic Omics

The Agentic Omics vision depends on access to domain-specific models. Here the landscape is mixed:

Open models enabling agentic development:

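ESM-2 (Meta) and ESM-3 (EvolutionaryScale): Open weights and inference code for ESM-2; for ESM-3, open weights for the smallest model under a non-commercial license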
DNABERT-2: Open source
scGPT: Open source
Evo (Arc Institute): Open weights
AlphaFold 3: Code released after initial controversy, with model weights available on request for non-commercial use (Abramson et al., 2024)

Proprietary or restricted:

Isomorphic Labs models: Internal to Alphabet/Isomorphic Labs
Many pharma AI tools: Proprietary, not accessible externally
Some commercial platforms: API access only, no local deployment

This creates a tension: open models enable research and democratization, but proprietary models may have superior performance or specialized capabilities. The Agentic Omics community should advocate for open access to foundational biological models while respecting legitimate safety and commercial concerns.

Conclusion: The Path Forward

The Agentic Omics vision — LLMs orchestrating domain-specific biological AI models — is not speculative. It is being built today, with operational deployments in drug discovery and expanding applications across biology.

What to do now:

Start with well-defined workflows. Do not attempt full autonomy immediately. Identify specific, bounded workflows (e.g., variant interpretation, cell type annotation) and build agents for those.
Measure rigorously. Track time savings, accuracy, and user satisfaction. Publish results — positive and negative — to advance the field.
Prioritize interpretability. Agents should explain their reasoning, cite sources, and communicate uncertainty. Black-box autonomy will not gain trust.
Engage with regulators early. For clinical applications, involve regulatory experts from the start. Understand what validation is required.
Contribute to open infrastructure. Support open models, open datasets, and open agentic frameworks. The field advances faster when tools are accessible.

The next posts in this series will dive deeper into specific applications: drug discovery (Post 16), cancer genomics (Post 17), and multi-agent systems (Post 18). But the foundational architecture — LLM orchestrator + domain tools + human oversight — remains constant.

Agentic Omics is not about replacing biologists. It is about giving them superpowers: the ability to query the sum of biological knowledge, run complex analyses in hours instead of weeks, and focus their creativity on the questions that matter most.

Glossary

Term	Definition
Agentic AI	AI systems that autonomously plan, reason, use tools, and execute multi-step workflows, as opposed to passive question-answering systems
LLM Orchestrator	A large language model that coordinates multiple specialized tools, planning workflows and synthesizing results
Tool Calling	The mechanism by which an LLM invokes external functions or APIs, passing arguments and receiving results
Domain-Specific Model	AI models trained on specific biological data types (e.g., AlphaFold for protein structure, scGPT for single-cell transcriptomics)
Human-on-the-Loop	An autonomy model where AI operates independently but humans monitor, set goals, and intervene when needed
ADMET	Absorption, Distribution, Metabolism, Excretion, and Toxicity — key properties evaluated in drug development
pLDDT	Predicted Local Distance Difference Test — AlphaFold’s per-residue confidence score (0-100, higher is more confident)
Retrosynthesis	The process of working backward from a target molecule to identify feasible synthesis routes

References

Abramson, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493-500. https://doi.org/10.1038/s41586-024-07487-w
Seal, S., et al. (2025). AI Agents in Drug Discovery. arXiv preprint arXiv:2510.27130. https://arxiv.org/abs/2510.27130
Gridach, M., et al. (2025). Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions. arXiv preprint arXiv:2503.08979. https://arxiv.org/abs/2503.08979
Bran, A. M., et al. (2024). ChemCrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376. https://arxiv.org/abs/2304.05376
Cui, H., et al. (2024). scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 21, 1470-1481. https://doi.org/10.1038/s41592-024-02201-0
Theodoris, C. V., et al. (2023). Transfer learning enables predictions in network biology. Nature, 618, 616-624. https://doi.org/10.1038/s41586-023-06139-9
EvolutionaryScale. (2024). ESM-3: Open foundation models for protein design. https://www.evolutionaryscale.ai
ScienceDirect. (2026). Democratising real-world drug discovery through agentic AI. Drug Discovery Today. https://doi.org/10.1016/j.drudis.2026.000103
Ardigen. (2026). AI in Biotech: Lessons from 2025 and the Trends Shaping Drug Discovery in 2026. https://ardigen.com/ai-in-biotech-lessons-from-2025-and-the-trends-shaping-drug-discovery-in-2026/
Drug Target Review. (2026). AI in drug discovery: predictions for 2026. https://www.drugtargetreview.com/article/192962/ai-in-drug-discovery-predictions-for-2026/
