Omics

Abstract digital art representing AI model evaluation, with glowing rulers and glowing biological structures like DNA and proteins intersecting with neural network nodes.

Benchmarks and Evaluation: How Do We Know If Omics AI Actually Works?

When a new foundation model in computational biology is released, the accompanying paper inevitably features tables of bolded numbers demonstrating state-of-the-art performance. Whether it is predicting protein structures or annotating single-cell data, the claims are often spectacular. But how do we truly know if these AI systems work in ways that matter to biology, rather than just optimizing arbitrary computational metrics? For the vision of Agentic Omics to become reality—where autonomous agents orchestrate models like AlphaFold and DNABERT-2 to drive drug discovery—we need a rigorous understanding of when these models succeed, when they hallucinate, and when their benchmarks deceive us. Claims of AI breakthroughs are only as strong as their evaluation methodologies. ...

Futuristic digital illustration of biological data infrastructure

The Data Infrastructure Challenge: From Raw Reads to AI-Ready Datasets

The bottleneck for AI in computational biology is rarely a shortage of sophisticated models; it is the sheer difficulty of making biological data AI-ready. The “Agentic Omics” vision—where autonomous AI agents orchestrate domain-specific models to accelerate drug discovery—fundamentally rests on the assumption that these agents have access to standardized, clean, and computable data. In this post, we explore the unglamorous but critical foundation of omics AI: the data infrastructure. We trace the journey from raw sequencing reads to the structured tensor formats required by modern foundation models, exploring the evolving standards, the scale of the challenge, and how cloud infrastructure is adapting. ...

A transformer model reading DNA, RNA, proteins, and single-cell profiles as linked biological languages

Foundation Models Meet Biology: The Transformer Revolution in Life Sciences

In the first post of this series, we mapped the omics landscape: genomics, transcriptomics, proteomics, metabolomics, metagenomics, phenomics. The next question is obvious: why did AI suddenly get so good at several of these fields at once? The short answer is that biology turned out to be unusually compatible with the same family of models that transformed natural language processing. DNA, RNA, proteins, and even single-cell expression matrices are not “language” in any literal sense, but they are structured symbol systems with long-range dependencies, rich context, and vast quantities of unlabeled data. That is exactly the setting where self-supervised foundation models thrive. ...

Map of the omics layers interconnected by glowing data lines

The Omics Revolution: A Map of the Territory

Welcome to the first installment of Agentic Omics: When AI Reads the Book of Life. In this 24-part series, we will systematically review the state of the art in artificial intelligence (AI) across all major omics disciplines. We will explore how large language models, foundational transformer architectures, and, eventually, fully autonomous “Agentic Omics” systems can orchestrate domain-specific models to accelerate drug discovery, personalized medicine, and our fundamental understanding of biology. ...