Introduction: The Language of the Cell
While genomics maps the static blueprint of life, transcriptomics captures its dynamic execution. If the genome is the dictionary, the transcriptome is the conversation—the precise subset of genes being expressed by a specific cell, at a specific moment, under specific conditions. For decades, bulk RNA sequencing averaged these conversations across millions of cells, giving us a cacophonous blend that masked individual cellular identities. The advent of single-cell RNA sequencing (scRNA-seq) changed everything, allowing us to listen to individual cellular voices.
However, scRNA-seq generated a new problem: massive, noisy, sparse, and high-dimensional data that traditional statistical methods struggled to fully exploit. Enter foundation models. In recent years (2023–2026), the same transformer architectures that mastered human language have been adapted to learn the “language” of gene expression. By treating genes as tokens and a cell’s expression profile as a sentence, models like scGPT and Geneformer are extracting deep biological representations that generalize across tissues, species, and disease states.
This post, the sixth in our Agentic Omics series, explores how AI is revolutionizing transcriptomics. We will critically examine the leading single-cell foundation models, the emerging frontier of spatial transcriptomics, and honestly assess where these models genuinely outperform classical methods versus where they fall short.
1. Single-Cell Foundation Models: The Paradigm Shift
The core premise of a single-cell foundation model is self-supervised learning on massive cellular datasets. Instead of training a model from scratch for every specific task (e.g., cell type annotation in a specific tissue), researchers pre-train a massive transformer on tens of millions of diverse cells. The model learns fundamental gene-gene interactions and regulatory syntax. Once pre-trained, it can be fine-tuned on smaller, specialized datasets for a variety of downstream tasks.
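The pre-training objective is typically BERT-style masking: hide a fraction of the gene tokens in a cell and train the model to recover them. The data-preparation half of that objective can be sketched in a few lines (a toy illustration with an invented `mask_genes` helper, not any model's actual code):

```python
import random

MASK = "<mask>"

def mask_genes(tokens, rate=0.15, seed=0):
    """Self-supervised masking: hide a random fraction of gene tokens
    and keep the hidden ones as prediction targets, BERT-style."""
    rng = random.Random(seed)  # seeded for reproducibility
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = tok  # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets

# One cell's gene tokens (names illustrative); the transformer is then
# trained to predict every entry in `targets` from the masked input.
masked, targets = mask_genes(["CD3E", "ACTB", "FOXP3", "GATA3"], rate=0.3)
```

Because the labels come from the data itself, no human annotation is needed, which is what makes pre-training on tens of millions of cells feasible.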
The Pioneers: scGPT and Geneformer
Two models, initially introduced in 2023 and refined through 2025, dominate the landscape:
- scGPT (Single-Cell Generative Pre-trained Transformer): Developed by Cui et al. (Nature Methods, 2024) and built on an initial corpus of over 33 million cells, scGPT utilizes a generative pre-training approach. It handles both gene identity and expression level, adapting the transformer architecture to process the non-sequential nature of gene expression (unlike the strict positional sequence of DNA or text, gene expression in a cell is an unordered set of gene-abundance pairs). scGPT has demonstrated strong performance in cell type annotation, batch integration, and predicting the effects of genetic perturbations.
- Geneformer: Developed by Theodoris et al. (Nature, 2023) using approximately 30 million single-cell transcriptomes, Geneformer takes a slightly different approach. It ranks genes within each cell by their expression level (relative to background) and feeds this ordered list of genes into a standard transformer model. This rank-based approach makes it highly robust to the technical noise and dropouts characteristic of scRNA-seq. Geneformer has been particularly successful in network inference and identifying candidate therapeutic targets in heart disease.
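To make the contrast concrete, Geneformer's rank-value encoding can be sketched in a few lines. This is a simplified toy (gene names, counts, and median values are invented), not the actual implementation:

```python
def rank_encode(cell_counts, gene_medians, max_len=2048):
    """Order a cell's genes by expression normalized to each gene's
    corpus-wide median (its 'background'), keeping only nonzero genes.
    The resulting ordered gene list is the transformer's input."""
    scored = [
        (gene, count / gene_medians[gene])
        for gene, count in cell_counts.items()
        if count > 0  # dropouts (zeros) are simply omitted
    ]
    # Highest relative expression first; ties broken alphabetically
    scored.sort(key=lambda gc: (-gc[1], gc[0]))
    return [gene for gene, _ in scored[:max_len]]

cell = {"CD3E": 40, "ACTB": 200, "FOXP3": 5, "GAPDH": 0}
medians = {"CD3E": 10, "ACTB": 400, "FOXP3": 1, "GAPDH": 50}
print(rank_encode(cell, medians))  # ['FOXP3', 'CD3E', 'ACTB']
```

Note how the housekeeping gene ACTB, despite the highest raw count, ranks last: normalizing by background deprioritizes ubiquitous genes and elevates cell-state-specific ones, which is the source of the robustness described above.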
Beyond the Pioneers: Scale and Specificity
By 2025 and 2026, the field moved beyond these initial models in two directions: massive scale and multimodal integration. Efforts utilizing datasets like the Human Cell Atlas and regional efforts (e.g., the AIDA v2 dataset encompassing hundreds of thousands of immune cells from diverse populations) have sought to pre-train even larger models to capture rare cell states and population-level diversity, mitigating the European-ancestry bias present in early models. Furthermore, architectures have evolved to incorporate graph neural networks (GNNs) alongside transformers, explicitly modeling known biological pathways and protein-protein interactions as inductive biases, rather than forcing the model to learn them entirely from scratch.
2. Key Capabilities and Applications
What can these foundation models actually do that traditional methods (like Seurat or Scanpy pipelines) cannot? The answer lies in transferability and zero-shot/few-shot learning.
Automated Cell Type Annotation
The most immediate utility of models like scGPT is cell type annotation. Traditionally this has been a manual, iterative process of clustering and marker-gene checking; scGPT can instead annotate cell types in a new dataset accurately in a zero-shot or few-shot manner, leveraging its pre-trained understanding of cellular states. It handles novel batch effects better than classical integration methods because its internal representation of a “T-cell” is robust across different sequencing technologies and tissues.
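A minimal sketch of how embedding-based annotation works: the model maps each cell to an embedding vector, and a new cell is labeled by its similarity to reference label centroids in that space. Everything below (the centroid values, the 3-dimensional embeddings) is invented for illustration; real embeddings have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def annotate(cell_emb, reference_centroids):
    """Assign the label whose reference centroid is most similar."""
    return max(reference_centroids,
               key=lambda lbl: cosine(cell_emb, reference_centroids[lbl]))

centroids = {"T cell": [0.9, 0.1, 0.0], "B cell": [0.1, 0.9, 0.0]}
print(annotate([0.8, 0.2, 0.1], centroids))  # T cell
```

The batch robustness claimed above amounts to the statement that cells of the same type land near the same centroid regardless of which technology produced the counts.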
Perturbation Prediction
Perhaps the most exciting application is in-silico perturbation. Can we predict how a cell’s transcriptome will change if we knock out a specific gene or apply a specific drug? Models like scGPT and specialized variants have shown remarkable ability to predict the outcomes of CRISPR screens (e.g., Perturb-seq data). The model learns the underlying gene regulatory network (GRN) implicitly; when a gene’s token is altered, the attention mechanism propagates this change to predict shifts in co-regulated genes. This capability is foundational for agentic drug discovery workflows, allowing agents to simulate thousands of gene knockouts computationally before committing to expensive wet-lab validation.
Cross-Modality Translation (scTranslator)
Another breakthrough is cross-modality prediction. Technologies like CITE-seq measure both RNA and surface proteins, but they are expensive. Models like scTranslator have been developed to predict the single-cell proteome directly from the transcriptome. By learning the non-linear relationship between RNA abundance and protein translation efficiency across millions of cells, these models allow researchers to “impute” protein levels in standard scRNA-seq datasets, bridging the gap between transcriptomics and proteomics.
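scTranslator itself is a nonlinear transformer, but the imputation idea can be shown with a deliberately oversimplified stand-in: fit a per-protein linear map on paired CITE-seq-style measurements, then apply it to RNA-only cells. All numbers are toy values, not real data:

```python
def fit_linear(xs, ys):
    """Closed-form least squares: protein ~ slope * rna + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def impute_protein(rna_value, slope, intercept):
    """Predict a surface-protein level for an RNA-only measurement."""
    return slope * rna_value + intercept

# Paired training data (RNA, protein) for one marker, toy numbers
rna  = [1.0, 2.0, 3.0, 4.0]
prot = [2.1, 3.9, 6.1, 7.9]
slope, intercept = fit_linear(rna, prot)
print(round(impute_protein(5.0, slope, intercept), 1))  # 9.9
```

The real models replace this one-gene linear map with a cross-attention network over the whole transcriptome, which is what lets them capture the non-linear translation-efficiency effects mentioned above.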
3. The Spatial Frontier: Context is Everything
Traditional scRNA-seq requires dissociating tissue, destroying all spatial information. We know what cells are there, but not where they are or who they are communicating with. Spatial transcriptomics (e.g., 10x Genomics Visium, Xenium, MERFISH) solves this, maintaining the 2D or 3D coordinates of transcripts.
However, spatial data is complex. Foundation models are now being adapted to incorporate spatial graphs. Instead of just embedding a cell based on its own transcriptome, spatial models use graph attention networks to embed a cell based on its transcriptome and the transcriptomes of its physical neighbors. Recent advancements in 2025 have seen the emergence of “Spatial Foundation Models” (e.g., GraphST and similar architectures) that explicitly model cell-cell communication. By identifying spatially co-localized ligand-receptor pairs across tissue boundaries (e.g., at the tumor-immune microenvironment interface), these models predict how tumors suppress local immune responses, offering direct targets for novel immunotherapies.
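The neighbor-aware embedding idea reduces to two steps: build a k-nearest-neighbor graph from spatial coordinates, then mix each cell's own embedding with an aggregate of its neighbors'. The sketch below uses mean aggregation as a simplification of graph attention (coordinates and embeddings are invented):

```python
import math

def knn_neighbors(coords, k=2):
    """Indices of each cell's k nearest spatial neighbors (Euclidean)."""
    nbrs = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        nbrs.append([j for _, j in dists[:k]])
    return nbrs

def spatial_embed(embeddings, nbrs, alpha=0.5):
    """Mix each cell's embedding with the mean of its neighbors'
    (mean aggregation standing in for learned attention weights)."""
    out = []
    for i, emb in enumerate(embeddings):
        mean_nbr = [sum(embeddings[j][d] for j in nbrs[i]) / len(nbrs[i])
                    for d in range(len(emb))]
        out.append([alpha * e + (1 - alpha) * m
                    for e, m in zip(emb, mean_nbr)])
    return out

coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = spatial_embed(embeddings, knn_neighbors(coords, k=1))
print(mixed[0])  # [0.5, 0.5]: cell 0 pulled toward its adjacent neighbor
```

A graph attention network learns `alpha` per neighbor pair instead of fixing it, which is how ligand-receptor relationships at tissue boundaries can receive extra weight.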
4. An Honest Assessment: Where Models Struggle
Despite the hype, AI for transcriptomics is not a solved problem. It is crucial to maintain scientific rigor and acknowledge the limitations.
The “Language” Analogy Breaks Down
DNA and proteins are true sequences. Gene expression is an unordered “bag of genes” with continuous abundance values. Adapting transformers—which rely heavily on positional encoding—to this data is inherently awkward. While ranking (Geneformer) or specialized gene embeddings (scGPT) work, they are workarounds. We lack an architecture natively designed for set-valued, heavily zero-inflated data.
Hallucinating Biology
Just as LLMs hallucinate facts, transcriptomic models can hallucinate biology. When asked to predict the effect of a perturbation in a cell type absent from its pre-training data, a model might confidently output a physiologically impossible expression profile. Unlike physics-informed neural networks, most transcriptomic foundation models lack strict biological constraints (e.g., thermodynamic limits, metabolic conservation of mass).
The Evaluation Crisis
As discussed in Post 4, evaluating these models is notoriously difficult. Many models claim state-of-the-art performance on cell-type annotation benchmarks, but classical methods like simple logistic regression or k-nearest neighbors often perform comparably well on standard datasets when properly tuned. The true value of a billion-parameter foundation model only emerges on highly complex tasks like zero-shot perturbation prediction, where classical methods fail entirely. The field is currently wrestling with establishing rigorous benchmarks that genuinely test biological understanding rather than just data memorization.
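The baseline in question is genuinely trivial to implement, which is part of the point. Here is a complete k-nearest-neighbors annotator in raw gene space, the kind of classical method a billion-parameter model must convincingly beat (the toy "reference atlas" values are invented):

```python
from collections import Counter

def knn_annotate(query, labeled, k=3):
    """Majority vote over the k nearest labeled cells
    (squared Euclidean distance in gene-expression space)."""
    dists = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, vec)), lbl)
        for vec, lbl in labeled
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

reference = [
    ([0.1, 0.0], "B cell"), ([0.2, 0.1], "B cell"),
    ([0.9, 1.0], "T cell"), ([1.0, 0.9], "T cell"), ([0.8, 1.1], "T cell"),
]
print(knn_annotate([0.9, 0.95], reference))  # T cell
```

On well-separated cell types in a single batch, this baseline is hard to beat; the foundation models earn their cost only on transfer across batches, tissues, and unseen perturbations.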
5. Agentic Workflows in Transcriptomics
How does this fit into the Agentic Omics vision? Transcriptomic foundation models are not end-to-end agents; they are tools that an agent orchestrates.
Imagine an Agentic System tasked with finding a novel target for a treatment-resistant breast cancer. The workflow might look like this:
- The Reasoning Agent (LLM) queries the literature and identifies a resistance phenotype.
- It calls a Spatial Transcriptomics Tool (e.g., an integrated scGPT/GraphST pipeline), pointing it to a relevant clinical dataset.
- The tool identifies a subpopulation of exhausted T-cells localized specifically near the tumor core and extracts their transcriptomic state.
- The Agent then tasks the model to run an in-silico perturbation screen: “Iteratively knock out every expressed transcription factor in these T-cells in simulation. Which knockout best reverts the in-silico expression profile from ‘exhausted’ to ‘active’?”
- The model returns a ranked list of candidate genes.
- The Agent forwards these candidates to a structural biology model (like AlphaFold 3) to assess their druggability.
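The steps above can be sketched as an orchestration loop. Every tool below is a hypothetical stub with invented names, scores, and return shapes; a real agent would dispatch to an LLM, an scGPT/GraphST pipeline, and a structure model. The point is the control flow, not the science:

```python
def spatial_tool(dataset_id):
    """Stub for the spatial pipeline: returns the target subpopulation."""
    return {"population": "exhausted T cells", "profile": [1.0, 1.0, 1.0]}

def perturbation_tool(profile):
    """Stub for the in-silico screen: (gene, distance-to-'active')
    pairs, best candidates first."""
    return [("TF_B", 0.05), ("TF_A", 0.40), ("TF_C", 0.85)]

def druggability_tool(gene):
    """Stub for a structural assessment (higher = more druggable)."""
    return {"TF_B": 0.8, "TF_A": 0.3, "TF_C": 0.6}[gene]

def run_workflow(dataset_id, n_candidates=2):
    """Chain the tools: locate population -> screen -> rank by druggability."""
    state = spatial_tool(dataset_id)
    screen = perturbation_tool(state["profile"])[:n_candidates]
    scored = sorted(screen, key=lambda g: -druggability_tool(g[0]))
    return [gene for gene, _ in scored]

print(run_workflow("breast_tumor_cohort_01"))  # ['TF_B', 'TF_A']
```

Notice that the agent's "intelligence" lives entirely in choosing which tool to call with which arguments; each tool is an independently validated model, which keeps the workflow auditable.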
This is not science fiction; the individual components of this workflow exist today. The frontier is in reliable, autonomous orchestration.
6. Clinical Translation: The Road Ahead
Translating these advances to the clinic remains the ultimate hurdle. While predictive models like MammaPrint (a 70-gene panel) have been used clinically for years to predict breast cancer recurrence, integrating massive, AI-driven transcriptomic models is a different challenge.
Recent developments in 2025 point toward AI-driven virtual cell models in preclinical research, using AI to infer transcriptomic states from non-invasive data (like histopathology images—so-called “virtual spatial transcriptomics”). This bypasses the need for expensive sequencing in the clinic, using the AI to predict the molecular state directly from a standard H&E slide. However, regulatory bodies require interpretability and rigorous multi-cohort validation before such tools can impact patient care.
Conclusion
Transcriptomics is undergoing a fundamental shift from descriptive statistics to predictive modeling, driven by foundation models that learn the implicit rules of gene regulation. While challenges remain in architectural fit, rigorous evaluation, and hallucination, tools like scGPT and spatial graph models provide the necessary computational engines for the Agentic Omics revolution. By granting AI agents the ability to not just read, but computationally manipulate the transcriptome, we are moving closer to a future of fully simulated, closed-loop biological discovery.
Glossary
- scRNA-seq (Single-cell RNA sequencing): A technology that measures the expression levels of genes in individual cells, rather than averaging across a bulk tissue sample.
- Transcriptome: The complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell.
- Foundation Model: A large-scale neural network pre-trained on vast amounts of unlabeled data, which can be adapted (fine-tuned) to a wide range of downstream tasks.
- Spatial Transcriptomics: Technologies that measure gene expression while preserving the spatial organization of the cells within the tissue context.
- Perturbation: An experimental (or in-silico) alteration to a biological system, such as knocking out a gene or applying a drug, to observe the system’s response.
- In-silico: An experiment or analysis performed on a computer or via computer simulation.
References
- Cui, H., Wang, C., Maan, H., et al. (2024). scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 21, 1470–1480.
- Theodoris, C. V., Xiao, L., Peng, A., et al. (2023). Transfer learning enables predictions in network biology. Nature, 618(7965), 616–624.
- Wu, Y., et al. (2025). Single-cell foundation models: bringing artificial intelligence into cell biology. Experimental & Molecular Medicine.
- Chen, J., et al. (2025). AI-Driven Transcriptome Prediction in Human Pathology: From Molecular Insights to Clinical Applications. Biology (MDPI), 14(6), 651.
- Zhao, L., et al. (2025). AI-driven virtual cell models in preclinical research: technical pathways, validation mechanisms, and clinical translation potential. npj Digital Medicine.