Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models

Vincenzo Yuto Civale, Roberto Semeraro, Andrew David Bagdanov, Alberto Magi

Recommendation Score

Breakthrough · Advanced

Research context

Primary field

Reasoning & Agents

Reasoning, planning, tool use, and agentic workflows.

Topics

AI Agents

Paper type

Benchmark

Best for

Useful for both

arXiv categories

cs.AI

Why It Matters

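In single-cell foundation models, the best-performing representations are often found not in the final layer but in intermediate layers, and which layer is best depends on the task and the cellular context. Choosing the extraction layer deliberately, rather than defaulting to final-layer embeddings, can therefore improve prediction accuracy in downstream biological analyses.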

Abstract

Current single-cell foundation model benchmarks universally extract final layer embeddings, assuming these represent optimal feature spaces. We systematically evaluate layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) across trajectory inference and perturbation response prediction. Our analysis reveals that optimal layers are task-dependent (trajectory peaks at 60% depth, 31% above final layers) and context-dependent (perturbation optima shift 0-96% across T cell activation states). Notably, first-layer embeddings outperform all deeper layers in quiescent cells, challenging assumptions about hierarchical feature abstraction. These findings demonstrate that "where" to extract features matters as much as "what" the model learns, necessitating systematic layer evaluation tailored to biological task and cellular context rather than defaulting to final-layer embeddings.
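
The layer-wise evaluation the abstract describes can be illustrated with a minimal sketch. This is not the authors' pipeline: the ridge-regression probe, the function names `ridge_probe_score` and `sweep_layers`, and the synthetic embeddings are all assumptions made for illustration. In practice the per-layer embeddings would come from a model's hidden states and the target from a trajectory or perturbation label. The synthetic data is deliberately constructed so that the signal peaks at an intermediate layer, so the result demonstrates the procedure and does not reproduce any finding.

```python
import numpy as np

def ridge_probe_score(X, y, train_idx, test_idx, alpha=1.0):
    """Fit a ridge-regression probe on training cells; return held-out R^2."""
    mu, y_mu = X[train_idx].mean(axis=0), y[train_idx].mean()
    X_tr, y_tr = X[train_idx] - mu, y[train_idx] - y_mu
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    w = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(X.shape[1]), X_tr.T @ y_tr)
    pred = (X[test_idx] - mu) @ w + y_mu
    ss_res = np.sum((y[test_idx] - pred) ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def sweep_layers(layer_embeddings, y, test_fraction=0.3, seed=0):
    """Score every layer on the same train/test split.

    Returns (scores, best layer index, relative depth of the best layer),
    where relative depth 0.0 is the first layer and 1.0 is the final layer.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    scores = [ridge_probe_score(X, y, train_idx, test_idx) for X in layer_embeddings]
    best = int(np.argmax(scores))
    depth = best / max(len(layer_embeddings) - 1, 1)
    return scores, best, depth

# Synthetic stand-in for per-layer cell embeddings (hypothetical data).
rng = np.random.default_rng(42)
n_cells, dim = 300, 16
pseudotime = rng.uniform(0, 1, n_cells)
# Assumed signal strength per layer, built to peak at layer index 3 of 0..5.
signal_strength = [0.2, 0.5, 1.0, 2.0, 1.0, 0.5]
layers = []
for s in signal_strength:
    X = rng.normal(size=(n_cells, dim))       # unit-variance noise in every dimension
    X[:, 0] += s * 10 * (pseudotime - 0.5)    # pseudotime signal in one dimension
    layers.append(X)

scores, best, depth = sweep_layers(layers, pseudotime)
print(f"best layer: {best}, relative depth: {depth:.0%}")
```

Using one fixed split for all layers keeps the comparison fair, since differences in score then reflect the representation rather than the partition. A real evaluation would likely repeat the sweep over several seeds and use a task-appropriate metric in place of R^2.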

Published April 16, 2026