
Topic

Model Compression

Quantization, pruning, distillation, and smaller deployment footprints.

2 papers · latest 2026-04-10

Jiayuan Ye, Vitaly Feldman, Kunal Talwar

Significant · 🟡 Intermediate · Machine Learning · Model Compression
cs.CL

Pruning and rebalancing pretraining data can improve factual memorization enough for a 110M-parameter model to match a 1.3B-parameter baseline on entity facts, highlighting the data mix as a real scaling lever.
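
A minimal sketch of the general idea, not the paper's actual pipeline: upweight entity-rich documents and prune entity-free ones when sampling a pretraining mix. The scoring heuristic, weights, and helper names below are assumptions for illustration.

```python
import random

# Hypothetical illustration of rebalancing a pretraining data mix:
# upweight documents dense in named entities so factual content is
# sampled more often. The scoring rule and boost factor are assumed,
# not taken from the paper.

def entity_density(doc: str, entity_vocab: set[str]) -> float:
    """Fraction of tokens that match a known entity list (toy heuristic)."""
    tokens = doc.split()
    if not tokens:
        return 0.0
    return sum(t in entity_vocab for t in tokens) / len(tokens)

def rebalanced_sample(docs: list[str], entity_vocab: set[str],
                      k: int, boost: float = 3.0) -> list[str]:
    """Sample k documents, boosting entity-rich ones and pruning entity-free ones."""
    densities = [entity_density(d, entity_vocab) for d in docs]
    # Pruning: zero weight for documents with no entity content at all.
    weights = [1.0 + boost * rho if rho > 0 else 0.0 for rho in densities]
    if not any(weights):
        return []
    return random.choices(docs, weights=weights, k=k)

docs = ["Marie Curie won the Nobel Prize", "the cat sat on the mat",
        "Paris is the capital of France"]
entities = {"Marie", "Curie", "Nobel", "Paris", "France"}
print(rebalanced_sample(docs, entities, k=2))
```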

Sayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Patrick Woods et al.

cs.CV

This paper cuts memory use for on-device LLMs by dynamically quantizing the KV cache instead of allocating a fixed precision to every token. For anyone deploying LLMs on phones or edge devices, that could mean roughly 2x longer context or a roughly 50% smaller memory footprint without accuracy loss.
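
A minimal sketch of what dynamic KV-cache quantization looks like in the simplest case, assuming nothing about the paper's specific scheme: per-(head, token) int8 quantization where the scale is computed at runtime from the actual value range. Shapes, granularity, and function names are illustrative.

```python
import numpy as np

# Illustrative per-token dynamic int8 quantization of a KV cache.
# One scale per (head, token) is an assumed granularity, not
# necessarily the paper's actual scheme.

def quantize_kv(kv: np.ndarray):
    """kv: float32 array of shape (heads, tokens, head_dim).
    Returns int8 values plus per-(head, token) scales."""
    # "Dynamic": the scale is derived from the observed range of each
    # token's vector at runtime, rather than fixed ahead of time.
    scales = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)  # avoid division by zero
    q = np.clip(np.round(kv / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_kv(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float32 values for attention computation."""
    return q.astype(np.float32) * scales

kv = np.random.randn(8, 16, 64).astype(np.float32)  # toy cache: 8 heads, 16 tokens
q, s = quantize_kv(kv)
err = np.abs(dequantize_kv(q, s) - kv).max()
print(f"int8 cache bytes: {q.nbytes}, fp32 bytes: {kv.nbytes}, max abs error: {err:.4f}")
```

The int8 cache uses a quarter of the fp32 bytes (plus small per-token scales), which is where the longer-context or smaller-footprint trade-off comes from.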
