
Machine Learning

Core modeling, optimization, inference, and systems efficiency.

11 papers · latest 2026-04-14


S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis et al.

cs.AI

By reusing one small model as summarizer, agent, and isolated code reviewer, this inference-time scaffold roughly doubles AppWorld performance on a single 24GB GPU.
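
Under the hood this is role-prompting a single checkpoint. A toy sketch of the pattern, with the model call stubbed out and every name invented for illustration:

```python
# Toy sketch of an inference-time scaffold that reuses ONE small model in
# three roles: summarizer, acting agent, and an isolated code reviewer.
# The model itself is stubbed; role behavior comes from the system prompt.
# All names here are illustrative, not the paper's actual API.

def small_model(system: str, prompt: str) -> str:
    """Stand-in for a single local LLM call (e.g. one model on a 24GB GPU)."""
    return f"[{system}] response to: {prompt[:40]}"

def summarize(history: list[str]) -> str:
    # Role 1: compress the interaction history so the agent's context stays small.
    return small_model("summarizer", " | ".join(history))

def act(task: str, summary: str) -> str:
    # Role 2: propose the next action given the task and compressed state.
    return small_model("agent", f"task={task}; state={summary}")

def review(code: str) -> str:
    # Role 3: review the proposal in an isolated call with no agent context,
    # so the reviewer is not biased by the agent's own reasoning.
    return small_model("code-reviewer", code)

history = ["opened app", "queried orders"]
action = act("refund order #7", summarize(history))
verdict = review(action)   # same weights, three jobs
```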

Vasilis Kontonis, Yuchen Zeng, Shivam Garg et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.AI · cs.LG

MEMENTO trains reasoning models to summarize their own working state into reusable memory blocks, cutting KV-cache costs about 2.5x and boosting throughput without giving up math, science, or coding accuracy.
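
The cache-compression idea can be made concrete with a stand-in: real systems would have the model write the summary itself, whereas here each cached segment is just averaged. Shapes and the compression ratio below are invented for illustration, not the paper's numbers:

```python
# Toy illustration of compressing a growing KV cache into short reusable
# "memory blocks", in the spirit of summarize-your-own-working-state.
# The "summary" is the mean of each segment's vectors, a crude stand-in
# for a model-written summary.

import numpy as np

def compress_cache(kv: np.ndarray, segment: int = 8, keep: int = 2) -> np.ndarray:
    """Replace each `segment`-token span with `keep` summary vectors."""
    blocks = []
    for start in range(0, len(kv), segment):
        span = kv[start:start + segment]
        # Summarize the span as `keep` means over its sub-chunks.
        parts = np.array_split(span, keep)
        blocks.extend(p.mean(axis=0) for p in parts if len(p))
    return np.stack(blocks)

kv = np.random.randn(64, 16)   # 64 cached tokens, dim 16
mem = compress_cache(kv)       # -> 16 summary vectors
print(len(kv) / len(mem))      # 4.0x fewer cache entries in this toy setup
```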

Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev et al.

significant · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.LG · cs.AI · cs.CL

This study shows popular KV-cache offloading schemes break on context-intensive workloads like structured extraction, then offers a simpler strategy that preserves far more accuracy for long-context production inference.
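
Why recency-based schemes fail here is easy to simulate: structured extraction sweeps the whole document repeatedly, so tokens evicted as "cold" are exactly the ones needed next. The minimal cache model below is hypothetical, not the paper's system; it only contrasts discarding evicted blocks with offloading them to host memory:

```python
# Toy model: LRU KV-cache with 4 GPU slots serving a workload that sweeps
# 8 context blocks three times. Discarding on eviction loses every block
# right before it is reused; offloading to host keeps it recoverable.

from collections import OrderedDict

class KVStore:
    def __init__(self, gpu_slots: int, discard: bool):
        self.gpu = OrderedDict()   # block_id -> tensor (stand-in: the id)
        self.host = {}             # offload target when not discarding
        self.gpu_slots, self.discard = gpu_slots, discard
        self.misses = 0            # each miss = an expensive recompute

    def access(self, block: int):
        if block in self.gpu:
            self.gpu.move_to_end(block)               # LRU hit
        elif not self.discard and block in self.host:
            self.gpu[block] = self.host.pop(block)    # cheap reload from host
        else:
            self.misses += 1                          # block was lost
            self.gpu[block] = block
        if len(self.gpu) > self.gpu_slots:
            victim, tensor = self.gpu.popitem(last=False)
            if not self.discard:
                self.host[victim] = tensor            # offload, don't drop

workload = list(range(8)) * 3      # extraction revisits the whole context
for discard in (True, False):
    store = KVStore(gpu_slots=4, discard=discard)
    for b in workload:
        store.access(b)
    print(discard, store.misses)   # discarding: 24 misses; offloading: 8
```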

Jiayuan Ye, Vitaly Feldman, Kunal Talwar

significant · 🟡 Intermediate · Machine Learning · Model Compression
cs.CL

Pruning and rebalancing pretraining data can improve factual memorization enough for a 110M model to match a 1.3B baseline on entity facts, highlighting data mix as a real scaling lever.
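
A minimal sketch of the lever being pulled, pruning low-value sources and renormalizing the sampling mix; the source names and "entity density" scores are invented, not the paper's data:

```python
# Toy data-mix rebalancing: drop sources below a quality threshold, then
# renormalize the remainder into sampling weights for pretraining batches.

import random

sources = {"web_crawl": 0.2, "wiki": 0.9, "code": 0.1, "news": 0.6}  # entity density

def rebalanced_weights(density: dict, prune_below: float = 0.15) -> dict:
    kept = {s: d for s, d in density.items() if d >= prune_below}  # prune
    total = sum(kept.values())
    return {s: d / total for s, d in kept.items()}                 # renormalize

weights = rebalanced_weights(sources)
print(weights)  # "code" is pruned; "wiki" gets the largest sampling share

random.seed(0)
batch_sources = random.choices(list(weights), list(weights.values()), k=5)
```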

Roberto Vercellino, Jared Willard, Gustavo Campos et al.

significant · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.DC · cs.LG

Provides public H100 power traces for training, fine-tuning, and vLLM inference, then links them to whole-facility planning—useful for sizing clusters, power delivery, and microgrid strategies.
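
The basic planning arithmetic on such a trace is just integration: peak watts size power delivery, watt-hours size energy cost. The sample values below are made up; real traces (e.g. sampled from an H100 via nvidia-smi or DCGM) have the same (timestamp, watts) shape:

```python
# Turn a GPU power trace into energy and peak-draw numbers for planning.

def trace_energy_wh(trace: list[tuple[float, float]]) -> float:
    """Trapezoidal integration of (t_seconds, watts) pairs -> watt-hours."""
    joules = sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(trace, trace[1:])
    )
    return joules / 3600.0

trace = [(0, 300.0), (10, 650.0), (20, 700.0), (30, 680.0)]  # 30 s, made up
energy = trace_energy_wh(trace)       # Wh consumed over the window
peak = max(p for _, p in trace)       # sizes power delivery, not just average
print(round(energy, 3), peak)
```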

Sam Gunn

significant · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG

Introduces a data-deletion scheme that approximates how a model would behave if specific training data were removed, an important building block for unlearning, auditing, and data attribution.
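
The deletion target is easiest to see in linear least squares, where "the model trained without point i" has a closed form via a rank-one downdate (Sherman-Morrison). This is a sketch of that exact baseline, which larger models can only approximate; it is not the paper's construction:

```python
# Exact unlearning for linear regression: downdate the normal equations
# instead of retraining, and check it matches full retraining.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

A_inv = np.linalg.inv(X.T @ X)   # (X^T X)^{-1} for the full dataset

def unlearn(A_inv, Xty, x_i, y_i):
    """Weights as if (x_i, y_i) had never been in the training set."""
    # Sherman-Morrison for (A - x x^T)^{-1}:
    #   A^{-1} + A^{-1} x x^T A^{-1} / (1 - x^T A^{-1} x)
    Ax = A_inv @ x_i
    A_inv_del = A_inv + np.outer(Ax, Ax) / (1.0 - x_i @ Ax)
    return A_inv_del @ (Xty - y_i * x_i)   # also downdate X^T y

w_deleted = unlearn(A_inv, X.T @ y, X[7], y[7])
w_retrained = np.linalg.lstsq(np.delete(X, 7, 0), np.delete(y, 7), rcond=None)[0]
print(np.allclose(w_deleted, w_retrained))  # downdate == full retrain
```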

David Picard, Nicolas Dufour, Lucas Degeorge et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CV · cs.AI

PoM replaces attention with a linear-time polynomial mixer, maintaining universal approximation while slashing compute—game-changing for scaling vision and language models on edge devices.
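
The linear-time flavor can be sketched with a hypothetical mixer in which tokens interact only through sequence-level polynomial moments, so cost is O(n·d) rather than attention's O(n²·d). This stand-in is not PoM's actual parameterization:

```python
# Toy polynomial mixing: each token is modulated by projections of global
# first and second moments. One pass over tokens, no n x n attention matrix.

import numpy as np

def poly_mix(X: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """X: (n, d) tokens -> (n, d) mixed tokens."""
    m1 = X.mean(axis=0)          # first moment of the whole sequence
    m2 = (X * X).mean(axis=0)    # second (elementwise) moment
    return X * (m1 @ W1) + (X * X) * (m2 @ W2)

rng = np.random.default_rng(0)
n, d = 128, 16
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
Y = poly_mix(X, W1, W2)
print(Y.shape)  # (128, 16): every token saw global context without attention
```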

Guhao Feng, Shengjie Luo, Kai Hua et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI · cs.CL

In-Place Test-Time Training enables LLMs to adapt weights during inference, overcoming static deployment limits—vital for real-time systems needing continuous learning from streaming data without retraining.
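
The loop shape, serve a prediction, then update weights in the same pass, can be shown with a toy linear predictor adapting to a stream via plain SGD (LMS); the real method updates an LLM's weights, but the structure is the same. Everything below is illustrative:

```python
# Toy test-time training: the "deployed" model keeps learning during
# inference from a signal available on the incoming stream.

import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                       # deployed weights, frozen until now
true_w = np.array([1.0, -1.0, 0.5, 2.0])
lr, losses = 0.05, []

for step in range(200):               # streaming inference
    x = rng.normal(size=4)
    y = true_w @ x                    # supervision observable at test time
    pred = w @ x                      # 1) serve the prediction
    err = pred - y
    losses.append(err * err)
    w -= lr * err * x                 # 2) in-place weight update, same pass

print(np.mean(losses[:20]) > np.mean(losses[-20:]))  # adapted to the stream
```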

Yulin Zou, Yan Chen, Wenyan Chen et al.

breakthrough · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.DC · cs.CV · cs.LG

CoStream jointly optimizes video codec and multimodal inference to cut computational costs by 40%+—enabling scalable, real-time video analytics without sacrificing accuracy on vision-language models.

Sayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Patrick Woods et al.

cs.CV

This paper cuts memory use for on-device LLMs by dynamically quantizing the KV cache—no more fixed precision waste. For anyone deploying LLMs on phones or edge devices, this could mean 2x longer context or 50% smaller models without accuracy loss.
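
A minimal sketch of the core move, quantizing KV tensors to int8 with a per-row ("dynamic") scale instead of one fixed precision for the whole cache. Plain numpy for illustration; real kernels fuse this into attention:

```python
# Per-row int8 quantization of a KV-cache tensor: 4x smaller than float32,
# with error bounded by half a quantization step per row.

import numpy as np

def quantize(kv: np.ndarray):
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0   # per-row range
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.default_rng(0).normal(size=(256, 64)).astype(np.float32)
q, scale = quantize(kv)
restored = dequantize(q, scale)

print(q.nbytes / kv.nbytes)                        # 0.25: 4x smaller cache
print(float(np.abs(restored - kv).max()) < 0.05)   # small round-trip error
```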

Mateusz Papierz, Asel Sagingalieva, Alix Benoit et al.

significant · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CE · cs.LG

HQ-LP-FNO cuts the size and cost of AI models that simulate laser processing by using quantum-inspired mixing, making real-time simulation feasible on standard hardware. This lets manufacturers rapidly test laser parameters without waiting hours for physics simulations.

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted · Privacy · Terms