
Field

Machine Learning

Core modeling, optimization, inference, and systems efficiency.

27 papers · latest 2026-04-23


An T. Le, Vien Ngo

significant · 🟡 Intermediate · Machine Learning · Model Compression
cs.AI · cs.LG · cs.RO

Presents AAC, a differentiable landmark compressor for ALT heuristics that guarantees admissibility by design, enabling reliable pathfinding without calibration or convergence requirements.
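The ALT heuristic that AAC compresses can be sketched in Python. This is the classic landmark lower bound, not AAC's differentiable compressor; the toy graph and the landmark choice are hypothetical. On an undirected graph, each landmark L gives the bound |d(L, t) - d(L, v)| <= d(v, t) by the triangle inequality, so the maximum over any subset of landmarks (a naive form of compression) is still admissible.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # undirected adjacency list: node -> [(neighbor, weight)]

def dijkstra(graph: Graph, src: int) -> Dict[int, float]:
    """Exact shortest-path distances from src (non-negative weights)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def alt_heuristic(tables: List[Dict[int, float]], v: int, t: int) -> float:
    """Landmark lower bound: max over landmarks of |d(L, t) - d(L, v)|.
    With no landmarks it falls back to 0, i.e. plain Dijkstra."""
    return max((abs(tab[t] - tab[v]) for tab in tables), default=0.0)

def astar(graph: Graph, s: int, t: int, tables: List[Dict[int, float]]) -> float:
    """A* search guided by the ALT heuristic; returns the s-t distance."""
    g = {s: 0.0}
    heap = [(alt_heuristic(tables, s, t), s)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == t:
            return g[u]
        if u in closed:
            continue
        closed.add(u)
        for v, w in graph[u]:
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng
                heapq.heappush(heap, (ng + alt_heuristic(tables, v, t), v))
    return float("inf")

# Hypothetical example graph (undirected, connected).
graph: Graph = {
    0: [(1, 2.0), (2, 5.0)],
    1: [(0, 2.0), (2, 1.0), (3, 4.0)],
    2: [(0, 5.0), (1, 1.0), (3, 1.0)],
    3: [(1, 4.0), (2, 1.0)],
}
landmarks = [0, 3]                        # hypothetical landmark choice
tables = [dijkstra(graph, L) for L in landmarks]

print(astar(graph, 0, 3, tables))         # 4.0 (path 0-1-2-3)
print(astar(graph, 0, 3, tables[:1]))     # 4.0 with only one landmark kept
```

Dropping landmarks can only weaken the bound, never make it overestimate; that is the property a landmark compressor has to preserve for A* to stay optimal.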

Weijie Zhao, Mingquan Liu, Bolun Wang et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

Nexusformer replaces linear attention projections with nonlinear expansions, enabling stable, inheritable Transformer scaling without retraining: a model can grow while keeping what it has already learned, which simplifies evolving models in large-scale deployment.

SLAM Labs, Oleksiy Ostapenko et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG

Super Apriel lets a single LLM checkpoint switch between four attention mechanisms at serving time, so practitioners can offer multiple speed/accuracy presets from one model instead of deploying several, lowering deployment cost and latency.

Jinyu Guo, Zhihan Zhang, Yutong Li et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CL

DASH-KV cuts long-context inference cost with asymmetric KV hashing, reducing compute while preserving output quality, which matters for LLMs in latency-sensitive production systems.

Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

Reduces MoE training cost by upcycling existing experts instead of training new ones from scratch, making scalable, compute-efficient LLMs more attainable on constrained infrastructure.
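Sparse upcycling is commonly described as copying a trained dense FFN into every expert of a new MoE layer, so training resumes from the dense solution rather than from random weights. The NumPy sketch below assumes that recipe; the shapes, the ReLU FFN, the top-k softmax router, and all function names are hypothetical and not taken from the paper. The final check shows why the initialization is safe: with identical experts and gates renormalized over the top-k, the MoE layer reproduces the dense layer exactly before any further training.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 8, 32, 4, 2   # hypothetical sizes

# Hypothetical pretrained dense FFN weights.
W_in = rng.normal(size=(d_model, d_ff))
W_out = rng.normal(size=(d_ff, d_model))

def ffn(x, w_in, w_out):
    """Two-layer ReLU feed-forward block."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def upcycle(w_in, w_out, n):
    """Initialize n experts as independent copies of the dense FFN."""
    return [(w_in.copy(), w_out.copy()) for _ in range(n)]

def moe_forward(x, experts, w_router, k):
    """Top-k routing with softmax gates renormalized over the selected experts."""
    logits = x @ w_router                        # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k highest-scoring experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                     # gates sum to 1
        for g, e in zip(gates, top[t]):
            w_in, w_out = experts[e]
            out[t] += g * ffn(x[t:t + 1], w_in, w_out)[0]
    return out

experts = upcycle(W_in, W_out, n_experts)
W_router = rng.normal(scale=0.02, size=(d_model, n_experts))  # freshly initialized router
x = rng.normal(size=(5, d_model))
print(np.allclose(moe_forward(x, experts, W_router, top_k), ffn(x, W_in, W_out)))  # True
```

The copy only sets the starting point; continued training is still needed for the experts to diverge and specialize.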

Zixuan Liu, Zhiyong Chen, Nan Xue et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.IT · cs.AI

WISV adapts speculative-decoding verification to wireless conditions by checking drafts semantically rather than token by token, improving latency and throughput for edge LLMs in mobile deployments.
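For contrast, the token-level check that WISV moves away from can be sketched in its simplest greedy form: the target model scores all drafted tokens in one forward pass, the longest draft prefix that matches the target's own greedy choices is accepted, and the target's token is appended at the first mismatch. This is a generic sketch, not WISV's semantic verifier; the token lists stand in for real model outputs and the function name is hypothetical.

```python
from typing import List

def greedy_verify(draft: List[int], target_argmax: List[int]) -> List[int]:
    """Token-level greedy verification for speculative decoding.

    draft:         k tokens proposed by the small draft model.
    target_argmax: k + 1 tokens, the target model's greedy choice at each
                   drafted position plus one position past the draft
                   (all obtained from a single target forward pass).
    Returns the tokens to append to the output.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        if tok != target_argmax[i]:
            accepted.append(target_argmax[i])   # first mismatch: take the target's token, stop
            return accepted
        accepted.append(tok)                    # exact match: keep the drafted token
    accepted.append(target_argmax[len(draft)])  # whole draft accepted: one bonus token
    return accepted

print(greedy_verify([5, 9, 2, 7], [5, 9, 3, 1, 4]))  # [5, 9, 3]
print(greedy_verify([5, 9], [5, 9, 8]))              # [5, 9, 8]
```

Under exact token matching, a drafted token that differs from the target's choice is rejected even when the meaning is equivalent; that is the restriction a semantic check relaxes.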

Yujie Chen, Tailai Chen, Yifeng Gao et al.

breakthrough · 🔴 Advanced · Machine Learning · Model Compression
cs.AI

Introduces delta attention halting, which detects semantic fixing points to skip redundant token processing, giving hardware-compatible efficiency gains for long-context LLM inference without sacrificing accuracy.

Libo Sun, Peixiong He, Po-Wei Harn et al.

cs.LG · cs.CL

MoE-nD tailors KV-cache compression to each layer and is more accurate than methods that compress every layer uniformly. It enables longer-context inference with minimal memory overhead and requires no retraining.
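A back-of-the-envelope KV-cache estimate shows why per-layer budgets matter: every layer stores one key and one value vector per KV head per token, so the cache is the sum over layers of 2 x KV heads x head dim x tokens x bytes per element, scaled by the fraction of tokens each layer keeps. The sketch assumes a hypothetical 32-layer model with 8 KV heads of dimension 128 in fp16 and a hypothetical per-layer keep-ratio schedule; it illustrates the arithmetic only, not MoE-nD's allocation rule.

```python
from typing import Sequence

def kv_cache_bytes(seq_len: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int, keep_ratios: Sequence[float]) -> float:
    """KV-cache size summed over layers.

    Each layer stores K and V (factor 2) for every KV head and token;
    keep_ratios[l] is the fraction of tokens layer l retains (1.0 = uncompressed).
    """
    per_layer_full = 2 * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return sum(per_layer_full * r for r in keep_ratios)

GiB = 1024 ** 3
n_layers = 32
uncompressed = [1.0] * n_layers
# Hypothetical non-uniform schedule; its average keep ratio is 0.5.
per_layer = [0.75] * 8 + [0.5] * 16 + [0.25] * 8

print(kv_cache_bytes(8192, 8, 128, 2, uncompressed) / GiB)  # 1.0
print(kv_cache_bytes(8192, 8, 128, 2, per_layer) / GiB)     # 0.5
```

A uniform 0.5 ratio would give the same total; the point of per-layer methods is to spend that same budget where it costs the least accuracy.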

Xiao Wang, Zezhong Zhang, Isaac Lyngaas et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

A linear-complexity global attention mechanism makes generative data assimilation feasible at exascale, improving uncertainty quantification in weather and climate models, a key need for real-time extreme-event prediction.
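The general idea behind linear-complexity attention can be sketched with a positive kernel feature map phi: replacing softmax(Q K^T) V with phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1), lets the d x d_v summary phi(K)^T V be computed once, so cost grows linearly in sequence length N instead of quadratically. This follows the non-causal kernelized linear attention popularized by Katharopoulos et al. (2020) with phi(x) = elu(x) + 1; the paper's own mechanism may differ, and the sizes below are hypothetical.

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Positive feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention, O(N * d * d_v). Q, K: (N, d); V: (N, d_v)."""
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v) summary, computed once
    z = Kf.sum(axis=0)                 # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]

def quadratic_reference(Q, K, V):
    """Same kernel, but materializing the full (N, N) score matrix."""
    A = phi(Q) @ phi(K).T
    return (A @ V) / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d, d_v = 64, 16, 8
Q, K, V = rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d_v))
print(np.allclose(linear_attention(Q, K, V), quadratic_reference(Q, K, V)))  # True
```

Both functions compute the same result by associativity of matrix products; only the linear version avoids the N x N matrix.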

David Berghaus

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

EVIL replaces neural networks with evolved, interpretable Python programs for zero-shot time-series inference, producing transparent models that deploy without retraining; this suits real-time systems that need explainability and low resource use.

Hyeongmeen Baik, Hamed Poursiami, Maryam Parsa et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.NE · cs.LG

Presents the first spiking neural network for sub-mW power-converter health monitoring. It decouples physics enforcement from temporal processing, enabling real-time edge inference without GPUs for industrial IoT systems that need reliable, ultra-low-power operation.

Yifan Zhao, Yuchen Yang, Matei Budiu et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.PL · cs.LG

Nautilus automates GPU kernel optimization starting from high-level tensor algebra, removing manual tuning so ML systems can be developed faster and more portably without expert-written kernel code.

Yukuan Zhang, Mengxin Zheng, Qian Lou

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CR · cs.AI

SecureRouter speeds up encrypted inference by adapting the model structure to each query, reducing secure multi-party computation (MPC) overhead enough to make privacy-preserving inference practical for real-time, high-throughput production systems.

Aditi De

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

This paper runs diffusion-model inference without digital computation by exploiting thermodynamic equilibration, potentially cutting energy use by about 10,000x, with implications for edge AI and sustainable inference infrastructure.

Mohammed Ezzaldin Babiker Abdullah, Rufaidah Abdallah Ibrahim Mohammed

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

A physics-guided CNN-BiLSTM outperforms more complex Transformers at solar forecasting, suggesting that domain knowledge can beat architectural scale and supporting efficient, deployable tools for grid stability.

Hongtao Xu, Jianchao Tan, Yuxuan Hu et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI

SparseBalance co-optimizes for heterogeneity in both sequence length and sparsity during long-context training, improving efficiency and accuracy so LLM training on real-world data can scale without costly over-provisioning.

S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis et al.

cs.AI

By reusing one small model as summarizer, agent, and isolated code reviewer, this inference-time scaffold roughly doubles AppWorld performance on a single 24GB GPU.

Vasilis Kontonis, Yuchen Zeng, Shivam Garg et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.AI · cs.LG

MEMENTO trains reasoning models to summarize their own working state into reusable memory blocks, cutting KV-cache costs about 2.5x and boosting throughput without giving up math, science, or coding accuracy.

Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev et al.

significant · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.LG · cs.AI · cs.CL

This study shows popular KV-cache offloading schemes break on context-intensive workloads like structured extraction, then offers a simpler strategy that preserves far more accuracy for long-context production inference.

Jiayuan Ye, Vitaly Feldman, Kunal Talwar

significant · 🟡 Intermediate · Machine Learning · Model Compression
cs.CL

Pruning and rebalancing pretraining data can improve factual memorization enough for a 110M model to match a 1.3B baseline on entity facts, highlighting data mix as a real scaling lever.

Roberto Vercellino, Jared Willard, Gustavo Campos et al.

significant · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.DC · cs.LG

Provides public H100 power traces for training, fine-tuning, and vLLM inference and links them to whole-facility planning, which is useful for sizing clusters, power delivery, and microgrid strategies.

Sam Gunn

significant · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG

Introduces a data-deletion scheme that approximates how a model would behave if specific training data were removed, an important building block for unlearning, auditing, and data attribution.

David Picard, Nicolas Dufour, Lucas Degeorge et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CV · cs.AI

PoM replaces attention with a linear-time polynomial mixer that retains universal approximation while cutting compute, making vision and language models cheaper to scale and to run on edge devices.

Guhao Feng, Shengjie Luo, Kai Hua et al.

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference
cs.LG · cs.AI · cs.CL

In-Place Test-Time Training lets LLMs update their weights during inference instead of staying fixed after deployment, which suits real-time systems that must keep learning from streaming data without offline retraining.

Yulin Zou, Yan Chen, Wenyan Chen et al.

breakthrough · 🟡 Intermediate · Machine Learning · Efficient Inference
cs.DC · cs.CV · cs.LG

CoStream jointly optimizes the video codec and multimodal inference, cutting computational cost by more than 40% and enabling scalable, real-time video analytics with vision-language models without sacrificing accuracy.

Sayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Patrick Woods et al.

cs.CV

This paper cuts memory use for on-device LLMs by quantizing the KV cache dynamically instead of at one fixed precision. For LLMs deployed on phones or edge devices, this could mean roughly 2x longer context, or a KV cache about half the size, without accuracy loss.
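A minimal sketch of per-token KV-cache quantization shows where the memory saving comes from: storing int8 codes plus one float32 scale per token instead of fp16 values roughly halves the cache. The symmetric absmax scheme, the fixed 8-bit width, the tensor shape, and the function names are assumptions for illustration; the paper's dynamic policy, which chooses precision adaptively, is not reproduced here.

```python
import numpy as np

def quantize_per_token(x: np.ndarray, bits: int = 8):
    """Symmetric absmax quantization with one scale per token (row).
    Codes are stored as int8, so bits must be between 2 and 8."""
    if not 2 <= bits <= 8:
        raise ValueError("bits must be in [2, 8] for int8 storage")
    qmax = 2 ** (bits - 1) - 1                           # 127 for 8 bits
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, np.float32(1.0), scale)  # all-zero rows: avoid divide-by-zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate values; error per element is at most scale / 2."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
keys = rng.normal(size=(1024, 128)).astype(np.float16)   # hypothetical (tokens, head_dim) slice
q, s = quantize_per_token(keys.astype(np.float32))

print(keys.nbytes, q.nbytes + s.nbytes)   # 262144 135168
```

At 8 bits the cache shrinks by about 2x relative to fp16; lower bit widths chosen per token or per layer are how a dynamic scheme can trade accuracy for further headroom.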

Mateusz Papierz, Asel Sagingalieva, Alix Benoit et al.

significant · 🔴 Advanced · Machine Learning · Efficient Inference
cs.CE · cs.LG

HQ-LP-FNO cuts the size and cost of AI models that simulate laser processing by using quantum-inspired mixing, making real-time simulation feasible on standard hardware. This lets manufacturers rapidly test laser parameters without waiting hours for physics simulations.

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.