
SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

Hongtao Xu, Jianchao Tan, Yuxuan Hu, Pengju Lu, Hongyu Wang, Pingwei Sun, Yerui Sun, Yuchen Xie, Xunliang Cai, Mingzhen Li, Weile Jia

37

Recommendation Score

breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference · Benchmark · Useful for both

Research context

Primary field

Machine Learning

Core modeling, optimization, inference, and systems efficiency.

Topics

Efficient Inference

Paper type

Benchmark

Best for

Useful for both

arXiv categories

cs.LG, cs.AI

Why It Matters

SparseBalance jointly addresses heterogeneity in sequence length and in sparsity sensitivity during long-context training. The authors report up to a 1.33× end-to-end speedup together with a 0.46% improvement on LongBench, which makes the approach relevant to scaling LLM training on real-world data.

Abstract

While sparse attention mitigates the computational bottleneck of long-context LLM training, its distributed training process exhibits extreme heterogeneity in both (1) sequence length and (2) sparsity sensitivity, leading to a severe imbalance problem and sub-optimal model accuracy. Existing algorithms and training frameworks typically focus on a single issue, failing to systematically co-optimize these two problems. Therefore, we propose SparseBalance, a novel algorithm-system co-design framework, which exploits the sparsity and sequence heterogeneity to optimize model accuracy and system efficiency jointly. First, we propose workload-aware dynamic sparsity tuning, which employs a bidirectional sparsity adjustment to eliminate stragglers and exploit inherent bubbles for free accuracy. Second, we propose a sparsity-aware batching strategy to achieve coarse-grained balance, which complements dynamic sparsity tuning. Experimental results demonstrate that SparseBalance achieves up to a 1.33× end-to-end speedup while still improving the long-context capability by 0.46% on the LongBench benchmark.
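The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the cost model (attention cost proportional to length squared times the kept density), the greedy packing heuristic, the mean-load target, and every name below (`Sample`, `attention_cost`, `balance_batches`, `tune_density`) are assumptions made for illustration only. The first function stands in for coarse-grained, sparsity-aware batching; the second stands in for a bidirectional adjustment that makes overloaded ranks sparser and lets underloaded ranks keep more attention.

```python
from dataclasses import dataclass
import heapq


@dataclass
class Sample:
    length: int      # sequence length in tokens
    density: float   # fraction of attention kept, i.e. 1 - sparsity


def attention_cost(sample: Sample) -> float:
    """Assumed cost model: quadratic in length, linear in kept density."""
    return sample.length ** 2 * sample.density


def balance_batches(samples, num_ranks):
    """Coarse balance: greedily give the costliest sample to the lightest rank."""
    heap = [(0.0, rank) for rank in range(num_ranks)]  # (current load, rank id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_ranks)]
    for sample in sorted(samples, key=attention_cost, reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(sample)
        heapq.heappush(heap, (load + attention_cost(sample), rank))
    return assignment


def tune_density(assignment, min_density=0.05, max_density=1.0):
    """Fine balance: rescale each rank's density toward the mean load.

    Overloaded ranks (stragglers) get a lower density; underloaded ranks
    get a higher one. Densities are clamped to [min_density, max_density].
    """
    loads = [sum(attention_cost(s) for s in rank) for rank in assignment]
    target = sum(loads) / len(loads)
    tuned = []
    for rank, load in zip(assignment, loads):
        scale = target / load if load > 0 else 1.0
        tuned.append([
            Sample(s.length, min(max_density, max(min_density, s.density * scale)))
            for s in rank
        ])
    return tuned


if __name__ == "__main__":
    batch = [Sample(n, 0.25) for n in (32768, 16384, 8192, 8192, 4096, 2048)]
    packed = balance_batches(batch, num_ranks=2)
    for rank_id, rank in enumerate(tune_density(packed)):
        print(rank_id, [(s.length, round(s.density, 3)) for s in rank])
```

In this toy batch a single 32K-token sequence dominates one rank even after packing, so batching alone cannot balance the load; the density adjustment then lowers that rank's density and raises the other's, which is the complementary relationship the abstract describes.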

Published April 15, 2026
© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.