
Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels

Yifan Zhao, Yuchen Yang, Matei Budiu, Sasa Misailovic

Recommendation Score

38

Breakthrough · 🔴 Advanced · Machine Learning · Efficient Inference · System · Useful for both

Research context

Primary field

Machine Learning

Core modeling, optimization, inference, and systems efficiency.

Topics

Efficient Inference

Paper type

System

Best for

Useful for both

arXiv categories

cs.PL, cs.LG

Why It Matters

Nautilus automates GPU kernel optimization starting from a high-level tensor-algebra specification, shifting the burden of manual tuning from the programmer to the compiler. This supports faster, more portable ML system development without hand-written expert kernels.

Abstract

We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like kernels, offloading the entire burden of optimization from the programmer to the compiler. Across five transformer-based models and 150 evaluation configurations on NVIDIA GH200 and RTX 5090 GPUs, Nautilus achieves up to 23% higher throughput than state-of-the-art compilers on GH200 and up to 42% on RTX 5090, while matching or exceeding manually written cuDNN kernels on many long-sequence configurations.
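To make the gap between a "math-like description of attention" and a tiled, reduction-fused kernel concrete, here is a minimal pure-Python sketch. It is not Nautilus code, its input language, or its generated output. The function names `attention_reference` and `attention_tiled` and the `tile` parameter are hypothetical and chosen for illustration. The second function uses the general online-softmax idea associated with FlashAttention-style kernels: the max, sum, and weighted-value reductions are carried as running values across key/value tiles, so the full score row is never materialized.

```python
import math

def attention_reference(Q, K, V):
    """Math-like spec: softmax(Q K^T / sqrt(d)) V, building each full score row."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def attention_tiled(Q, K, V, tile=2):
    """Same result, visiting K/V tile by tile with fused running reductions."""
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf       # running max of the scores seen so far
        l = 0.0             # running softmax denominator
        acc = [0.0] * dv    # running unnormalized output row
        for start in range(0, len(K), tile):
            k_tile = K[start:start + tile]
            v_tile = V[start:start + tile]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in k_tile]
            m_new = max(m, max(scores))
            scale = math.exp(m - m_new)  # rescale old partial sums to the new max
            p = [math.exp(s - m_new) for s in scores]
            l = l * scale + sum(p)
            acc = [a * scale + sum(pi * v[j] for pi, v in zip(p, v_tile))
                   for j, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out

# Small made-up inputs: 2 queries, 5 keys/values, head dimension 3.
Q = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
K = [[0.5, 0.1, 0.0], [0.2, 0.9, -0.3], [1.0, 1.0, 1.0],
     [-0.4, 0.3, 0.8], [0.0, -0.7, 0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]

ref = attention_reference(Q, K, V)
tiled = attention_tiled(Q, K, V, tile=2)
max_diff = max(abs(a - b) for r, t in zip(ref, tiled) for a, b in zip(r, t))
```

The two functions agree up to floating-point rounding, including when the tile size does not evenly divide the number of keys. Getting from the first form to the second by hand requires recognizing that three dependent reductions can be fused, which is the kind of transformation the abstract attributes to the auto-scheduler.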

Published April 16, 2026
Summaries are AI-assisted.