Padding Waste Audit

Every padding token is compute the GPU spends on a word that isn't even there.

Key Insight

This project instruments a static-batching server to measure what fraction of its decode FLOPs are spent on padding — the filler tokens added so every sequence in a batch reaches the same length.
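As a rough illustration of the measurement, the sketch below estimates the padding fraction from sequence lengths alone. It assumes every sequence in a static batch is padded to the longest one, and it counts token slots as a proxy for FLOPs, treating per-token cost as uniform (real attention cost also depends on sequence length). The function names padding_fraction and audit are hypothetical, not part of the project.

```python
from typing import Iterable, Sequence


def padding_fraction(lengths: Sequence[int]) -> float:
    """Fraction of token slots in one static batch that are padding.

    Assumes each sequence is padded up to the longest sequence in the batch.
    """
    if not lengths:
        return 0.0
    total_slots = max(lengths) * len(lengths)  # slots the GPU processes
    if total_slots == 0:
        return 0.0
    padding_slots = total_slots - sum(lengths)  # slots holding no real token
    return padding_slots / total_slots


def audit(batches: Iterable[Sequence[int]]) -> float:
    """Padding fraction aggregated over many batches (slot-weighted)."""
    total_slots = 0
    real_tokens = 0
    for lengths in batches:
        if not lengths:
            continue
        total_slots += max(lengths) * len(lengths)
        real_tokens += sum(lengths)
    if total_slots == 0:
        return 0.0
    return (total_slots - real_tokens) / total_slots


if __name__ == "__main__":
    # One batch with a long and a short sequence: 8 of 20 slots are padding.
    print(padding_fraction([10, 2]))
    # Two batches: uniform lengths waste nothing, mixed lengths waste a lot.
    print(audit([[4, 4, 4], [10, 2]]))
```

The aggregate is weighted by slots rather than averaged per batch, so large, badly mixed batches count for more than small, uniform ones.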

Why This Matters

Padding is pure waste: the GPU does real work on tokens that carry no information. Putting a number on that waste shows exactly why continuous batching, which needs no padding, beats static batching on busy, mixed-length traffic.
