FlashDecoding Ablation

Turn the decode-tuned attention kernel on and off, and measure exactly what it buys.

Key Insight

This project is an ablation: it runs the same model in vLLM with and without the FlashDecoding kernel and measures the change in decode throughput.
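As a rough illustration of the comparison, the sketch below turns two timed runs into decode throughput and an on/off speedup. The `DecodeRun` class, its field names, and the numbers are hypothetical placeholders rather than anything from this project or from vLLM; how each run is launched and timed is left out.

```python
from dataclasses import dataclass


@dataclass
class DecodeRun:
    """One benchmark run; all fields are hypothetical measurements."""
    generated_tokens: int   # total output tokens produced during decode
    decode_seconds: float   # wall-clock time spent in the decode phase

    @property
    def tokens_per_second(self) -> float:
        if self.decode_seconds <= 0:
            raise ValueError("decode_seconds must be positive")
        return self.generated_tokens / self.decode_seconds


def ablation_speedup(kernel_on: DecodeRun, kernel_off: DecodeRun) -> float:
    """Ratio of decode throughput with the kernel on to throughput with it off."""
    return kernel_on.tokens_per_second / kernel_off.tokens_per_second


# Placeholder numbers purely for illustration, not measured results.
on = DecodeRun(generated_tokens=4096, decode_seconds=2.0)
off = DecodeRun(generated_tokens=4096, decode_seconds=4.0)
print(f"on:  {on.tokens_per_second:.1f} tok/s")
print(f"off: {off.tokens_per_second:.1f} tok/s")
print(f"speedup: {ablation_speedup(on, off):.2f}x")
```

For the ratio to mean anything, both runs should use the same model, prompts, batch size, and output length, so that the kernel is the only variable.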

Why This Matters

FlashDecoding reorganizes how the KV cache is read so the GPU keeps its HBM bandwidth saturated during decode. Measuring the on/off gap shows how much a single well-matched kernel is really worth.
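To make "reorganizes how the KV cache is read" more concrete, here is a toy CPU sketch of the split-KV idea commonly associated with FlashDecoding: the cache is cut into chunks, each chunk is attended to independently, and the partial results are merged with a log-sum-exp style rescaling. This is an assumption about the general technique, written in plain Python for readability; it is not the kernel vLLM runs, and the function names are made up for this example.

```python
import math


def attend(q, keys, values):
    """Reference single-query attention over the whole KV cache."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(dim)]


def attend_chunk(q, keys, values):
    """Partial attention over one chunk: (max score, sum of exps, unnormalized output)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), out


def attend_split(q, keys, values, chunk_size):
    """Attend to each KV chunk independently, then merge the partial results."""
    partials = [
        attend_chunk(q, keys[i:i + chunk_size], values[i:i + chunk_size])
        for i in range(0, len(keys), chunk_size)
    ]
    global_max = max(m for m, _, _ in partials)
    dim = len(values[0])
    denom = 0.0
    out = [0.0] * dim
    for m, s, o in partials:
        scale = math.exp(m - global_max)  # rescale this chunk to the shared max
        denom += s * scale
        for d in range(dim):
            out[d] += o[d] * scale
    return [x / denom for x in out]


# Small made-up example: 8 cached positions, head dimension 2.
q = [1.0, 0.5]
keys = [[0.1 * i, 0.05 * i] for i in range(8)]
values = [[float(i), float(-i)] for i in range(8)]

full = attend(q, keys, values)
split = attend_split(q, keys, values, chunk_size=3)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(full, split)))
```

Because the chunks do not depend on each other until the final merge, a GPU implementation can hand them to separate thread blocks, which is what lets a single decode step read a long cache in parallel.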
