FlashDecoding Ablation
Turn the decode-tuned attention kernel on and off, and measure exactly what it buys.
Key Insight
This project is an ablation: it runs the same model in vLLM with and without the FlashDecoding kernel, holding everything else fixed, and measures the change in decode throughput.
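The comparison itself can be sketched in a few lines. This is a minimal, hypothetical harness rather than the project's actual benchmark script: `generate` stands in for whatever function launches one decode run and returns the number of tokens produced, and how the kernel is switched on or off in vLLM is deliberately left out because the source does not specify it.

```python
import time
from statistics import median
from typing import Callable


def decode_throughput(generate: Callable[[], int], repeats: int = 3) -> float:
    """Median decode throughput in tokens/second over `repeats` timed runs.

    `generate` is a hypothetical stand-in for one benchmark run; it must
    return the number of tokens it decoded.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = generate()
        elapsed = time.perf_counter() - start
        rates.append(tokens / elapsed)
    return median(rates)


def speedup(tokens_per_s_on: float, tokens_per_s_off: float) -> float:
    """Kernel-on throughput divided by kernel-off throughput.

    A value above 1.0 means the kernel improved decode throughput.
    """
    return tokens_per_s_on / tokens_per_s_off
```

Using the median over several runs is one reasonable way to damp warm-up and scheduling noise; the same prompts, batch size, and output length should be used for both configurations so that the kernel is the only variable.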
Why This Matters
FlashDecoding splits the KV cache along the sequence dimension so that many thread blocks read it in parallel and then merge their partial results, which keeps the GPU's HBM bandwidth saturated during decode. Measuring the on/off gap shows how much a single well-matched kernel is worth.
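The idea behind the kernel is usually described as split-KV attention: the cached keys and values are cut into chunks, each chunk is attended to independently, and the partial results are merged with a log-sum-exp rescaling. The pure-Python sketch below illustrates that arithmetic for a single query vector on the CPU. It is an illustration of the technique under that assumption, not the vLLM kernel, and the function names are made up for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def partial_attention(
    q: Vector, keys: Sequence[Vector], values: Sequence[Vector]
) -> Tuple[float, float, List[float]]:
    """Attend to one KV chunk.

    Returns (max score, sum of exp(score - max), unnormalised weighted values).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), out


def split_kv_attention(
    q: Vector, keys: Sequence[Vector], values: Sequence[Vector], num_splits: int
) -> List[float]:
    """Softmax attention computed chunk by chunk, then merged.

    On a GPU each chunk would go to a different thread block; here the
    chunks are simply processed in a loop.
    """
    chunk = math.ceil(len(keys) / num_splits)
    partials = [
        partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    # Rescale every chunk to a common maximum before combining.
    m_all = max(m for m, _, _ in partials)
    denom = sum(l * math.exp(m - m_all) for m, l, _ in partials)
    dim = len(values[0])
    return [
        sum(o[d] * math.exp(m - m_all) for m, _, o in partials) / denom
        for d in range(dim)
    ]
```

Calling `split_kv_attention` with `num_splits=1` gives ordinary softmax attention, and larger values give the same result up to floating-point rounding. That equivalence is what lets the kernel parallelise over the sequence length without changing the model's output.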