Roofline Plot for Your Engine
Every operating point is starved for either memory bandwidth or compute, and the plot tells you which.
Key Insight
This project sweeps batch size and prompt length through your own inference engine and draws a roofline plot: achieved throughput (FLOP/s) against arithmetic intensity (FLOPs per byte of memory traffic). Points under the sloped part of the roof are memory-bound; points under the flat part are compute-bound.
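The classification behind the plot can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Point` record, the function names, and the hardware peaks are all hypothetical placeholders, and it assumes you can already measure or estimate FLOPs, bytes moved, and wall-clock time for each run of your engine.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One measured operating point (hypothetical record layout)."""
    batch_size: int
    prompt_len: int
    flops: float        # floating-point operations performed in the run
    bytes_moved: float  # bytes transferred to and from device memory
    seconds: float      # measured wall-clock time


def arithmetic_intensity(p: Point) -> float:
    """FLOPs per byte of memory traffic: the x-axis of the plot."""
    return p.flops / p.bytes_moved


def achieved_flops(p: Point) -> float:
    """Measured throughput in FLOP/s: the y-axis of the plot."""
    return p.flops / p.seconds


def roofline(intensity: float, peak_flops: float, peak_bw: float) -> float:
    """Attainable FLOP/s at a given intensity: min(compute roof, bandwidth slope)."""
    return min(peak_flops, intensity * peak_bw)


def classify(p: Point, peak_flops: float, peak_bw: float) -> str:
    """Memory-bound left of the ridge point, compute-bound at or right of it."""
    ridge = peak_flops / peak_bw  # intensity where the two roofs meet
    return "memory-bound" if arithmetic_intensity(p) < ridge else "compute-bound"


# Placeholder hardware peaks; substitute the numbers for your own accelerator.
PEAK_FLOPS = 100e12  # FLOP/s (assumed)
PEAK_BW = 1e12       # bytes/s (assumed)

# Made-up measurements, only to show the shape of the sweep results.
sweep = [
    Point(batch_size=1, prompt_len=128, flops=2e9, bytes_moved=1e9, seconds=0.01),
    Point(batch_size=64, prompt_len=2048, flops=4e12, bytes_moved=1e10, seconds=0.1),
]

for p in sweep:
    ai = arithmetic_intensity(p)
    roof = roofline(ai, PEAK_FLOPS, PEAK_BW)
    print(
        f"batch={p.batch_size} prompt={p.prompt_len} "
        f"intensity={ai:.1f} FLOP/B achieved={achieved_flops(p):.3g} FLOP/s "
        f"roof={roof:.3g} FLOP/s {classify(p, PEAK_FLOPS, PEAK_BW)}"
    )
```

To turn this into the plot itself, draw `roofline` over a range of intensities on log-log axes and scatter each point's `(arithmetic_intensity, achieved_flops)` on top. The vertical gap between a point and the roof shows how much headroom that operating point still has.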
Why This Matters
Whether to spend money on more HBM bandwidth or more compute depends entirely on which side of the roofline your real workload sits. Measuring it yourself replaces guesswork with a picture you can point at.