
Bottleneck Fix


Sometimes the compiler makes things slower, and the cure is to stop breaking its graph.


Key Insight

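torch.compile is fastest when it can capture the model as one unbroken graph. When it meets code it cannot trace, such as a print call, a data-dependent branch, or an unsupported op, it inserts a graph break: the model is split into smaller compiled graphs, and the untraceable code runs in ordinary eager Python between them. Optimizations such as operator fusion cannot cross a break, and every break adds overhead, so a heavily fragmented model can run slower than not compiling at all.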

Why This Matters

Finding graph breaks with TORCH_LOGS="graph_breaks" and then removing them restores the speedup the compiler promised. It also teaches you exactly which Python the compiler can and cannot handle, turning a bottleneck back into a win.