Speculative Agent Steps

Speculative decoding guesses the next token; this project guesses the next action — and rolls back when it guesses wrong.

Key Insight

This project takes the guess-and-verify idea behind speculative decoding and lifts it up a level: inside an agent loop, it speculatively runs the most likely next tool call before the model has finished deciding, then verifies the choice and rolls back the work if the guess was wrong. The agent's idle time — waiting on the model to think — becomes useful work done ahead of time.

Why This Matters

Agent loops spend a lot of time stalled: waiting for the model to choose a tool, then waiting for that tool to return. Anywhere you can guess and verify, you can hide that latency, exactly as speculative decoding hides decode latency. This project generalizes the most reliable "free" trick in inference from single tokens to whole agent steps — a frontier the field is actively exploring.

Key Insight​

Why This Matters​

Key Insight

Why This Matters