N-gram Lookup
When the answer is already sitting in the prompt, just copy it forward.
Key Insight
This project implements prompt-lookup decoding: instead of a draft model, it scans the prompt for a matching n-gram — a short run of recent tokens — and reuses whatever text followed it last time as the draft guess. It needs no extra model and no training.
Why This Matters
When the model is mostly copying from the prompt — summarizing, editing code, or answering from a retrieved document (RAG) — the next tokens often already appear earlier in the text, so this free trick gives large speedups exactly where training a separate draft model would be overkill.