Constrained JSON Generation
Force the model to stay inside the schema's lines.
Key Insight
This project uses Outlines or SGLang to apply constrained generation at decode time: at each step, any next-token choice that would break the target JSON schema is masked out before sampling. It then measures the small overhead this masking adds compared to free, unconstrained generation.
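The masking step can be illustrated with a toy decoder. This is a minimal sketch, not the Outlines or SGLang API: the vocabulary, the `fake_logits` stand-in for a model, and the two-string "schema" are all hypothetical, chosen so the example runs with the standard library alone. A real library compiles the schema into an automaton over the tokenizer's vocabulary; here a plain prefix check plays that role.

```python
import json
import math

# Toy "schema": an object with a single boolean field "ok".
# The set of valid outputs is small enough to enumerate, so a prefix
# check stands in for a real schema-to-automaton compiler (assumption).
VALID_OUTPUTS = ['{"ok": true}', '{"ok": false}']

# Hypothetical token vocabulary, including tokens that would break the schema.
VOCAB = ['{', '}', '"ok"', 'ok', ':', ' ', 'true', 'false', ',', 'maybe', '<eos>']


def is_valid_prefix(text):
    """True if `text` can still be extended into a schema-valid output."""
    return any(valid.startswith(text) for valid in VALID_OUTPUTS)


def allowed_mask(prefix):
    """One boolean per vocabulary token: may it be emitted after `prefix`?"""
    mask = []
    for tok in VOCAB:
        if tok == '<eos>':
            # Stopping is only allowed once the output is complete.
            mask.append(prefix in VALID_OUTPUTS)
        else:
            mask.append(is_valid_prefix(prefix + tok))
    return mask


def fake_logits(prefix):
    """Stand-in for a language model: scores each token by its length,
    so schema-breaking tokens such as 'maybe' score highly."""
    return [float(len(tok)) for tok in VOCAB]


def constrained_greedy_decode(max_steps=32):
    """Greedy decoding with disallowed tokens masked to -inf at every step."""
    text = ''
    for _ in range(max_steps):
        logits = fake_logits(text)
        mask = allowed_mask(text)
        masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if VOCAB[best] == '<eos>':
            break
        text += VOCAB[best]
    return text


if __name__ == '__main__':
    output = constrained_greedy_decode()
    print(output)
    print(json.loads(output))  # parses, because only schema-valid tokens survived the mask
```

The extra work per step is the call to `allowed_mask`, which is the overhead the project sets out to measure. Production libraries reduce it by precomputing the allowed-token sets per automaton state instead of re-checking every token on every step.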
Why This Matters
The downstream tools and data pipelines that consume the model's reply (the JSON parser, the function-call router, the analytics job that loads the response into a database) cannot handle "almost-JSON": output that looks like JSON but has a missing brace, a stray comma, or an unquoted key. Constraining the sampling step to obey a schema guarantees structurally valid output on every call that runs to completion, which is the missing piece that makes function calling and tool-using agents reliable in production.