Cross-Region Latency

The closest datacenter wins the first token; physics does the rest.

Key Insight

This project deploys the same model in two geographic regions and measures the time to first token difference when each request is routed to the nearest region versus a far one.

Why This Matters

Network round-trips across continents add tens to hundreds of milliseconds before any compute even starts, so routing each user to their closest region is often the cheapest latency win available — with no change to the model at all.

Key Insight​

Why This Matters​

Key Insight

Why This Matters