Long-Context Extension

A model trained on short text can often be stretched to long text by rescaling how it counts position.

Key Insight

A model trained at a 4k context window can be extended to longer inputs by rescaling its RoPE angles — via position interpolation or YaRN — usually with little or no retraining.
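
As a rough illustration of the rescaling, here is a minimal sketch of plain position interpolation, assuming the standard RoPE frequency schedule with base 10000. The function name, the head dimension of 64, and the 16k target length are hypothetical choices made for the example. YaRN goes further by scaling frequency bands differently, which this sketch does not attempt.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Return the RoPE rotation angles for a single position.

    With scale > 1 the position index is divided by scale before the
    angles are computed (position interpolation), so positions in the
    extended window map back into the range seen during training.
    """
    pos = position / scale
    return [pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

trained_len = 4096                 # context window used in training
target_len = 16384                 # hypothetical extended window
scale = target_len / trained_len   # interpolation factor, here 4.0

# Angles for the last position of the extended window, after interpolation.
extended = rope_angles(target_len - 1, head_dim=64, scale=scale)

# The same angles, computed at the equivalent fractional position
# inside the original window without any scaling.
reference = rope_angles((target_len - 1) / scale, head_dim=64)
```

The last position of the 16k window ends up with the angles of position 4095.75, which lies inside the original 4k range. The model therefore never sees a rotation larger than those it was trained on, at the cost of packing positions more densely.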

Why This Matters

Pretraining at long context is expensive, so most long-context models are extended after the fact. Testing the result with a needle-in-a-haystack probe shows whether the model truly uses the new length or just tolerates it.
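
A needle-in-a-haystack probe can be sketched as below. This is a simplified illustration, not a standard benchmark harness: the helper names, the filler sentence, and the substring-match scoring are all assumptions, and the call to the model itself is left out.

```python
def build_haystack_prompt(needle, filler_sentence, n_sentences, depth):
    """Bury one needle sentence inside repeated filler text.

    depth is the relative insertion point: 0.0 puts the needle at the
    start of the context, 1.0 at the end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler_sentence] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)

def found_needle(model_answer, expected):
    """Score one trial: did the answer contain the expected fact?"""
    return expected.lower() in model_answer.lower()

# Hypothetical needle and filler; a real probe would sweep both the
# context length and the depth, then query the model on each prompt.
needle = "The passcode is 7421."
filler = "The sky was grey that morning."
prompt = build_haystack_prompt(needle, filler, n_sentences=10, depth=0.5)
```

Sweeping depth and context length and recording found_needle for each trial gives a grid of hits and misses. Misses that cluster at particular depths or beyond a certain length suggest the model accepts the longer input without reliably using it.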
