Long-Context Extension
A model trained on short text can often be stretched to long text by rescaling how it counts position.
Key Insight
A model trained at a 4k context window can be extended to longer inputs by rescaling its RoPE angles, via position interpolation or YaRN, usually with only a brief fine-tune at the new length and sometimes with none at all.
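As a rough illustration of the rescaling, here is a minimal sketch of linear position interpolation only. The function name rope_angles, the 16k target length, and the head size of 64 are assumptions chosen for the example, not values from any particular model. YaRN goes further (it scales frequency bands unevenly and adjusts attention temperature), which this sketch does not attempt.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Return the RoPE rotation angles for one token position.

    scale > 1 applies linear position interpolation: the position index
    is divided by the extension factor before the angles are computed.
    """
    effective_pos = position / scale
    # One angle per pair of dimensions, with geometrically decreasing frequency.
    return [effective_pos * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]

TRAIN_LEN = 4096     # trained context window (the 4k above)
TARGET_LEN = 16384   # hypothetical extended window
scale = TARGET_LEN / TRAIN_LEN

# The last position of the extended window is squeezed back inside
# the range of angles the model saw during training.
extended = rope_angles(TARGET_LEN - 1, head_dim=64, scale=scale)
trained_bound = rope_angles(TRAIN_LEN, head_dim=64)
assert all(e < t for e, t in zip(extended, trained_bound))
```

The trade-off is that neighbouring tokens now sit closer together in angle space than they did in training, which is why a short fine-tune at the new length usually helps.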
Why This Matters
Pretraining at long context is expensive, so most long-context models are extended after the fact. Testing the result with a needle-in-a-haystack probe shows whether the model truly uses the new length or just tolerates it.
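A needle-in-a-haystack probe can be sketched in a few lines. The helper names, the filler sentence, and the substring-based scoring below are all assumptions made for illustration; sending each prompt to a model is left out because it depends on the inference API in use.

```python
def build_needle_prompt(filler_sentence, needle, question, n_sentences, depth):
    """Bury `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler_sentence] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences) + "\n\n" + question

def is_retrieved(model_answer, expected):
    """Loose scoring: a hit if the expected string appears in the answer."""
    return expected.lower() in model_answer.lower()

# Sweep the needle depth at one haystack size; repeat with larger
# n_sentences values to cover the extended window.
needle = "The secret code is 7319."
prompts = [
    build_needle_prompt("The sky was grey that day.", needle,
                        "What is the secret code?", n_sentences=1000, depth=d)
    for d in (0.0, 0.25, 0.5, 0.75, 1.0)
]
```

Retrieval that holds at every depth and length suggests the model is using the extended window; failures concentrated at particular depths or beyond the original 4k suggest it only tolerates the longer input.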