RoPE from Scratch
Encode position by spinning each token's vectors — and the angle between two tokens becomes their distance.
Key Insight
RoPE encodes a token's position by rotating its query and key vectors by an angle proportional to that position. Because rotations compose, the attention score between two tokens ends up depending only on their relative distance, not their absolute positions.
Why This Matters
RoPE is the default positional scheme in Llama, Mistral, Qwen, and DeepSeek. Implementing it — including the half-rotation trick — and confirming that ⟨q, k⟩ depends only on relative position makes the most widely used position embedding concrete rather than magical.