Reproduce a Mini-Chinchilla Plot

Train seven small models and watch the scaling law draw its own curve.

Key Insight

Scaling laws say a model's loss falls in a smooth, predictable way as you add parameters, data, and compute. Training seven models from 10M to 500M parameters — each given the right number of tokens for its size — and plotting their iso-FLOP loss curves reproduces the Chinchilla result in miniature: for a fixed compute budget, there is one model size that wins.
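
As a rough sketch of how one iso-FLOP curve could be set up and read, the snippet below assumes the common approximation C ≈ 6·N·D (compute in FLOPs, N parameters, D training tokens) to give every model size the token count that spends the same budget, then fits a parabola in log10(N) to the final losses and reports the size at its minimum. This is one reading of "the right number of tokens for its size". The seven model sizes, the 1e18 FLOP budget, and the function names are illustrative assumptions, and the loss values are synthetic placeholders, not measurements.

```python
import numpy as np

# Hypothetical model sizes (parameter counts) spanning the 10M-500M range.
MODEL_SIZES = np.array([10e6, 20e6, 50e6, 100e6, 200e6, 350e6, 500e6])


def tokens_for_budget(flops, n_params):
    """Tokens D that spend `flops` on a model of `n_params`, assuming C ~= 6*N*D."""
    return flops / (6.0 * n_params)


def fit_isoflop_minimum(n_params, losses):
    """Fit loss = a*x^2 + b*x + c with x = log10(N) and return N at the vertex."""
    x = np.log10(n_params)
    a, b, _ = np.polyfit(x, losses, deg=2)
    if a <= 0:
        raise ValueError("fitted curve is not convex, so it has no minimum")
    return 10.0 ** (-b / (2.0 * a))


if __name__ == "__main__":
    budget = 1e18  # hypothetical compute budget in FLOPs

    # Smaller models get more tokens, larger models fewer, at equal cost.
    for n, d in zip(MODEL_SIZES, tokens_for_budget(budget, MODEL_SIZES)):
        print(f"N = {n:.0e} params -> D = {d:.2e} tokens")

    # Placeholder losses (a made-up parabola), standing in for real run results.
    placeholder_losses = 3.0 + 0.4 * (np.log10(MODEL_SIZES) - 8.0) ** 2
    best_n = fit_isoflop_minimum(MODEL_SIZES, placeholder_losses)
    print(f"Fitted compute-optimal size: {best_n:.2e} params")
```

With real runs, the placeholder losses would be replaced by each model's final validation loss at that budget. Repeating the fit for several budgets gives one minimum per curve, and those minima trace how the best model size grows with compute.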

Why This Matters

Seeing the curve emerge from your own runs makes scaling laws believable. They are what let a lab predict a giant model's loss from a handful of small ones — the forecast that justifies betting millions of dollars on a single training run.
