Train a 100M-Parameter LM

Ten times bigger, real web text, and a number to beat — the first run that feels like the real thing.

Key Insight

Scaling to 100 million parameters and training on a slice of real web text (OpenWebText) makes this the first pretraining run that behaves like a production one. The concrete goal, a validation loss below 3.5, turns "is it working?" into a measurable target.
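To give the 3.5 target some intuition, here is a minimal sketch that converts a loss into perplexity and checks it against the goal. It assumes the validation loss is mean cross-entropy in nats per token, which the text does not state; the names `TARGET_VAL_LOSS`, `perplexity`, and `beats_target` are hypothetical.

```python
import math

# Target from the text; assumed to be mean cross-entropy in nats per token.
TARGET_VAL_LOSS = 3.5


def perplexity(loss_nats: float) -> float:
    """Convert a per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)


def beats_target(val_loss: float, target: float = TARGET_VAL_LOSS) -> bool:
    """Return True once validation loss is strictly below the target."""
    return val_loss < target


if __name__ == "__main__":
    print(f"perplexity at target: {perplexity(TARGET_VAL_LOSS):.1f}")
    print(f"3.42 beats target: {beats_target(3.42)}")
```

Under that assumption, a loss of 3.5 corresponds to a perplexity of about 33, meaning the model is roughly as uncertain as if it were choosing uniformly among 33 tokens at each position.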

Why This Matters

Real data, a real GPU, and a fixed time budget force the skill every practitioner needs: reading a loss curve to judge whether a run is healthy, stalled, or diverging — long before it finishes.
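One rough way to make "healthy, stalled, or diverging" concrete is to compare the average loss over the two most recent windows of logged steps. The sketch below is a heuristic under stated assumptions, not a standard diagnostic: the function name `diagnose`, the window size, and the 1% tolerance are all hypothetical choices that would need tuning for a real run.

```python
import math


def diagnose(losses, window=50, tol=0.01):
    """Classify a loss curve by comparing its two most recent windows.

    window and tol are illustrative defaults, not recommended values.
    """
    # Any NaN or infinity in the log means the run has blown up.
    if any(not math.isfinite(x) for x in losses):
        return "diverging"
    # Not enough logged steps to compare two full windows.
    if len(losses) < 2 * window:
        return "too early"
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    # Relative improvement; positive means the loss is still falling.
    change = (prev - recent) / prev
    if change < -tol:
        return "diverging"
    if change < tol:
        return "stalled"
    return "healthy"


if __name__ == "__main__":
    falling = [10.0 - 0.05 * i for i in range(100)]
    flat = [3.6] * 100
    rising = [3.0 + 0.05 * i for i in range(100)]
    print(diagnose(falling), diagnose(flat), diagnose(rising))
```

Averaging over windows rather than comparing single steps keeps ordinary batch-to-batch noise from triggering a false alarm, at the cost of reacting a little later to a genuine spike.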
