Skip to main content

CoT vs. Direct on GSM8K


Ask the model to show its work, and it gets the answer right far more often.


Key Insight

Chain-of-thought (CoT) prompting asks the model to write out its reasoning step by step before giving a final answer, instead of replying right away. This project measures the accuracy gain from that single change on GSM8K grade-school math problems.

Why This Matters

The intermediate reasoning tokens give the model more room to compute and a place to lay out partial results. The same model — with no retraining at all — solves much harder problems just by being told to think out loud first.