SFT a 1B Base Model

The first pass that turns a brilliant autocomplete into something that answers you.

Key Insight

This project takes an open base model and runs supervised fine-tuning (SFT) on instruction-response examples, then scores the result on MT-Bench. SFT does not teach new facts — it teaches the model the chat format and the habit of replying to a request instead of just continuing the text.

Why This Matters

SFT is the first and cheapest step that turns a raw next-token predictor into something that follows instructions. Almost every assistant you have used started with an SFT pass like this one.

Key Insight​

Why This Matters​

Key Insight

Why This Matters