Hallucination Triage

The bug is not what the model says; it's what it says when it should say nothing.

Key Insight

This project builds a 100-prompt evaluation made of questions the model genuinely cannot know — invented names, future events, made-up acronyms — and triages the responses by how often the model responsibly says "I don't know" versus confidently inventing an answer (hallucination).

Why This Matters

A model can ace knowledge benchmarks and still mislead users in production because the training objective rewards fluent continuation, not honest abstention; measuring the confident-wrong rate alongside the refusal rate is the only way to see this failure mode clearly before your users do.

Key Insight​

Why This Matters​

Key Insight

Why This Matters