Chunking Ablation

The size of the pieces decides what the model gets to read.

Key Insight

Before documents can be retrieved, they must be split into smaller passages — a step called chunking. This project is an ablation that varies the chunk size (200 vs. 800 vs. 1,600 tokens) and overlap, then measures how each choice changes a RAG system's answer quality.

Why This Matters

Chunk too small and a passage loses the context that makes it meaningful; chunk too large and retrieval drags in irrelevant text. Chunking is one of the quietest but highest-leverage knobs in a RAG pipeline.

Key Insight​

Why This Matters​

Key Insight

Why This Matters