SWE-Style Coding Agent

Give a model a shell and a test suite, and watch it debug like an engineer.

Key Insight

This project wires an LLM to a shell and a file editor so it can read a codebase, make edits, and run tests in a loop, then points it at a few easy issues from a bug benchmark like SWE-bench.
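
The loop described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the project's actual implementation: the tool names (`run_shell`, `write_file`), the action dictionary format, and `scripted_policy` (a hard-coded stand-in for the LLM call) are all hypothetical, and the toy `calc.py` bug stands in for a real benchmark issue. It assumes a POSIX shell.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_shell(command: str, cwd: Path, timeout: int = 60) -> tuple:
    """Shell tool: run a command in the repo, return (exit code, combined output)."""
    result = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout + result.stderr


def write_file(path: str, content: str, cwd: Path) -> str:
    """Editor tool: overwrite a file relative to the repo root."""
    target = cwd / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {len(content)} chars to {path}"


def agent_loop(
    choose_action: Callable[[list], dict],
    cwd: Path,
    test_command: str,
    max_steps: int = 10,
) -> bool:
    """Act, observe, run the tests, repeat. Returns True once the tests pass."""
    history = []  # (action, observation) pairs fed back to the policy
    for _ in range(max_steps):
        action = choose_action(history)
        if action["tool"] == "shell":
            _, observation = run_shell(action["command"], cwd)
        elif action["tool"] == "edit":
            observation = write_file(action["path"], action["content"], cwd)
        else:
            observation = f"unknown tool: {action['tool']}"

        # Check the work after every step so failures become feedback.
        code, test_output = run_shell(test_command, cwd)
        history.append((action, observation + "\n[tests] " + test_output))
        if code == 0:
            return True
    return False


def scripted_policy(history: list) -> dict:
    """Hypothetical stand-in for the model: read the file first, then patch it."""
    if not history:
        return {"tool": "shell", "command": "cat calc.py"}
    return {
        "tool": "edit",
        "path": "calc.py",
        "content": "def add(a, b):\n    return a + b\n",
    }


def demo() -> bool:
    """Create a throwaway repo with a one-line bug and let the loop fix it."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "calc.py").write_text("def add(a, b):\n    return a - b\n")
        # -B avoids writing bytecode caches that could mask the edit.
        test_command = (
            f"{shlex.quote(sys.executable)} -B -c "
            '"import calc; assert calc.add(2, 3) == 5"'
        )
        return agent_loop(scripted_policy, repo, test_command)


if __name__ == "__main__":
    print("tests passing:", demo())
```

In a real agent, `scripted_policy` would be replaced by a call that sends the history to the model and parses its chosen tool call. The step cap and the per-command timeout are there so a confused policy cannot loop or hang indefinitely.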

Why This Matters

Fixing a real bug end-to-end is the canonical test of an agent: it must explore, act, check its own work, and recover from errors — the same loop behind coding assistants like Claude Code.
