Skip to main content

Triton Softmax


Write a GPU kernel in Python — and watch it keep up with PyTorch's own.


Key Insight

Triton lets you write a GPU kernel in Python-like code instead of raw CUDA. Implementing softmax — which reads a row, finds its max, exponentiates, and normalizes — and comparing it to F.softmax shows how close a hand-written kernel can get to PyTorch's built-in one.

Why This Matters

Softmax is everywhere — every attention layer uses it — and writing it yourself in Triton is the gentlest on-ramp to GPU programming: no C++, no CUDA toolchain, just Python.