Straight-Through Estimator

Pretend the non-differentiable is differentiable.

Key Insight

Some operations, like rounding or thresholding, are non-differentiable because their derivative is zero almost everywhere. A straight-through estimator (STE) solves this by using the non-differentiable operation in the forward pass, but passing the gradients straight through unchanged during the backward pass as if the operation was an identity function.

Why This Matters

STEs are essential for training models with discrete components, such as VQ-VAEs or discrete latent variables. They offer a practical workaround for incorporating hard decision boundaries into continuous autograd pipelines.

Key Insight​

Why This Matters​

Key Insight

Why This Matters