Skip to main content

Double Backward


Taking the gradient of a gradient.


Key Insight

PyTorch's autograd engine is capable of computing higher-order derivatives. By setting create_graph=True during the first backward pass, PyTorch tracks the gradient computation itself in a new dynamic computation graph, allowing you to compute a double backward (the gradient of the gradient).

Why This Matters

Higher-order derivatives are required for advanced techniques like gradient penalty in generative adversarial networks (GANs), meta-learning, and optimizing learning rates. Understanding double backward unlocks these cutting-edge optimization methods.