Double Backward
Taking the gradient of a gradient.
Key Insight
PyTorch's autograd engine can compute higher-order derivatives. When you pass create_graph=True to the first backward pass (torch.autograd.grad or Tensor.backward), autograd records the gradient computation itself in the computation graph. The resulting gradients are therefore differentiable tensors, and you can run a second backward pass through them: a double backward, the gradient of the gradient.
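A minimal sketch of the idea, assuming a scalar function y = x^3 chosen only for illustration (its derivatives, 3x^2 and 6x, are easy to check by hand). The variable names are hypothetical.

```python
import torch

# Scalar input we differentiate with respect to.
x = torch.tensor(3.0, requires_grad=True)

# Illustrative function: y = x^3, so dy/dx = 3x^2 and d2y/dx2 = 6x.
y = x ** 3

# First backward pass. create_graph=True records the gradient computation,
# so dy_dx is itself part of a graph and can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second backward pass: differentiate the first derivative (double backward).
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

first = dy_dx.item()     # 3 * 3^2 = 27.0
second = d2y_dx2.item()  # 6 * 3   = 18.0
print(first, second)
```

Without create_graph=True, dy_dx would come back detached from the graph, and the second torch.autograd.grad call would raise an error because there is nothing to differentiate through.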
Why This Matters
Higher-order derivatives underpin several techniques that differentiate through a gradient: gradient penalties in generative adversarial networks (GANs), meta-learning methods that backpropagate through an inner optimization step, and gradient-based tuning of learning rates. Understanding double backward is a prerequisite for implementing these methods.
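As one illustration, a gradient penalty can be sketched as follows. This is a simplified example under stated assumptions, not a reference implementation: the tiny critic network, the random input batch, and the target norm of 1 are all hypothetical choices made for the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical critic: a small MLP mapping 4 features to one score.
critic = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

# Hypothetical input batch; requires_grad=True so we can take d(score)/d(input).
x = torch.randn(5, 4, requires_grad=True)
scores = critic(x)

# First backward pass: gradient of the scores with respect to the inputs.
# create_graph=True keeps this gradient differentiable.
(grads,) = torch.autograd.grad(scores.sum(), x, create_graph=True)

# Penalize per-sample gradient norms that deviate from 1 (an assumed target).
grad_norm = grads.norm(2, dim=1)
penalty = ((grad_norm - 1.0) ** 2).mean()

# Second backward pass: differentiates through the first gradient, so the
# penalty produces gradients for the critic's weights.
penalty.backward()

first_layer_grad = critic[0].weight.grad
print(penalty.item(), first_layer_grad.shape)
```

The penalty is a function of an input gradient, so updating the critic's weights to reduce it requires the gradient of that gradient, which is exactly what the second backward pass computes.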