DDPM on MNIST

Key Insight

A DDPM (Denoising Diffusion Probabilistic Model) learns to generate images by mastering one almost trivial skill: look at a noisy image and predict the noise that was added to it. Training has two halves. The forward process takes a clean MNIST digit and stirs in a controlled amount of random Gaussian noise; because you can jump straight to any noise level in a single step, generating training pairs is essentially free. The learned reverse process uses a small U-Net, an encoder-decoder with skip connections that let it keep fine pixel detail while still reasoning about the whole image, to guess that noise so it can be subtracted away. The loss is just mean squared error between the predicted and the true noise, which is why diffusion training is so stable compared with a GAN's two-player tug-of-war. This project builds the whole pipeline at toy scale and samples by starting from pure static and running the reverse step T times (often 1000), watching a recognizable digit slowly emerge from the noise.
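
The single-step jump to any noise level and the noise-prediction loss can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's actual code. The linear schedule endpoints (1e-4 to 0.02) are a commonly used choice that is assumed here, since the text does not specify one. The function names are hypothetical, and a random vector stands in for a real MNIST digit. A real implementation would use a tensor library and a trained U-Net in place of the zero-noise placeholder.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (endpoint values are assumed)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alpha_bar(betas):
    """alpha_bar[t] = product of (1 - beta_s) for all steps s up to and including t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: jump from clean pixels x0 to noise level t in one step."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps


def mse(pred, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


T = 1000
alpha_bars = cumulative_alpha_bar(linear_beta_schedule(T))

rng = random.Random(0)
# Stand-in for a flattened 28x28 MNIST digit scaled to [-1, 1].
x0 = [rng.uniform(-1.0, 1.0) for _ in range(28 * 28)]

# One training pair: pick a random timestep and noise the image to that level.
t = rng.randrange(T)
xt, eps = q_sample(x0, t, alpha_bars, rng)

# Placeholder predictor that always guesses zero noise; a U-Net would go here.
eps_pred = [0.0] * len(xt)
loss = mse(eps_pred, eps)

# With the true noise in hand, it can be subtracted away to recover x0.
signal = math.sqrt(alpha_bars[t])
noise = math.sqrt(1.0 - alpha_bars[t])
x0_recovered = [(x - noise * e) / signal for x, e in zip(xt, eps)]
```

The last three lines show why predicting the noise is enough: given the exact noise, the clean image falls out by rearranging the forward formula. During sampling the network's estimate replaces the true noise, and the correction is applied gradually over the T reverse steps instead of all at once.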
