Conditional VAE

Key Insight

A plain VAE can dream up new MNIST digits, but it picks which digit at random — you cannot ask it for a "7." Class conditioning fixes this by feeding the digit label into both the encoder and the decoder, so the model learns a separate region of its latent space for each class. At generation time you simply hand it the label you want, draw a random latent, and reliably get that exact digit. This tiny change is the seed of all controllable generation: the same idea, scaled up, is how text-to-image models turn a written prompt into the picture you asked for.

Key Insight​

Key Insight