Skip to main content

β-VAE Study

Key Insight

A β-VAE adds a single knob, β, that multiplies the KL divergence term of the VAE's loss. This term acts as a strict regulator, pushing the model to keep its internal "filing cabinet" (the latent space) tidy and close to a standard bell curve.

This project sweeps β from 0 up to 10 to observe the trade-offs across three distinct states. To make this easier to understand, imagine the model consists of a Secretary (Encoder), a Filing Cabinet (Latent Space), and a Report Writer (Decoder).

  • 1. Low β (Sharp but Messy): The model reconstructs the input sharply, but leaves a messy latent space.
    • Analogy: The Secretary throws documents into the cabinet without any system. The Report Writer can see all the unique details to create a highly accurate reconstruction, but the cabinet itself is structurally unorganized.
  • 2. High β (Structured but Blurry): The model organizes the latent space beautifully, but the output becomes blurry.
    • Analogy: The Secretary perfectly standardizes every document to fit into exact folders. However, the unique details of the data are trimmed away to fit this strict format, forcing the Report Writer to produce a generalized, blurry output.
  • 3. Too High β (Posterior Collapse): Pushing β too far causes the Decoder to do the job alone, ignoring the latent space entirely. The latent code stops carrying any information about the input.
    • Analogy: The Secretary is so obsessed with perfect uniformity that every document is turned into an identical blank sheet. The Report Writer realizes the cabinet is useless, ignores it completely, and just guesses the output blindly.

Seeing these failure directions in one sweep teaches a valuable lesson: generative training is always a balancing act between mathematical structure and information retention. There is never a single "best" setting.