Scenario: A VAE learns to compress MNIST digits into a 2D latent space and generate new digits from it.
Encoder
Reparameterization
ELBO Loss
0 = early training (blurry), 1 = converged (sharper)
1. Input image x
digit = ?
2. Encoder q(z|x)
Encoder maps the image into a Gaussian in 2D latent space.
3. Reparameterization
Lets gradients flow through z by separating randomness into epsilon.
4. Latent space (2D)
5. Decoder p(x|z)
x_hat from z
6. ELBO loss breakdown
Reconstruction
0.000
KL divergence
0.000
Total loss L
0.000