Variational Autoencoder on MNIST: Step-by-Step Visualizer

A VAE learns to compress MNIST digits into a 2D latent space and generate new digits from it.

Scenario

We feed handwritten digits (0, 1, 3, 7, 8) into a VAE. The encoder summarizes each image as a Gaussian N(mu, sigma^2) in a 2D latent space. The decoder reconstructs the digit from a sampled latent code, and we can also generate new digits by sampling from the prior N(0, I).

Pipeline Formulas

Encoder

q(z|x) = N(z; mu(x), diag(sigma^2(x)))

Reparameterization

z = mu + sigma * epsilon,   epsilon ~ N(0, I)
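The reparameterization formula above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the visualizer's own code; the function name `reparameterize` and the example values for mu and sigma are assumptions made for the sketch.

```python
import random

def reparameterize(mu, sigma, rng):
    """Return (z, epsilon) with z = mu + sigma * epsilon, applied per dimension.

    mu and sigma are lists of equal length; epsilon is drawn from N(0, I).
    """
    epsilon = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, epsilon)]
    return z, epsilon

# Hypothetical encoder outputs for one image (illustrative values only).
rng = random.Random(0)
z, epsilon = reparameterize([0.5, -1.0], [1.0, 0.2], rng)
print("epsilon =", epsilon)
print("z       =", z)
```

All of the randomness comes from `epsilon`; given a fixed draw, z depends on mu and sigma only through an addition and a multiplication.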

ELBO Loss

L = Recon(x, x_hat) + KL(q(z|x) || N(0, I))   (the negative ELBO, minimized during training)
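As a hedged sketch of how the two terms might be computed: for a diagonal Gaussian encoder and a standard normal prior, the KL term has the closed form 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1). The page does not say which reconstruction term it uses, so the binary cross-entropy below is an assumption (a common choice for MNIST pixels in [0, 1]); the function names are hypothetical as well.

```python
import math

def bce(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over pixels (assumed reconstruction term)."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var):
    """Return (reconstruction, KL, total) so the terms can be shown separately."""
    recon = bce(x, x_hat)
    kl = kl_to_standard_normal(mu, log_var)
    return recon, kl, recon + kl

# Toy two-pixel "image" with illustrative values.
recon, kl, total = vae_loss([1.0, 0.0], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0])
print(recon, kl, total)
```

With mu = (0, 0) and log sigma^2 = (0, 0) the encoder distribution equals the prior, so the KL term is zero and the total loss is the reconstruction term alone.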

Controls

Training progress: 0.50

0 = early training (blurry), 1 = converged (sharper)

1. Input image x

digit = ?

2. Encoder q(z|x)

mu = (0.000, 0.000)
log sigma^2 = (0.000, 0.000)
sigma = (1.000, 1.000)

The encoder maps the image to the mean and log-variance of a Gaussian in the 2D latent space.
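The readout above lists both log sigma^2 and sigma. Predicting the log-variance lets the network output any real number while sigma stays positive. A minimal sketch of the conversion, assuming that parameterization (the helper name is hypothetical):

```python
import math

def sigma_from_log_var(log_var):
    """Convert per-dimension log-variance to standard deviation.

    sigma = sqrt(exp(log_var)) = exp(0.5 * log_var), which is always positive.
    """
    return [math.exp(0.5 * lv) for lv in log_var]

# The initial readout: log sigma^2 = (0, 0) gives sigma = (1, 1).
sigma = sigma_from_log_var([0.0, 0.0])
print(sigma)  # [1.0, 1.0]
```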

3. Reparameterization

epsilon = (0.000, 0.000)
z = mu + sigma * epsilon
z = (0.000, 0.000)

Moving the randomness into epsilon makes z a deterministic, differentiable function of mu and sigma, so gradients can flow back through z to the encoder.
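To make the gradient claim concrete, the sketch below checks numerically that, for a fixed draw of epsilon, dz/dmu = 1 and dz/dsigma = epsilon. It uses a central finite difference rather than an autodiff library, and the numbers are illustrative, not taken from the visualizer.

```python
def z_of(mu, sigma, epsilon):
    """One latent coordinate: z = mu + sigma * epsilon."""
    return mu + sigma * epsilon

def finite_diff(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

epsilon = 0.7            # one fixed noise draw, treated as a constant
mu, sigma = 0.3, 1.5     # illustrative encoder outputs

dz_dmu = finite_diff(lambda m: z_of(m, sigma, epsilon), mu)
dz_dsigma = finite_diff(lambda s: z_of(mu, s, epsilon), sigma)
print(dz_dmu, dz_dsigma)  # approximately 1 and epsilon
```

Sampling z directly from N(mu, sigma^2) would give the same distribution, but the sampling step itself would not expose these derivatives to backpropagation.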

4. Latent space (2D)

Legend: encoder mu, sampled z, digit prototypes, 1-sigma ellipse
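The 1-sigma ellipse in the legend can be traced from mu and sigma alone. The sketch below assumes a diagonal covariance (one sigma per latent dimension, as in the encoder readout), so the ellipse is axis-aligned; the function name and the sample values are hypothetical.

```python
import math

def one_sigma_ellipse(mu, sigma, n_points=64):
    """Return points on the 1-sigma contour of a 2D diagonal Gaussian.

    Each point satisfies ((x - mu0) / sigma0)^2 + ((y - mu1) / sigma1)^2 = 1.
    """
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        points.append((mu[0] + sigma[0] * math.cos(t),
                       mu[1] + sigma[1] * math.sin(t)))
    return points

# Illustrative encoder outputs for one digit.
ellipse = one_sigma_ellipse(mu=(0.5, -1.0), sigma=(1.0, 0.2))
print(ellipse[0])  # rightmost point: (mu0 + sigma0, mu1)
```

A narrow ellipse means the encoder is confident about where the image sits in latent space; a wide one means sampled z values will spread further from mu.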

5. Decoder p(x|z)

x_hat from z

6. ELBO loss breakdown

Reconstruction

0.000

KL divergence

0.000

Total loss L

0.000

Step explanation

Pick a digit and click Next Step to walk through the VAE pipeline.