This visualization builds on the course idea that ML training minimizes a loss function iteratively using gradient descent.
Loss surface: f(x, y) = x² + 10y²
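The gradient of this surface is ∇f = (2x, 20y). At the same distance from the origin, the slope along y is ten times the slope along x, and that mismatch in curvature is what the update rules on this page handle differently. A minimal Python helper for the surface and its gradient (the function names are illustrative, not taken from the page's code):

```python
def loss(x: float, y: float) -> float:
    """f(x, y) = x^2 + 10*y^2, the surface shown on this page."""
    return x**2 + 10 * y**2


def grad(x: float, y: float) -> tuple[float, float]:
    """Analytic gradient: (df/dx, df/dy) = (2x, 20y)."""
    return 2 * x, 20 * y


# At (1, 1) the surface is ten times steeper along y than along x.
print(loss(1.0, 1.0))  # 11.0
print(grad(1.0, 1.0))  # (2.0, 20.0)
```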
In lecture, gradient descent is introduced as an iterative method for minimizing loss. This page shows how changing the update rule affects the path, speed, and stability of convergence.
Click the canvas to set a new start point.
Crank speed to 100× to see the full race.
SGD is the core lecture concept. Every other update rule below is the same gradient step with one change: momentum adds memory of past gradients, RMSProp rescales the step separately on each axis, and Adam does both.
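The lecture's core update rule. Zigzags when the surface is much steeper in one direction than another, as it is here (the curvature along y is 10× the curvature along x).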
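A minimal sketch of this rule on the page's surface, assuming exact gradients with no sampling noise, so "SGD" here reduces to plain gradient descent. The learning rate 0.09 and start point (-4, 1) are illustrative assumptions, not the page's settings:

```python
def grad(x, y):
    # Gradient of f(x, y) = x^2 + 10*y^2
    return 2 * x, 20 * y


def gd_path(x, y, lr=0.09, steps=10):
    """Plain gradient descent: (x, y) <- (x, y) - lr * grad(x, y)."""
    # lr and the start point used below are illustrative assumptions.
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


for px, py in gd_path(-4.0, 1.0)[:4]:
    print(f"x={px:+.3f}  y={py:+.3f}")
```

Each step multiplies y by 1 - 20·lr = -0.8 and x by 1 - 2·lr = 0.82, so y flips sign every step (the zigzag) while x shrinks slowly without overshooting. Any lr above 0.1 gives |1 - 20·lr| > 1, and y diverges.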
SGD with a velocity that remembers past gradients. Back-and-forth components cancel out, so oscillations die down over time.
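The momentum rule can be sketched as follows, assuming the common heavy-ball form v ← β·v + ∇f, p ← p - η·v with β = 0.9 and η = 0.01; the page does not say which momentum variant or constants it uses. The velocity v is an exponentially decaying sum of past gradients, so gradient components that flip sign every step partly cancel, while components that keep their sign build up:

```python
def grad(x, y):
    # Gradient of f(x, y) = x^2 + 10*y^2
    return 2 * x, 20 * y


def momentum_path(x, y, lr=0.01, beta=0.9, steps=300):
    """Heavy-ball momentum: v <- beta*v + grad, p <- p - lr*v."""
    # lr and beta are assumed values, not the page's settings.
    vx, vy = 0.0, 0.0  # velocity starts at rest
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Velocity is a decaying sum of past gradients: sign-flipping
        # components partly cancel, consistent ones accumulate.
        vx, vy = beta * vx + gx, beta * vy + gy
        x, y = x - lr * vx, y - lr * vy
        path.append((x, y))
    return path


final_x, final_y = momentum_path(-4.0, 1.0)[-1]
print(f"after 300 steps: x={final_x:.2e}, y={final_y:.2e}")
```

Because v starts at zero, the first step is identical to a plain gradient step; the two rules only diverge once past gradients begin to accumulate.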
SGD with a per-axis learning rate that shrinks on axes with large recent gradients. Straightens the path.
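RMSProp keeps a running average s of each coordinate's squared gradient and divides that coordinate's step by sqrt(s), so a steep axis and a shallow axis end up taking steps of similar size. A minimal sketch; the decay ρ = 0.9, ε = 1e-8, and η = 0.05 are assumed values, not the page's settings:

```python
import math


def grad(x, y):
    # Gradient of f(x, y) = x^2 + 10*y^2
    return 2 * x, 20 * y


def rmsprop_path(x, y, lr=0.05, rho=0.9, eps=1e-8, steps=500):
    """RMSProp per axis: s <- rho*s + (1-rho)*g^2, p <- p - lr*g/(sqrt(s)+eps)."""
    # lr, rho and eps are assumed values, not the page's settings.
    sx, sy = 0.0, 0.0  # running averages of squared gradients
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        sx = rho * sx + (1 - rho) * gx**2
        sy = rho * sy + (1 - rho) * gy**2
        # Dividing by sqrt(s) gives each axis its own effective learning rate.
        x -= lr * gx / (math.sqrt(sx) + eps)
        y -= lr * gy / (math.sqrt(sy) + eps)
        path.append((x, y))
    return path


path = rmsprop_path(-4.0, 1.0)
(x0, y0), (x1, y1) = path[0], path[1]
print(f"first step: dx={x1 - x0:+.4f}, dy={y1 - y0:+.4f}")
```

On the first step s = (1 - ρ)·g², so both axes move by about η/sqrt(0.1) ≈ 3.16·η even though the y gradient is 2.5× larger than the x gradient at this start point. That equalization is why the path straightens.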
Momentum + RMSProp combined. A widely used default optimizer in modern deep learning.
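Adam keeps a momentum-style average m of the gradients and an RMSProp-style average v of the squared gradients, then corrects both for starting at zero. The sketch uses the defaults from the original Adam paper (β1 = 0.9, β2 = 0.999, ε = 1e-8) with an assumed learning rate η = 0.05; the page's actual settings are not stated:

```python
import math


def grad(x, y):
    # Gradient of f(x, y) = x^2 + 10*y^2
    return 2 * x, 20 * y


def adam_path(x, y, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: momentum-style mean m plus RMSProp-style scale v, bias-corrected."""
    # lr is an assumed value; beta1, beta2, eps follow the Adam paper's defaults.
    p = [x, y]
    m = [0.0, 0.0]  # first moment (running mean of gradients)
    v = [0.0, 0.0]  # second moment (running mean of squared gradients)
    path = [(x, y)]
    for t in range(1, steps + 1):
        g = grad(p[0], p[1])  # gradient at the current point, before updating
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Undo the bias toward zero caused by starting m and v at 0.
            m_hat = m[i] / (1 - beta1**t)
            v_hat = v[i] / (1 - beta2**t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append((p[0], p[1]))
    return path


path = adam_path(-4.0, 1.0)
print(f"after 500 steps: {path[-1]}")
```

Thanks to the bias correction, the first step moves each coordinate by almost exactly η against the sign of its gradient, regardless of how steep that axis is.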