Section 1
Generalization Explorer
As model complexity rises, training error often falls monotonically, but validation error may eventually rise. That turning point is the practical signature of overfitting.
Generalization gap = Val MSE − Train MSE.
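The gap above can be computed directly for a family of models of increasing complexity. The sketch below is a minimal illustration, not the explorer's actual code: the sine target, noise level, sample sizes, and the helper name `generalization_gap` are all assumptions chosen for the example.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def truth(x):
    # Assumed ground-truth function for this illustration.
    return np.sin(2 * np.pi * x)


def generalization_gap(degree, x_train, y_train, x_val, y_val):
    """Fit a polynomial by least squares; return (train_mse, val_mse, gap)."""
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((y_train - P.polyval(x_train, coeffs)) ** 2))
    val_mse = float(np.mean((y_val - P.polyval(x_val, coeffs)) ** 2))
    return train_mse, val_mse, val_mse - train_mse


rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 20)
y_train = truth(x_train) + rng.normal(0, 0.2, 20)
x_val = rng.uniform(0, 1, 200)
y_val = truth(x_val) + rng.normal(0, 0.2, 200)

# Polynomial degree stands in for "model complexity".
results = {d: generalization_gap(d, x_train, y_train, x_val, y_val)
           for d in (1, 3, 9)}
for d, (tr, va, gap) in results.items():
    print(f"degree={d}  train={tr:.4f}  val={va:.4f}  gap={gap:.4f}")
```

Because each higher-degree polynomial family contains the lower-degree ones, the least-squares training MSE cannot increase with degree. The validation MSE has no such guarantee, which is why the gap is worth tracking.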
Section 2
Bias-Variance Tradeoff Explorer
Bias comes from a model family being too rigid. Variance comes from the fitted model changing too much when the dataset changes. A single fit cannot reveal variance: it only becomes visible across repeated resampled datasets.
Each faint curve is a polynomial fit on a fresh resample of training data.
If the average across resamples sits away from the truth, bias is high.
Empirical estimates over the prediction grid. Total proxy = bias² + variance + noise floor.
This section uses repeated synthetic resampling to build empirical intuition for bias and variance — it is not a closed-form theorem result.
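The resampling procedure described above can be sketched as follows. This is a hedged illustration under stated assumptions (a sine ground truth, Gaussian noise, uniformly sampled inputs, and a hypothetical helper `bias_variance`), not the code behind the explorer.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def bias_variance(degree, n_resamples=200, n_train=25, noise_sd=0.3, seed=0):
    """Empirical bias^2, variance, and total proxy over a prediction grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0, 1, 50)
    true_vals = np.sin(2 * np.pi * grid)  # assumed ground truth
    preds = np.empty((n_resamples, grid.size))
    for i in range(n_resamples):
        # Fresh resample of training data, one polynomial fit per resample.
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n_train)
        preds[i] = P.polyval(grid, P.polyfit(x, y, degree))
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_vals) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    noise_floor = noise_sd ** 2
    return bias_sq, variance, bias_sq + variance + noise_floor


estimates = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b2, var, total) in estimates.items():
    print(f"degree={d}  bias^2={b2:.4f}  variance={var:.4f}  total={total:.4f}")
```

Each row of `preds` plays the role of one faint curve. The average over rows is compared with the truth to estimate bias, and the spread over rows estimates variance.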
Section 3
L1 vs L2 Regularization Lab
Regularization adds a penalty that discourages overly complex solutions. L1 tends to prefer sparse solutions; L2 tends to shrink all weights more smoothly.
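L1 sets coefficients exactly to zero once λ is large enough, whereas L2 in general only shrinks them toward zero and does not reach exact sparsity at any finite λ. That asymmetry is part of the lesson.
L1 drives many coefficients exactly to zero. L2 shrinks them smoothly toward zero without eliminating them.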
Stacked: data loss (blue) + penalty (purple/green). Watch how moving λ trades them off.
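To make the sparsity contrast concrete, the sketch below fits both penalties on the same synthetic regression problem. Everything here is an assumption for illustration: the data, the value of λ, the objective scaling (mean squared loss divided by 2), and the helper names `lasso_cd` and `ridge`. The L1 fit uses plain coordinate descent with soft-thresholding, and the L2 fit uses its closed-form solution.

```python
import numpy as np


def soft_threshold(z, t):
    # Shrinks z toward zero by t and clips to exactly zero when |z| <= t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iters=200):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            partial_resid = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_resid / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


def ridge(X, y, lam):
    """Minimize (1/2n)||y - Xw||^2 + (lam/2) * ||w||_2^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)  # only 3 informative features
y = X @ w_true + rng.normal(0, 0.5, 100)

lam = 0.3
w_l1 = lasso_cd(X, y, lam)
w_l2 = ridge(X, y, lam)
print("L1 exact zeros:", int(np.sum(w_l1 == 0)))
print("L2 exact zeros:", int(np.sum(w_l2 == 0)))
```

The soft-threshold step is what produces exact zeros under L1. The ridge solution rescales the coefficients but has no mechanism that clips them to zero.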
Geometry mini-card: why L1 is sparse and L2 is round
L2: circular constraint region. The MSE contour usually touches it at a smooth boundary point where both weights are nonzero.
L1: diamond constraint region with sharp corners on the axes. Solutions often land on a corner where one weight is zero.
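The two shapes come from writing each penalty as a constraint on the weights. In the standard constrained form below, the symbols $X$, $y$, $w$, and the budget $t$ are notation assumed here for illustration and do not appear in the explorer itself.

```latex
\begin{aligned}
\text{L2:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2
  \quad \text{subject to} \quad w_1^2 + w_2^2 \le t \\
\text{L1:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2
  \quad \text{subject to} \quad \lvert w_1 \rvert + \lvert w_2 \rvert \le t
\end{aligned}
```

In two dimensions the first feasible set is a disk, and the second is a diamond with corners at $(\pm t, 0)$ and $(0, \pm t)$. Each corner lies on an axis, so a solution that lands on a corner has one weight exactly equal to zero.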
Section 4
Dropout Intuition Explorer
Dropout reduces overfitting by forcing a neural network to succeed under many random subnetworks during training instead of relying on a single brittle path.
Training mode: a fresh random mask drops some hidden units on every forward pass.
Each row is one forward pass during training. Filled cells show active hidden units in that pass — dropout samples a fresh subnetwork every time.
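A minimal sketch of the mask sampling described above follows. The rescaling by the keep probability ("inverted dropout") is a common convention assumed here, not something the explorer states, and `dropout_forward` is a hypothetical helper name.

```python
import numpy as np


def dropout_forward(h, drop_prob, rng, training=True):
    """Apply dropout to hidden activations h; return (output, mask)."""
    if not training or drop_prob == 0.0:
        # Evaluation mode: every unit stays active and nothing is rescaled.
        return h, np.ones_like(h)
    keep_prob = 1.0 - drop_prob
    # Fresh Bernoulli mask on every call: 1 = active unit, 0 = dropped unit.
    mask = (rng.random(h.shape) < keep_prob).astype(h.dtype)
    # Assumed inverted-dropout scaling keeps the expected activation unchanged.
    return h * mask / keep_prob, mask


rng = np.random.default_rng(0)
hidden = np.ones(8)

# Five training-mode forward passes, one mask (row) per pass.
masks = np.stack([dropout_forward(hidden, 0.5, rng)[1] for _ in range(5)])
print(masks.astype(int))

train_out, train_mask = dropout_forward(hidden, 0.5, rng)
eval_out, eval_mask = dropout_forward(hidden, 0.5, rng, training=False)
```

Each row of `masks` corresponds to one row of the grid: a 1 marks a hidden unit that is active in that pass, and a 0 marks a dropped unit.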
References
Slide anchors and further reading
Slides/ML_Lecture3Sp26_ML_Review.pdf, page 2: definition and causes of overfitting.
Slides/ML_Lecture3Sp26_ML_Review.pdf, pages 3-5: bias and variance definitions.
Slides/ML_Lecture3Sp26_ML_Review.pdf, page 6: preventing overfitting (data, simpler models, regularization, dropout).
Slides/ML_Lecture3Sp26_ML_Review.pdf, page 7: regularization definition, L1 / L2 effect summary.
Slides/ML_Lecture3Sp26_ML_Review.pdf, page 9: L1 vs L2 comparison.
Slides/ML_Lecture3Sp26_ML_Review.pdf, pages 10-11: L2 and L1 geometric intuition.
Slides/ML_LectureSp26_CNN.pdf, pages 69-76: variance stability and batch normalization.