Regression Playground
Pick a ground-truth shape, then add noise and outliers. The plot below shows the data the four models will see — outliers are highlighted so the contamination stays visible.
$y_{\text{true}}(x) = 0.15x^3 - 0.8x$, with $x \in [-4, 4]$.
Manual outliers stay until you resample or change the preset. Validation points are drawn from the same ground truth without contamination.
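A minimal sketch of how a contaminated dataset like this could be generated. It assumes uniformly sampled $x$, Gaussian noise, and outliers that shift $y$ by a fixed magnitude in a random direction. `make_dataset` and all of its parameters are hypothetical and are not the playground's code.

```python
import random

def y_true(x: float) -> float:
    """Ground-truth cubic from the playground: 0.15x^3 - 0.8x."""
    return 0.15 * x**3 - 0.8 * x

def make_dataset(n=40, noise_sd=0.5, n_outliers=3, outlier_mag=6.0, seed=0):
    """Hypothetical generator: noisy training set with injected outliers,
    plus a clean validation grid drawn from the same ground truth."""
    rng = random.Random(seed)
    xs = [rng.uniform(-4.0, 4.0) for _ in range(n)]
    # Assumed noise model: additive Gaussian with standard deviation noise_sd.
    ys = [y_true(x) + rng.gauss(0.0, noise_sd) for x in xs]
    # Contaminate a random subset: shift each chosen y by +/- outlier_mag.
    outlier_idx = rng.sample(range(n), n_outliers)
    for i in outlier_idx:
        ys[i] += rng.choice([-1.0, 1.0]) * outlier_mag
    # Validation: evenly spaced grid on [-4, 4], no noise, no outliers.
    m = 101
    x_val = [-4.0 + 8.0 * k / (m - 1) for k in range(m)]
    y_val = [y_true(x) for x in x_val]
    return xs, ys, sorted(outlier_idx), x_val, y_val
```

Returning the outlier indices separately is what would let a plot highlight the contaminated points.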
Model Zoo Comparison
Four model families on one dataset. Toggle visibility, compare fits, and tune the per-model knobs to feel each model's inductive bias.
| Model | Flexibility | Prediction style | Train MSE | Validation MSE | Outlier sensitivity |
|---|---|---|---|---|---|
Validation MSE is computed on a clean held-out grid drawn from the same ground-truth function, so it rewards models that capture the underlying signal rather than the noise or the outliers.
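To see why the clean grid rewards signal over noise, this small sketch treats the ground truth itself as a "perfect" model. It scores zero on the clean grid but is still charged for the outlier on a contaminated training set. The grid spacing and the tiny training set are illustrative assumptions.

```python
def y_true(x: float) -> float:
    """Ground-truth cubic: 0.15x^3 - 0.8x."""
    return 0.15 * x**3 - 0.8 * x

def mse(predict, xs, ys):
    """Mean squared error of predict(x) against targets ys."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Clean validation grid: 101 evenly spaced points on [-4, 4], no noise.
x_val = [-4.0 + 0.08 * k for k in range(101)]
y_val = [y_true(x) for x in x_val]

# Tiny contaminated training set: exact values plus one +5 outlier at x = 1.
x_train = [-3.0, -1.0, 1.0, 3.0]
y_train = [y_true(x) for x in x_train]
y_train[2] += 5.0

val_mse = mse(y_true, x_val, y_val)        # 0.0: the true curve reproduces the grid
train_mse = mse(y_true, x_train, y_train)  # about 6.25 = 5**2 / 4, all from the outlier
```

A model that bent toward the outlier would lower `train_mse` while raising `val_mse`.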
Outlier Sensitivity Lab
Now make outlier robustness the focus. Push the magnitude and count up — linear regression gets pulled globally, high-degree polynomials contort, trees absorb the shock locally with extra splits, and SVR's ε-tube and $C$ knob let it trade fit for robustness.
- Small $C$ → violations of the ε-tube are cheap → smoother, outlier-resistant fit.
- Large $C$ → violations are expensive → tight fit; SVR chases hard-to-fit points, including outliers.
Curve shift averages $|f_{\text{with}}(x) - f_{\text{without}}(x)|$ over a dense grid in $[-4, 4]$. Larger values mean outliers reshaped the prediction more.
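Written out, the metric is $\frac{1}{M}\sum_{k=1}^{M} |f_{\text{with}}(x_k) - f_{\text{without}}(x_k)|$ over grid points $x_k$. Below is a minimal sketch; the grid size $M = 400$ and the function name are assumptions. The two toy comparisons contrast a global offset with a local bump, the same distinction that separates a linear fit from a tree.

```python
def curve_shift(f_with, f_without, lo=-4.0, hi=4.0, m=400):
    """Mean absolute gap between two fitted curves on m evenly spaced points."""
    step = (hi - lo) / (m - 1)
    grid = [lo + k * step for k in range(m)]
    return sum(abs(f_with(x) - f_without(x)) for x in grid) / m

base = lambda x: x
# Global change: the whole curve moves up by 2, so the shift is about 2.0.
global_shift = curve_shift(lambda x: x + 2.0, base)
# Local change: only points with |x| < 0.5 move (50 of 400), so about 0.25.
local_shift = curve_shift(lambda x: x + (2.0 if abs(x) < 0.5 else 0.0), base)
```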
Same data, same kernel, different penalty. With small $C$, residuals outside the ε-tube cost little, so the fit stays smooth and largely ignores outliers. With large $C$, those same residuals cost a lot and the fit bends to accommodate them. This is the trade-off described on slide 18 of the lecture deck.
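For reference, the standard textbook ε-SVR primal objective makes this trade-off explicit. The playground's exact objective is not stated here, so treat this as the conventional formulation rather than its implementation:

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\ |y_i - f(x_i)| - \varepsilon\bigr), \qquad f(x) = w^\top \phi(x) + b
$$

Residuals inside the tube contribute nothing. Each residual outside it contributes linearly, scaled by $C$, while $\frac{1}{2}\lVert w\rVert^2$ favors flat functions. Shrinking $C$ shifts the balance toward flatness, so a far-off point gains little by bending the curve. Because the penalty grows linearly rather than quadratically, an outlier's pull is also bounded: in the dual, every coefficient satisfies $|\beta_i| \le C$.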
Takeaway. Outliers reveal each model's inductive bias. Linear models can be pulled globally, high-degree polynomials can contort, trees react locally through extra splits, and SVR can trade off robustness and fit through $C$ and $\varepsilon$.
How The Models Work
Each card pairs a one-glance illustration with the structural assumption the model is making. Read these alongside the comparison plot above to see why the curves take the shapes they do.
Linear regression
Fits one global straight line $\hat{y} = w x + b$ by minimizing squared residuals. It is simple and interpretable, but the squared loss penalizes large residuals quadratically, so a single far-off outlier can drag the line substantially.
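A minimal sketch of the global pull, using the closed-form simple-regression solution $w = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $b = \bar y - w\bar x$. `fit_line` is a hypothetical helper. It is not the playground's solver, which per the implementation notes uses regularized normal equations.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return w, b

xs = [float(x) for x in range(-4, 5)]   # -4, -3, ..., 4
ys = [2.0 * x + 1.0 for x in xs]        # an exact line, no noise
w_clean, b_clean = fit_line(xs, ys)     # (2.0, 1.0)

ys_out = ys[:]
ys_out[-1] += 30.0                      # one far-off outlier at x = 4
w_out, b_out = fit_line(xs, ys_out)     # slope doubles to 4.0, intercept becomes 13/3
```

One corrupted point out of nine changes the slope everywhere, not just near $x = 4$.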
Polynomial regression
Adds polynomial features $x, x^2, \dots, x^d$ and fits a linear model in that lifted space. Higher degree means more flexibility, but also more capacity to wiggle around noise and outliers — classic bias–variance tension.
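A stdlib-only sketch of polynomial regression as a linear model in the lifted space. It follows the outline in the implementation notes: rescale to $[-1, 1]$, lift, then solve the normal equations with a small ridge term. The value `lam=1e-8`, regularizing the intercept too, and the helper names `solve` and `fit_poly` are all assumptions.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_poly(xs, ys, degree, lam=1e-8, lo=-4.0, hi=4.0):
    """Least-squares polynomial fit with a small ridge term (sketch)."""
    def features(x):
        u = 2.0 * (x - lo) / (hi - lo) - 1.0          # rescale [lo, hi] -> [-1, 1]
        return [u ** p for p in range(degree + 1)]    # 1, u, u^2, ..., u^d
    Phi = [features(x) for x in xs]
    d = degree + 1
    # Normal equations: (Phi^T Phi + lam * I) c = Phi^T y
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(d)]
    coef = solve(A, rhs)
    def predict(x):
        return sum(c * f for c, f in zip(coef, features(x)))
    return predict
```

Rescaling first keeps the powers $u^p$ bounded by 1. Without it, $x^d$ for $x = 4$ grows quickly and the normal-equation matrix becomes badly conditioned at high degree.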
Decision tree regression
Recursively partitions the input space and predicts the regional mean. The result is piecewise constant — flat steps with hard jumps at split boundaries. Distant outliers usually distort just one or two leaves rather than the global shape.
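A minimal 1D CART sketch: at each node, scan every gap between sorted $x$ values, pick the split with the lowest total squared error, and recurse until the depth limit. Midpoint thresholds, the absence of a minimum leaf size, and the names `best_split` and `build_tree` are assumptions.

```python
def best_split(xs, ys):
    """Return (threshold, sse) of the best squared-error split, or None."""
    pts = sorted(zip(xs, ys))
    n = len(pts)
    total_sum = sum(y for _, y in pts)
    total_sq = sum(y * y for _, y in pts)
    left_sum = left_sq = 0.0
    best = None
    for i in range(1, n):
        y_prev = pts[i - 1][1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # SSE of a group = sum(y^2) - (sum y)^2 / count
        sse = (left_sq - left_sum**2 / i) + (right_sq - right_sum**2 / (n - i))
        if best is None or sse < best[1]:
            best = ((pts[i - 1][0] + pts[i][0]) / 2.0, sse)
    return best

def build_tree(xs, ys, depth):
    """Greedy CART: split until depth runs out; leaves predict the mean."""
    mean = sum(ys) / len(ys)
    split = best_split(xs, ys) if depth > 0 and len(xs) > 1 else None
    if split is None:
        return lambda x: mean
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    lf = build_tree([p[0] for p in left], [p[1] for p in left], depth - 1)
    rf = build_tree([p[0] for p in right], [p[1] for p in right], depth - 1)
    return lambda x: lf(x) if x <= t else rf(x)

xs = [float(k) for k in range(10)]
ys = [0.0] * 5 + [10.0] * 5            # a clean step at x = 4.5
ys_out = ys[:]
ys_out[9] = 50.0                       # one outlier at the right edge
tree = build_tree(xs, ys_out, depth=2)
```

In this example the first split isolates the outlier in its own leaf at $x > 8.5$. The second level still recovers the step, so predictions for $x = 0$ to $8$ match the clean data: the damage stays local.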
Support-vector regression
Residuals inside an ε-tube around the fit cost nothing; only points outside the tube are penalized. Those outside points, plus some lying exactly on the tube boundary, become support vectors. The penalty $C$ controls how hard the model insists on shrinking those violations: smaller $C$ → smoother and more robust, larger $C$ → tighter fit.
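A small sketch comparing the ε-insensitive loss with the squared loss used by the least-squares models, and flagging which residuals fall strictly outside the tube. At the optimum of an ε-SVR, those points always have nonzero dual weight, so they are support vectors. The residual values and `eps = 0.1` are illustrative.

```python
def eps_insensitive(r, eps):
    """SVR loss: zero inside the tube |r| <= eps, linear outside."""
    return max(0.0, abs(r) - eps)

def squared(r):
    """Squared loss used by linear and polynomial least squares."""
    return r * r

eps = 0.1
residuals = [0.05, -0.08, 0.5, -3.0, 30.0]
losses = [(eps_insensitive(r, eps), squared(r)) for r in residuals]
# Strictly outside the tube -> guaranteed support vectors at the optimum.
outside = [i for i, r in enumerate(residuals) if abs(r) > eps]
```

For the residual of 30, the squared loss is 900 while the ε-insensitive loss is only 29.9. This gap is why a single outlier dominates a least-squares fit but not an SVR fit.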
Implementation notes. Linear and polynomial fits solve the least-squares normal equations with a small Tikhonov (ridge) term added for numerical stability at high degree; inputs are rescaled to $[-1, 1]$ before the polynomial lift. The decision tree is a 1D CART: greedy squared-error splits down to the depth limit, with leaves predicting the regional mean. SVR is solved with dual coordinate descent (libsvm-style) on the ε-insensitive loss: each per-coordinate update is a closed-form soft-threshold followed by a clip to $\beta_i \in [-C, C]$, and linear, RBF, and polynomial kernels are supported.
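A minimal sketch of dual coordinate descent for ε-SVR. It assumes **no bias term**, which removes the dual equality constraint so each coordinate can be updated on its own. It also assumes an RBF kernel only and a fixed number of sweeps. With these assumptions the dual is $\min_\beta \tfrac12 \beta^\top K \beta - y^\top \beta + \varepsilon \lVert\beta\rVert_1$ subject to $-C \le \beta_i \le C$, and the prediction is $f(x) = \sum_j \beta_j K(x_j, x)$. All function names are hypothetical.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * (a - b)^2) for scalar inputs."""
    return math.exp(-gamma * (a - b) ** 2)

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def svr_dual_cd(xs, ys, C=1.0, eps=0.1, gamma=1.0, sweeps=200):
    """Bias-free eps-SVR via dual coordinate descent (sketch).

    Minimizes 0.5*b^T K b - y^T b + eps*||b||_1 subject to -C <= b_i <= C.
    """
    n = len(xs)
    K = [[rbf(xi, xj, gamma) for xj in xs] for xi in xs]
    beta = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # g = sum_{j != i} K_ij * beta_j - y_i  (linear coefficient in beta_i)
            g = sum(K[i][j] * beta[j] for j in range(n) if j != i) - ys[i]
            # Exact 1-D minimizer of 0.5*K_ii*z^2 + g*z + eps*|z| ...
            z = soft_threshold(-g, eps) / K[i][i]
            # ... then clip to the box; clipping is exact for a 1-D convex problem.
            beta[i] = min(C, max(-C, z))

    def predict(x):
        return sum(b * rbf(xj, x, gamma) for b, xj in zip(beta, xs))

    return beta, predict
```

Because every $|\beta_i| \le C$ and the RBF kernel never exceeds 1, no single training point can move a prediction by more than $C$. With a large outlier and small $C$, its coefficient simply sits at the bound.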
Slide anchors
ML_Lecture3Sp26_ML_Review.pdf
- page 13: linear & polynomial regression
- pages 14–15: decision-tree regression (and random forest)
- pages 16–17: support-vector regression and the ε-tube
- page 18: outlier resistance and the role of $C$
- pages 20–29, 68: gradient boosting and random forest aggregation (extension)