EE508 · Fundamentals · Regression

Regression Model Zoo & Outlier Sensitivity

Different regression models make different assumptions about structure, smoothness, and tolerance to error. Compare linear, polynomial, decision-tree, and support-vector regression on the same data — then stress-test them with noise and outliers and watch each model's inductive bias surface.

Linear · OLS | Polynomial · degree d | CART tree · 1D | SVR · ε-tube + C
Section 1

Regression Playground

Pick a ground-truth shape, then add noise and outliers. The plot below shows the data the four models will see — outliers are highlighted so the contamination stays visible.

Ground-truth preset: $y_{\text{true}}(x) = 0.15x^3 - 0.8x$, with $x \in [-4, 4]$.

Data shape (defaults): 30 samples, noise σ = 0.40; outliers: count 3, magnitude 4.0, adjustable position.

[Plot: training samples, outliers, and $y_{\text{true}}(x)$]

Manually placed outliers persist until you resample or change the preset. Validation points are drawn from the same ground truth without contamination.
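A minimal sketch of how a dataset like this could be generated, assuming uniformly drawn inputs, Gaussian noise with standard deviation σ, and outliers made by shifting randomly chosen targets up or down by the chosen magnitude. The function name `make_dataset` and this outlier recipe are hypothetical; the page does not spell out its own placement rule (the Position control).

```python
import random

def y_true(x):
    # Cubic preset from the page: y = 0.15 x^3 - 0.8 x on [-4, 4]
    return 0.15 * x**3 - 0.8 * x

def make_dataset(n=30, sigma=0.40, n_outliers=3, magnitude=4.0, seed=0):
    """Hypothetical generator: noisy samples of y_true plus a few shifted outliers."""
    rng = random.Random(seed)
    xs = sorted(rng.uniform(-4.0, 4.0) for _ in range(n))
    ys = [y_true(x) + rng.gauss(0.0, sigma) for x in xs]
    # Assumption: outliers are existing samples pushed up or down by `magnitude`.
    outlier_idx = rng.sample(range(n), n_outliers)
    for i in outlier_idx:
        ys[i] += magnitude if rng.random() < 0.5 else -magnitude
    return xs, ys, sorted(outlier_idx)

xs, ys, idx = make_dataset()
print(f"{len(xs)} samples, outliers at indices {idx}")
```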

Section 2

Model Zoo Comparison

Four model families on one dataset. Toggle visibility, compare fits, and tune the per-model knobs to feel each model's inductive bias.

[Plot: linear, polynomial, tree, and SVR fits on the current data, with the SVR ε-tube and support vectors marked]

Default knobs: polynomial degree d = 3; decision-tree max depth 4; SVR ε (tube width) 0.30 and C (penalty) 5.0, with a selectable kernel.

Comparison table columns: Model · Flexibility · Prediction style · Train MSE · Validation MSE · Outlier sensitivity.

Validation MSE is computed on a clean held-out grid drawn from the same ground-truth function, so it rewards models that capture the underlying signal rather than the noise or the outliers.
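To see why a clean validation grid rewards signal rather than noise, here is a small sketch comparing two extreme "models": the ground truth itself and a 1-nearest-neighbour memorizer that reproduces every training point. The clean grid is assumed to be $y_{\text{true}}$ evaluated at evenly spaced points with no noise; the data, seed, and names are illustrative, not the page's.

```python
import random

def y_true(x):
    return 0.15 * x**3 - 0.8 * x

def mse(pred, ref):
    # Mean squared error between two equal-length sequences
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)

rng = random.Random(1)
train_x = [rng.uniform(-4, 4) for _ in range(30)]
train_y = [y_true(x) + rng.gauss(0, 0.4) for x in train_x]
train_y[0] += 4.0  # one outlier, for illustration

# Clean validation grid: ground truth only (assumed noise-free)
val_x = [-4 + 8 * k / 199 for k in range(200)]
val_y = [y_true(x) for x in val_x]

def memorizer(x):
    # 1-nearest-neighbour "model": reproduces every training point, noise included
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

for name, f in [("ground truth", y_true), ("memorizer", memorizer)]:
    tr = mse([f(x) for x in train_x], train_y)
    va = mse([f(x) for x in val_x], val_y)
    print(f"{name:12s} train MSE {tr:.3f}  validation MSE {va:.3f}")
```

The memorizer scores a perfect train MSE of zero yet a positive validation MSE, while the true function shows the reverse pattern.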

Section 3

Outlier Sensitivity Lab

Now make outlier robustness the focus. Push the magnitude and count up — linear regression gets pulled globally, high-degree polynomials contort, trees absorb the shock locally with extra splits, and SVR's ε-tube and $C$ knob let it trade fit for robustness.

Outlier controls (mirrored from Section 1): count 3, magnitude 4.0, adjustable position. SVR penalty in focus: C = 5.0.

Small $C$ → violations outside the ε-tube are cheap → smoother, outlier-resistant fit.
Large $C$ → violations are expensive → tight fit; the SVR chases difficult points.

[Side-by-side plots: fits without outliers and with outliers, followed by an influence summary]

Curve shift averages $|f_{\text{with}}(x) - f_{\text{without}}(x)|$ over a dense grid in $[-4, 4]$. Larger values mean outliers reshaped the prediction more.
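A direct transcription of the curve-shift formula, assuming an evenly spaced grid; the page's grid density is not stated, so 400 points here is an arbitrary choice. The two toy curves contrast a global change (a line tilted by an outlier) with a local one (a single tree leaf raised).

```python
def curve_shift(f_with, f_without, lo=-4.0, hi=4.0, m=400):
    """Mean of |f_with(x) - f_without(x)| over m evenly spaced points in [lo, hi]."""
    xs = [lo + (hi - lo) * k / (m - 1) for k in range(m)]
    return sum(abs(f_with(x) - f_without(x)) for x in xs) / m

# A global change (a line tilted by an outlier) vs a local one (one tree leaf raised)
line = lambda x: 0.2 * x
tilted = lambda x: 0.3 * x
bumped = lambda x: 0.2 * x + (1.0 if 0.0 <= x < 0.5 else 0.0)
print(f"tilted: {curve_shift(tilted, line):.4f}")
print(f"bumped: {curve_shift(bumped, line):.4f}")
```

The tilt scores about 0.20 while the local bump scores 0.0625, even though the bump's peak difference (1.0) is larger than the tilt's (at most 0.4).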

[SVR · C demonstration: three panels on the same data, $C = 0.5$ (relaxed), $C = 5$ (balanced), $C = 80$ (strict)]

Same data, same kernel, different penalty. With small $C$, residuals outside the ε-tube cost little, so the fit stays smooth and largely ignores the outliers. With large $C$, those same residuals cost a lot and the fit bends to accommodate them. This is exactly the trade-off described on slide 18 of the lecture deck.
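For reference, the standard soft-margin SVR primal makes the role of $C$ explicit. Here $f(x) = \langle w, \phi(x) \rangle + b$, with $\phi$ the kernel's feature map; $w$, $\phi$, and $b$ are notation introduced only for this formula.

```latex
\min_{w,\,b}\;\; \tfrac{1}{2}\lVert w \rVert^2
  \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; |y_i - f(x_i)| - \varepsilon\bigr)
```

As a worked example, suppose the fit passes close to the clean curve, so an outlier of the default magnitude 4.0 leaves a residual of about 4.0. With ε = 0.30 that point contributes $C(4.0 - 0.30) = 3.7C$ to the objective: 1.85 at $C = 0.5$, 18.5 at $C = 5$, and 296 at $C = 80$. The flatness term $\tfrac{1}{2}\lVert w \rVert^2$ does not depend on $C$, so at $C = 80$ bending toward the outlier becomes the cheaper option.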

Takeaway. Outliers reveal each model's inductive bias. Linear models can be pulled globally, high-degree polynomials can contort, trees react locally through extra splits, and SVR can trade off robustness and fit through $C$ and $\varepsilon$.

Section 4

How The Models Work

Each card pairs a one-glance illustration with the structural assumption the model is making. Read these alongside the comparison plot above to see why the curves take the shapes they do.

Linear regression

Fits one global straight line $\hat{y} = w x + b$ by minimizing the sum of squared residuals. It is simple and interpretable, but the squared loss penalizes large residuals heavily (a residual twice as large costs four times as much), so a single far-off outlier can drag the whole line substantially.
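A minimal sketch of the closed-form 1D fit, $w = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $b = \bar y - w \bar x$, on a toy dataset (not the page's) showing how one outlier moves both parameters. The name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Closed-form 1D ordinary least squares: returns (w, b) for y_hat = w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
ys = [0.5 * x + 1 for x in xs]   # perfectly linear data
print(fit_line(xs, ys))          # (0.5, 1.0)

ys_out = list(ys)
ys_out[-1] += 10                 # one outlier at the right edge
print(fit_line(xs, ys_out))      # slope about 1.17, intercept about 2.11
```

Raising one endpoint by 10 more than doubles the slope and lifts the intercept, so the prediction changes everywhere along the line, not just near the outlier.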

Polynomial regression

Adds polynomial features $x, x^2, \dots, x^d$ and fits a linear model in that lifted space. Higher degree means more flexibility, but also more capacity to wiggle around noise and outliers — classic bias–variance tension.
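A minimal sketch of polynomial regression as a linear model in lifted features, following the recipe in the implementation notes below (inputs rescaled from $[-4, 4]$ to $[-1, 1]$, normal equations with a small ridge term). The helper names, the Gaussian-elimination solver, and `lam=1e-8` are illustrative assumptions; for simplicity the ridge term here also touches the intercept.

```python
def lift(x, d, scale=4.0):
    # Rescale x from [-scale, scale] to [-1, 1], then map to [1, t, t^2, ..., t^d]
    t = x / scale
    return [t ** k for k in range(d + 1)]

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A z = b
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fit_poly(xs, ys, d, lam=1e-8):
    # Normal equations (Phi^T Phi + lam I) w = Phi^T y in the lifted feature space
    Phi = [lift(x, d) for x in xs]
    m = d + 1
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    w = solve(A, rhs)
    return lambda x: sum(wk * fk for wk, fk in zip(w, lift(x, d)))

xs = [-4 + 0.5 * k for k in range(17)]
ys = [0.15 * x**3 - 0.8 * x for x in xs]
f = fit_poly(xs, ys, 3)
print(round(f(1.0), 4))
```

Fitting with `d = 3` to noise-free samples of the cubic preset recovers it almost exactly; raising `d` only adds columns $t^4, t^5, \dots$ to the same linear system.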

Decision tree regression

Recursively partitions the input space and predicts the regional mean. The result is piecewise constant — flat steps with hard jumps at split boundaries. Distant outliers usually distort just one or two leaves rather than the global shape.
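A compact sketch of greedy 1D CART in the spirit of the implementation notes: search the midpoints between sorted inputs for the split that minimizes summed squared error, recurse to a depth limit, and predict the mean in each leaf. The names and the toy step data are illustrative, not the page's code.

```python
def _sse(vals):
    # Squared error of a region around its own mean
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_split(xs, ys):
    """Threshold (a midpoint between sorted x values) minimizing left SSE + right SSE."""
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between equal x values
        sse = _sse([y for _, y in pts[:i]]) + _sse([y for _, y in pts[i:]])
        if best is None or sse < best[0]:
            best = (sse, (pts[i - 1][0] + pts[i][0]) / 2)
    return None if best is None else best[1]

def build_tree(xs, ys, max_depth):
    # Leaves are floats (regional means); internal nodes are (threshold, left, right)
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(xs) < 2:
        return mean
    thr = best_split(xs, ys)
    if thr is None:
        return mean
    L = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    R = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return (thr,
            build_tree([x for x, _ in L], [y for _, y in L], max_depth - 1),
            build_tree([x for x, _ in R], [y for _, y in R], max_depth - 1))

def predict(node, x):
    while isinstance(node, tuple):
        thr, left, right = node
        node = left if x <= thr else right
    return node

xs = list(range(10))
ys = [0.0] * 5 + [1.0] * 5          # a clean step
ys_out = ys[:-1] + [11.0]           # last point becomes an outlier
t2 = build_tree(xs, ys_out, 2)
print(predict(t2, 2), predict(t2, 6), predict(t2, 9))  # 0.0 1.0 11.0
```

With depth 2 the tree spends one split isolating the outlier in its own leaf, and the step elsewhere is predicted exactly as before.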

Support-vector regression

Ignores residuals smaller than ε (points inside an ε-tube around the fit cost nothing) and penalizes only points outside the tube. Points on or outside the tube boundary become support vectors. The penalty $C$ controls how hard the model insists on shrinking those violations: smaller $C$ → smoother and more robust, larger $C$ → tighter fit.
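A small sketch of the ε-insensitive loss and of sorting points by their position relative to the tube. The helper names are illustrative; points on or outside the tube are only support-vector candidates here, since which of them end up with nonzero coefficients depends on the solved fit.

```python
def eps_insensitive(residual, eps):
    # Zero inside the tube, linear growth outside it
    return max(0.0, abs(residual) - eps)

def tube_report(ys, preds, eps):
    """Indices strictly inside the tube vs on/outside it (support-vector candidates)."""
    inside, outside = [], []
    for i, (y, p) in enumerate(zip(ys, preds)):
        (inside if abs(y - p) < eps else outside).append(i)
    return inside, outside

ys = [0.1, -0.2, 0.5, 4.0]
preds = [0.0] * 4
print(tube_report(ys, preds, 0.3))
print([round(eps_insensitive(y - p, 0.3), 2) for y, p in zip(ys, preds)])
```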

Implementation notes. Linear and polynomial fits use least squares solved via the normal equations, with a small Tikhonov (ridge) term added for numerical stability at high degree; inputs are rescaled to $[-1, 1]$ before lifting. The decision tree is a faithful 1D CART: greedy squared-error splits down to the depth limit, with leaves predicting the regional mean. SVR is solved with dual coordinate descent (libsvm-style) on the ε-insensitive loss; each per-coordinate update is a closed-form soft-threshold step followed by clipping $\beta_i$ to $[-C, C]$. Linear, RBF, and polynomial kernels are supported.
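A minimal pure-Python sketch of that per-coordinate update, with an RBF kernel only. It minimizes $\tfrac12 \beta^\top K \beta - y^\top \beta + \varepsilon \lVert \beta \rVert_1$ subject to $-C \le \beta_i \le C$. Assumptions, also labeled in the code: no bias term (which removes the equality constraint that would otherwise couple the coordinates), a fixed number of sweeps instead of a convergence test, and the hypothetical names `svr_dual_cd`, `rbf`, and `soft_threshold`.

```python
import math

def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * (a - b) ** 2)

def soft_threshold(g, eps):
    # sign(g) * max(|g| - eps, 0)
    return math.copysign(max(abs(g) - eps, 0.0), g)

def svr_dual_cd(xs, ys, C=5.0, eps=0.3, gamma=1.0, sweeps=200):
    """Coordinate descent on 1/2 b^T K b - y^T b + eps*||b||_1 with -C <= b_i <= C.

    Sketch assumption: no bias term, so single-coordinate updates are valid and
    the prediction is f(x) = sum_j b_j K(x_j, x).
    """
    n = len(xs)
    K = [[rbf(xi, xj, gamma) for xj in xs] for xi in xs]
    beta = [0.0] * n
    f = [0.0] * n  # f[i] = sum_j K[i][j] * beta[j], kept up to date
    for _ in range(sweeps):
        for i in range(n):
            # Partial derivative of the smooth part at beta_i = 0, other coordinates fixed
            g = f[i] - K[i][i] * beta[i] - ys[i]
            new = -soft_threshold(g, eps) / K[i][i]  # unconstrained 1D minimizer
            new = min(C, max(-C, new))               # clip to the box [-C, C]
            delta = new - beta[i]
            if delta != 0.0:
                beta[i] = new
                for j in range(n):
                    f[j] += delta * K[j][i]
    predict = lambda x: sum(b * rbf(xj, x, gamma) for b, xj in zip(beta, xs))
    return beta, predict

# Usage on hypothetical data: cubic preset sampled on a grid, plus one shifted point
xs = [-4 + 8 * k / 29 for k in range(30)]
ys = [0.15 * x**3 - 0.8 * x for x in xs]
ys[10] += 4.0
for C in (0.5, 5.0, 80.0):
    beta, predict = svr_dual_cd(xs, ys, C=C)
    n_sv = sum(1 for b in beta if b != 0.0)
    print(f"C={C:5.1f}  support vectors={n_sv:2d}  "
          f"f(outlier x)={predict(xs[10]):.2f}  target={ys[10]:.2f}")
```

Points whose final $\beta_i$ is nonzero are the support vectors. Because the box caps every coefficient at $C$, a small $C$ limits how far any single outlier can pull the curve.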

Sources

Slide anchors