ResNet: Deep Residual Learning

Visualizing the addition of residual functions $F(x)$ and identity $x$.

Standard Deep Network

A linear chain where signal strength fades at every weight.

[Visualization: gradient strength fades through the chain — multiplied by ×0.1 at each weight, a signal of 1.0 shrinks to 0.01 and then to 0.0001.]
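The decay in the visualization can be reproduced with a toy backward pass: the chain rule multiplies the gradient by each layer's local derivative, so repeated small factors (0.1 here, a hypothetical value matching the figure) drive it toward zero.

```python
def chain_gradient(local_grad: float, num_layers: int) -> float:
    """Gradient after backpropagating through a plain chain of layers,
    where every layer contributes the same local derivative."""
    grad = 1.0  # gradient of the loss w.r.t. the final output
    for _ in range(num_layers):
        grad *= local_grad  # chain rule: one multiplication per layer
    return grad

print(chain_gradient(0.1, 2))  # ~0.01
print(chain_gradient(0.1, 4))  # ~0.0001 — the signal has all but vanished
```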

Residual Network (ResNet)

Signals split and merge at Addition Nodes.

[Visualization: each addition node merges the residual branch $F(x)$ with the identity shortcut $x$. Because the shortcut contributes a constant +1.0 to the gradient, gradient strength stays near 1.0 across layers — e.g. 1.0 → 1.01 → 1.02.]

Summation Logic

At each Residual Node, we calculate $H(x) = F(x) + x$.

  • $F(x)$ (Residual): The output from the convolutional layers.
  • $x$ (Identity): The raw input passed through the shortcut.
  • The node sums these values, allowing the network to learn only the necessary delta (residual) instead of the whole mapping.
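The summation above can be sketched in a few lines of Python. The residual branch is passed in as a callable; the lambda used below is a hypothetical residual function chosen only for illustration.

```python
def residual_block(x: float, F) -> float:
    """H(x) = F(x) + x: sum the residual branch and the identity shortcut."""
    return F(x) + x

# Hypothetical residual: a small learned correction on top of the input.
F = lambda x: 0.1 * x
print(residual_block(2.0, F))  # ~2.2: the identity carries most of the signal
```

Because `x` passes through unchanged, the branch `F` only has to learn the delta between input and target — if the identity mapping is already near-optimal, `F` can simply learn to output values near zero.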

Gradient Highways

When backpropagating, the gradient flows through both paths, and the identity shortcut contributes a constant term of 1: $\frac{\partial H}{\partial x} = \frac{\partial F}{\partial x} + 1$. The shortcut therefore acts as a 1:1 conduit that prevents the signal from being crushed by successive multiplications, even when $\frac{\partial F}{\partial x}$ is small.
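This can be checked with the same toy backward pass as before, now with each block multiplying the gradient by $(F'(x) + 1)$ instead of a bare local derivative. The value 0.01 for $F'$ is a hypothetical small residual derivative, chosen to match the near-1.0 values in the visualization.

```python
def residual_chain_gradient(f_prime: float, num_blocks: int) -> float:
    """Gradient after backpropagating through a chain of residual blocks.
    d/dx [F(x) + x] = F'(x) + 1, so each block multiplies by (f_prime + 1)."""
    grad = 1.0
    for _ in range(num_blocks):
        grad *= (f_prime + 1.0)  # identity path guarantees the +1 term
    return grad

print(residual_chain_gradient(0.01, 2))   # ~1.02 — gradient survives intact
print(residual_chain_gradient(0.01, 50))  # still well above zero
```

Contrast this with the plain chain: fifty layers of ×0.1 would leave a gradient of $10^{-50}$, while fifty residual blocks with the same tiny $F'$ leave it above 1.0.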