Visualizing the addition of residual functions $F(x)$ and identity $x$.
A linear chain where signal strength fades at every weight.
Ready to propagate...
Signals split and merge at Addition Nodes.
Ready to propagate...
At each Residual Node, we calculate $H(x) = F(x) + x$.
When backpropagating, the gradient can choose the identity path. This acts as a 1:1 conduit that prevents the signal from being crushed by successive multiplications.