Heavy vs. Light Convolution

Visualizing why we reduce channels before applying $3 \times 3$ filters.

Heavy Processing

Blue Stack ($C=6$)

A $3 \times 3$ filter applied here must penetrate all **6 channels**, so every output value costs $3 \times 3 \times 6 = 54$ multiplications.

Efficient Processing

Red Stack ($C=3$)

A $3 \times 3$ filter of the same spatial size applied here only penetrates **3 channels**, so the work per output value is halved.

Final Output

Green Result

The output volume after spatial features are extracted from the "thin" Red volume.
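The Blue, Red, and Green volumes can be traced in a minimal pure-Python sketch. The $5 \times 5$ spatial size, the two output filters, the random weights, and the helper names `rand_tensor`, `conv2d`, and `shape` are all assumptions made for illustration; only the channel counts (6 and 3) come from the text.

```python
import random

random.seed(0)

def rand_tensor(*dims):
    """Nested lists of random floats with the given dimensions."""
    if len(dims) == 1:
        return [random.uniform(-1, 1) for _ in range(dims[0])]
    return [rand_tensor(*dims[1:]) for _ in range(dims[0])]

def conv2d(x, weights):
    """Stride-1 convolution without padding (cross-correlation).

    x:       [C_in][H][W]
    weights: [C_out][C_in][k][k]
    returns: [C_out][H-k+1][W-k+1]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weights[0][0])
    out = []
    for filt in weights:                      # one output channel per filter
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                acc = 0.0
                for c in range(c_in):         # the filter spans every input channel
                    for di in range(k):
                        for dj in range(k):
                            acc += x[c][i + di][j + dj] * filt[c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

def shape(v):
    return (len(v), len(v[0]), len(v[0][0]))

blue = rand_tensor(6, 5, 5)                     # "thick" input, C = 6
red = conv2d(blue, rand_tensor(3, 6, 1, 1))     # 1x1 filters mix channels: 6 -> 3
green = conv2d(red, rand_tensor(2, 3, 3, 3))    # 3x3 filters on the thin volume

print(shape(blue), shape(red), shape(green))    # (6, 5, 5) (3, 5, 5) (2, 3, 3)
```

The $1 \times 1$ step leaves the spatial size untouched and changes only the depth, which is why it can sit in front of the $3 \times 3$ step without losing spatial resolution.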

The "Bottleneck" Advantage

By comparing the Blue and Red stacks, you can see the impact of dimensionality reduction. Both stacks are being scanned by a **$3 \times 3$ spatial filter**.

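However, the filter on the Red stack is "shallower." In real-world networks like GoogLeNet, a $1 \times 1$ layer might reduce 192 channels down to 16, cutting the multiplications per filter in the following spatial convolution by a factor of **12** ($192 / 16$), before counting the cost of the $1 \times 1$ layer itself.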
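The factor of 12 counts only the depth of the spatial filter. A rough sketch of the total cost, including the $1 \times 1$ reduction, might look like the following. The $28 \times 28$ feature map, the 32 output filters, stride 1 with "same" padding, and the helper name `conv_mults` are assumptions for illustration, not values taken from the text.

```python
def conv_mults(height, width, kernel, c_in, c_out):
    """Multiplications for a stride-1, 'same'-padded convolution layer."""
    return height * width * kernel * kernel * c_in * c_out

# Assumed for illustration: 28x28 feature map, 32 output filters.
H, W, C_OUT = 28, 28, 32

direct = conv_mults(H, W, 3, 192, C_OUT)      # 3x3 straight on 192 channels
reduce_1x1 = conv_mults(H, W, 1, 192, 16)     # 1x1 bottleneck: 192 -> 16
spatial = conv_mults(H, W, 3, 16, C_OUT)      # 3x3 on the thin 16-channel volume
bottleneck = reduce_1x1 + spatial

print(direct)               # 43352064
print(bottleneck)           # 6021120
print(direct / spatial)     # 12.0 (spatial step alone)
print(direct / bottleneck)  # 7.2  (including the 1x1 reduction)
```

Under these assumptions the end-to-end saving is closer to 7x than 12x, because the $1 \times 1$ layer still has to read all 192 input channels.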

The Multiplier Effect

Blue Ops: $3 \times 3 \times 6 = \mathbf{54}$ multiplications per filter, per output position
Red Ops: $3 \times 3 \times 3 = \mathbf{27}$ multiplications per filter, per output position
Conclusion: $1 \times 1$ filters "prepare" the data to make spatial processing efficient.
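
The two counts are instances of one general pattern. As a short worked form, with $k$ standing for the filter's spatial size and $C$ for the input depth (symbols introduced here for illustration):

$$
\text{Ops}(k, C) = k \times k \times C, \qquad \frac{\text{Ops}(k, C_{\text{Blue}})}{\text{Ops}(k, C_{\text{Red}})} = \frac{C_{\text{Blue}}}{C_{\text{Red}}} = \frac{6}{3} = 2
$$

The ratio does not depend on $k$, so the relative saving in the spatial step is the same for any filter size and is set entirely by how far the channel count is reduced.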