Mastering Pooling Layers

How neural networks "summarize" images to find the most important features.

The Two Main Types

Max Pooling

Picks the Maximum value. Best for detecting sharp features like edges or high-contrast patterns. It answers: "Is the feature here?"

$\max(x_{1 \dots n})$

Average Pooling

Calculates the Average. Smooths out the input, giving a generalized overview. Often used at the end of networks (Global Average Pooling).

$\frac{1}{n}\sum x_i$
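The two operations can be contrasted in a few lines. This is a minimal NumPy sketch on a single 2x2 window; the values are illustrative:

```python
import numpy as np

# One 2x2 pooling window from a feature map (illustrative values).
window = np.array([[12, 5],
                   [8, 20]])

print(window.max())   # max pooling: 20 -> "is the feature here?"
print(window.mean())  # average pooling: 11.25 -> smoothed summary
```

Max pooling keeps only the strongest response in the window, while average pooling blends all four values into one.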

Why Use It?

  • Invariance: Small spatial shifts in the input often leave the pooled output unchanged (translation invariance).
  • Efficiency: A smaller feature map means fewer computations downstream, and fewer weights in any fully connected layers that follow.
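The invariance point can be checked directly: if a feature shifts by one pixel but stays inside the same pooling window, the pooled output does not change. A minimal sketch, assuming a hypothetical `max_pool_2x2` helper built with NumPy reshaping:

```python
import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling with stride 2 (hypothetical helper):
    # split each spatial axis into (blocks, 2) and reduce over the window axes.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 9.0   # a strong response at (0, 0)
b = np.zeros((4, 4)); b[0, 1] = 9.0   # the same response shifted one pixel right
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # the pooled maps match
```

Note the invariance is only local: a shift that crosses a window boundary can still change the output.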

Worked Example

With a 4x4 input feature map, a 2x2 pool size, and a stride of 2, the pooled output is 2x2. For one window, the max operation computes:

$\max(12, 5, 8, 20) = 20$
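The full 4x4-to-2x2 reduction can be reproduced in NumPy. The input values here are illustrative, chosen so the top-left window matches the example above:

```python
import numpy as np

# Hypothetical 4x4 feature map; the top-left 2x2 window is (12, 5, 8, 20).
fmap = np.array([[12,  5,  3,  1],
                 [ 8, 20,  2,  4],
                 [ 0,  7,  9,  6],
                 [ 1,  2, 11, 10]])

# 2x2 max pooling with stride 2: reshape each axis into (blocks, window)
# and take the max over the two window axes.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # top-left entry is max(12, 5, 8, 20) = 20
```

Swapping `.max(...)` for `.mean(...)` on the same reshaped array gives average pooling instead.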

Realistic Feature Downsampling

Visualizing a hand-drawn digit '8' through a cascade of pooling stages: the input at $128 \times 128$ is pooled to $64 \times 64$, then $32 \times 32$, then $16 \times 16$, and finally $8 \times 8$.

Even after aggressive downsampling (from 16,384 pixels to 64 pixels), the topology of the digit is preserved. This massive compression allows the network to process larger images efficiently while focusing on key features.

Note: This demonstrates a sequential cascade. Each stage is pooled from the previous stage's output (128 → 64 → 32 → 16 → 8), simulating sequential layers in a Deep CNN.
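The cascade can be sketched as repeated 2x2 max pooling, with random noise standing in for the digit image (a hypothetical `max_pool_2x2` helper, not code from the demo itself):

```python
import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling with stride 2.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.rand(128, 128)  # stand-in for the 128x128 digit image
for _ in range(4):            # four stages: 128 -> 64 -> 32 -> 16 -> 8
    x = max_pool_2x2(x)
print(x.shape)  # (8, 8): 16,384 values reduced to 64
```

Each stage consumes the previous stage's output, mirroring how pooling layers are interleaved with convolutions in a deep CNN.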