[Figure panels: Input (with padding) | Kernel (fixed random) | Output feature map]
Concept Reference
Padding adds P rows and columns of zeros on every side of the input.
P=0 (valid): no padding; the output shrinks by K−1 when S=1.
P=(K−1)/2 (same, odd K): the output matches the input size when S=1.
Padding preserves spatial dimensions and avoids losing edge information.
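As a rough illustration of the padding step, here is a minimal pure-Python sketch. The helper name `zero_pad` and the nested-list representation of the input are assumptions made for this example, not part of the original material.

```python
def zero_pad(grid, p):
    """Return a new 2-D list with p rows/columns of zeros on every side of grid."""
    width = len(grid[0]) + 2 * p
    # Left and right zero columns are added to each original row.
    rows = [[0] * p + list(row) + [0] * p for row in grid]
    # Top and bottom zero rows are built as separate lists to avoid aliasing.
    top = [[0] * width for _ in range(p)]
    bottom = [[0] * width for _ in range(p)]
    return top + rows + bottom


grid = [[1, 2],
        [3, 4]]            # N = 2
padded = zero_pad(grid, 1)  # P = 1, so each dimension becomes N + 2P = 4
for row in padded:
    print(row)
```

Each dimension grows from N to N + 2P, which is the term that appears in the output-size formula.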
Stride controls how many cells the kernel moves per step.
S=1: dense convolution; the kernel visits every position and adjacent windows overlap.
S=2: the kernel skips every other position; the output size roughly halves.
Larger stride = smaller output + faster computation.
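To make the effect of stride concrete, the sketch below lists the start indices a kernel visits along one axis. The function name `kernel_positions` is hypothetical, and the input length `n` is assumed to already include any padding.

```python
def kernel_positions(n, k, s):
    """Start indices where a width-k kernel fits in a length-n axis with stride s."""
    return list(range(0, n - k + 1, s))


dense = kernel_positions(7, 3, 1)    # S = 1: every position
strided = kernel_positions(7, 3, 2)  # S = 2: every other position
print(dense)    # [0, 1, 2, 3, 4]
print(strided)  # [0, 2, 4]
```

The number of positions returned is the output size along that axis, so doubling the stride here reduces it from 5 to 3.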
Output size = ⌊(N + 2P − K) / S⌋ + 1
N = input size, P = padding per side, K = kernel size, S = stride.
The result must be a positive integer; invalid combinations (for example K > N + 2P) are rejected.
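The formula can be sketched as a small helper. The name `conv_output_size` and the choice to raise `ValueError` for invalid combinations are assumptions for this example; the original does not specify how rejection is implemented.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    if n < 1 or k < 1 or s < 1 or p < 0:
        raise ValueError("n, k, s must be >= 1 and p must be >= 0")
    span = n + 2 * p - k
    if span < 0:
        # The kernel does not fit inside the padded input.
        raise ValueError("kernel is larger than the padded input")
    return span // s + 1


print(conv_output_size(5, 3))            # valid padding: 3
print(conv_output_size(5, 3, p=1))       # same padding:  5
print(conv_output_size(6, 3, p=1, s=2))  # stride 2:      3
```

Checking a planned stack of layers with a helper like this before building the model is one way to catch dimension mismatches early.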
Every CNN layer changes feature map dimensions. Choosing P and S correctly lets you:
① preserve resolution (encoder) ② downsample efficiently ③ avoid dimension mismatch errors in code.