Matrix Tiling: Miss Formula Derivation

Visualizing how $n^3$ operations become $\frac{n^3}{2T}$ cache misses.

1. The Setup

We divide an $n \times n$ matrix into tiles of size $T \times T$. The hardware loads data in lines of size $L$.

Example Constants: $n=16, T=4, L=4$

A tile has $T^2$ elements. Since we load $L$ elements per miss:

Misses per tile = $\frac{T^2}{L}$

For a pair (A and B), misses = $\frac{2T^2}{L}$.

To compute one $T \times T$ output, we need to multiply a row of tiles by a column of tiles. There are $n/T$ pairs.

$\frac{n}{T} \times \frac{2T^2}{L} = \frac{2nT}{L}$

The matrix $C$ contains $(n/T)^2$ such tiles.

$\frac{2nT}{L} \times \frac{n^2}{T^2} = \frac{2n^3}{LT}$

Plugging in $L=4$ (standard for 8-byte doubles in 32B cache lines):

Total Misses = $\frac{n^3}{2T}$