Matrix Tiling: Miss Formula Derivation

Visualizing how $n^3$ operations become $\frac{n^3}{2T}$ cache misses.

$n$ $n$ $T$

Key Variable Definitions

  • $n$: The total dimension of the matrix rows and columns.
  • $T$: The dimension of our software "tile" (the block we process at once).
  • $L$: The hardware cache line size (how many elements arrive per miss).

1. The Setup

We divide an $n \times n$ matrix into tiles of size $T \times T$. The hardware loads data in lines of size $L$.

Example Constants: $n=16, T=4, L=4$

2. Loading One Tile

A tile has $T^2$ elements. Since we load $L$ elements per miss:
Misses per tile = $\frac{T^2}{L}$
For a pair (A and B), misses = $\frac{2T^2}{L}$.

3. One Result Tile ($C_{tile}$)

To compute one $T \times T$ output, we need to multiply a row of tiles by a column of tiles. There are $n/T$ pairs.
$\frac{n}{T} \times \frac{2T^2}{L} = \frac{2nT}{L}$

4. Total Matrix Misses

The matrix $C$ contains $(n/T)^2$ such tiles.
$\frac{2nT}{L} \times \frac{n^2}{T^2} = \frac{2n^3}{LT}$

5. Final Simplification

Plugging in $L=4$ (standard for 8-byte doubles in 32B cache lines):
Total Misses = $\frac{n^3}{2T}$

Legend