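Assumes \(3T^2 < C\), where \(C\) is the cache capacity in floats, so that one \(T \times T\) tile from each of the three matrices fits in the cache at once. Under that assumption, the total number of misses decreases as the tile size \(T\) increases.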
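A rough count shows where the improvement comes from. This is an idealized sketch under assumptions the source does not state: a cache block holds \(L\) floats (\(L = 4\) for a 16-byte block of 4-byte floats), \(T\) is a multiple of \(L\) and divides \(n\), and the three current tiles stay resident while they are in use. Loading one tile then costs about \(T^2/L\) misses. Each of the \((n/T)^2\) result tiles is loaded once and needs \(n/T\) pairs of input tiles, so

\[
\text{misses} \approx \left(\frac{n}{T}\right)^2 \left(\frac{n}{T} \cdot \frac{2T^2}{L} + \frac{T^2}{L}\right) = \frac{2n^3}{LT} + \frac{n^2}{L}
\]

The leading term is inversely proportional to \(T\), which is why a larger tile that still satisfies the capacity bound gives fewer misses.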
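[Interactive figure: step-by-step animation of the tiled product A × B = C with software tile size \(T = 2\) and 16-byte hardware cache blocks. A control panel shows the current loop indices (i, j, k, il, jl, kl), and the legend marks the active float, its hardware block, and the software tile. At step 1 of 64, the accesses to A, B, and C each miss, for 3 physical misses in total.]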
// T is the tile size, assumed to be a compile-time constant that divides n
void mmm(int n, float a[n][n], float b[n][n], float c[n][n]) {
    // Three outer loops iterate over tiles
    for (int i = 0; i < n; i += T)
        for (int j = 0; j < n; j += T)
            for (int k = 0; k < n; k += T)
                // Three inner loops iterate within a tile
                for (int il = i; il < i + T; il++)
                    for (int jl = j; jl < j + T; jl++)
                        for (int kl = k; kl < k + T; kl++)
                            c[il][jl] += a[il][kl] * b[kl][jl];
}
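To experiment with how the miss count depends on \(T\), the loop nest's memory accesses can be replayed against a small cache model. The following is a minimal sketch, not the page's own simulator: the fully associative LRU policy, the back-to-back row-major layout of the three matrices, and the names `Cache`, `cache_access`, and `count_misses` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define FLOATS_PER_BLOCK 4 /* 16-byte block / 4-byte float, as in the figure */

/* Hypothetical model: fully associative cache with LRU replacement. */
typedef struct {
    int capacity;  /* number of blocks the cache can hold */
    int used;      /* number of blocks currently resident */
    long *tag;     /* block number held in each slot */
    long *stamp;   /* time of the last access to each slot */
    long clock;    /* logical time, incremented per access */
    long misses;   /* running miss count */
} Cache;

static void cache_access(Cache *c, long block) {
    int victim = 0;
    c->clock++;
    for (int s = 0; s < c->used; s++) {
        if (c->tag[s] == block) {       /* hit: refresh recency and stop */
            c->stamp[s] = c->clock;
            return;
        }
        if (c->stamp[s] < c->stamp[victim])
            victim = s;                 /* track the least recently used slot */
    }
    c->misses++;
    if (c->used < c->capacity)
        victim = c->used++;             /* a free slot exists, so no eviction */
    c->tag[victim] = block;
    c->stamp[victim] = c->clock;
}

/* Replays the accesses of the tiled loop nest and returns the miss count.
   Assumes T divides n and that a, b, c are stored row-major, back to back,
   starting at a block boundary. */
long count_misses(int n, int T, int capacity_blocks) {
    assert(n > 0 && T > 0 && n % T == 0 && capacity_blocks > 0);
    Cache c = { capacity_blocks, 0, NULL, NULL, 0, 0 };
    c.tag = malloc(sizeof *c.tag * (size_t)capacity_blocks);
    c.stamp = malloc(sizeof *c.stamp * (size_t)capacity_blocks);
    assert(c.tag && c.stamp);
    long nn = (long)n * n;              /* floats per matrix */

    for (int i = 0; i < n; i += T)
        for (int j = 0; j < n; j += T)
            for (int k = 0; k < n; k += T)
                for (int il = i; il < i + T; il++)
                    for (int jl = j; jl < j + T; jl++)
                        for (int kl = k; kl < k + T; kl++) {
                            /* same access order as the figure: A, B, C */
                            cache_access(&c, (0 * nn + (long)il * n + kl) / FLOATS_PER_BLOCK);
                            cache_access(&c, (1 * nn + (long)kl * n + jl) / FLOATS_PER_BLOCK);
                            cache_access(&c, (2 * nn + (long)il * n + jl) / FLOATS_PER_BLOCK);
                        }
    free(c.tag);
    free(c.stamp);
    return c.misses;
}
```

Calling `count_misses(4, T, capacity_blocks)` for `T` = 1, 2, and 4 with a fixed small capacity gives a way to compare tile sizes on the 4 × 4 case that the 64-step animation corresponds to. When the cache is large enough to hold all three matrices, only the compulsory misses remain and the tile size no longer matters.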