Visualizing how matrix partitioning minimizes expensive off-chip memory access.
AI measures Operations per Byte of memory traffic. This determines if a program is "Memory Bound" or "Compute Bound."
Everything is fetched exactly once.
Assuming Float32 (4 bytes per element):
$$AI = \frac{\text{Reuse Factor}}{4}$$Measures operations per **element** fetched. It is the unit-less version of Arithmetic Intensity.