Reducing Memory Cost: Tiling & Reuse

Visualizing how matrix partitioning minimizes expensive off-chip memory access.

Efficiency Logic: Deep Dive

1. Arithmetic Intensity (AI)

AI measures operations performed per byte of memory traffic. It determines whether a workload is memory-bound (starved by DRAM bandwidth) or compute-bound (limited by MAC throughput).
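The memory-bound vs. compute-bound distinction can be sketched with a roofline-style check: a workload is compute-bound when its AI exceeds the hardware's ratio of peak compute to memory bandwidth. The peak numbers below are illustrative assumptions, not figures from the text.

```python
# Roofline-style classification. Hardware peaks are hypothetical.
PEAK_MACS_PER_S = 2e12      # assumed accelerator compute peak (MACs/s)
DRAM_BYTES_PER_S = 100e9    # assumed DRAM bandwidth (bytes/s)

# The "ridge point": the AI at which compute and memory time balance.
ridge_point = PEAK_MACS_PER_S / DRAM_BYTES_PER_S  # ops/byte

def classify(ai):
    """Classify a workload by its arithmetic intensity (ops/byte)."""
    return "compute-bound" if ai >= ridge_point else "memory-bound"

print(ridge_point)       # 20.0 ops/byte for these assumed peaks
print(classify(0.25))    # low AI  -> memory-bound
print(classify(51.2))    # high AI -> compute-bound
```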

Large-Memory DRAM Cost
$$\text{Total Reads} = (M \times CHW) + (CHW \times N)$$

When on-chip memory is large enough to hold both operands, every element is fetched from DRAM exactly once.

$$AI = \frac{\text{Total MACs}}{\text{Total Bytes Transferred}}$$

Assuming Float32 (4 bytes per element):

$$AI = \frac{\text{Reuse Factor}}{4}$$
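A minimal sketch of these formulas for the large-memory case, where each operand element is fetched once. The layer dimensions are illustrative assumptions, not values from the text.

```python
# Arithmetic intensity when every element is fetched from DRAM once.
# Dimensions are assumed for illustration: M filters, N input fmaps,
# CHW = filter volume (channels x height x width).
M, N, CHW = 256, 1024, 1152
BYTES_PER_ELEM = 4                        # Float32

total_reads = M * CHW + CHW * N           # each operand element read once
total_macs = M * N * CHW                  # one MAC per (m, n, k) triple
reuse = total_macs / total_reads          # MACs per element fetched
ai = reuse / BYTES_PER_ELEM               # MACs per byte

print(f"reuse = {reuse:.1f}x, AI = {ai:.2f} ops/byte")
```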

2. MAC Reuse Factor

Measures MACs per **element** fetched. It is the unitless version of Arithmetic Intensity.

$$\text{Total MACs} = M \times N \times CHW$$

$$\text{Reuse} = \frac{\text{Total MACs}}{\text{Total Reads (elements)}}$$
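When SRAM cannot hold whole operands, the output is computed in tiles and each tile re-fetches its filter rows and input columns, so reuse falls below the large-memory ideal. A sketch under the assumption of one DRAM fetch per tile operand (tile sizes and dimensions are illustrative):

```python
# Reuse factor when the M x N output is computed in Tm x Tn tiles.
# Each tile fetches a Tm x CHW filter block and a CHW x Tn input block.
def tiled_reuse(M, N, CHW, Tm, Tn):
    """MACs per element fetched, assuming M % Tm == 0 and N % Tn == 0."""
    n_tiles = (M // Tm) * (N // Tn)
    reads_per_tile = Tm * CHW + CHW * Tn   # filter block + input block
    total_reads = n_tiles * reads_per_tile
    total_macs = M * N * CHW
    return total_macs / total_reads

# Larger tiles amortize fetches over more MACs:
# for square tiles, reuse = Tm * Tn / (Tm + Tn) = Tm / 2.
for t in (4, 16, 64):
    print(f"tile {t}x{t}: reuse = {tiled_reuse(256, 1024, 1152, t, t):.1f}x")
```

This is why tiling reduces memory cost: reuse grows with tile size, up to the SRAM-capacity limit, approaching the large-memory bound of \(M \times N / (M + N)\).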

[Interactive simulation panel: step controls plus live counters for Memory Reads, MAC Ops, Reuse Factor (ops/elem), Arithmetic Intensity (ops/byte), and the active tile \([M, N]\). Select a scenario to begin.]
[Diagram: Filters \((M \times CHW)\) × Input fmaps \((CHW \times N)\) = Output fmaps \((M \times N)\), computed by a Processing Element (PE) with Local Memory (SRAM).]