Reducing reuse distance to exploit limited local storage capacity.
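[Figure (a): Example 1D convolution. A 7-element input (indices 0 to 6) is convolved with a 4-element filter (indices 0 to 3) to produce a 4-element output (indices 0 to 3).]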
Temporal tiling: reducing the reuse distance so that it is smaller than the storage capacity of a given memory level. In this example, L1 holds 1 weight and 2 partial sums simultaneously.
[Interactive figure: Tiled operation timeline over 16 steps, with rows for the input (I), operations, filter (W), and partial sums (p), plus a reuse-distance readout. The memory hierarchy shown is a global buffer (capacity 4), an intermediate L1 (capacity: 1 weight, 2 partial sums), and a processing element that multiplies and accumulates. Stepping through the timeline shows the tile of two partial sums staying in L1 while the current weight is reused.]
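The temporal tiling described above can be sketched as a small loop nest. This is a minimal illustration under stated assumptions, not the page's actual implementation: the function names (`conv1d_tiled`, `max_reuse_distance`), the loop order, and the definition of reuse distance as the number of steps between consecutive uses of the same partial sum are all assumptions made for this example. The input and filter values are simply the indices from the example figure.

```python
def conv1d_tiled(inputs, weights, tile=2):
    """1D convolution with output tiling (hypothetical sketch).

    One weight at a time is held in the L1 weight slot and reused
    across a tile of `tile` partial sums held in L1.
    Returns the outputs and the access schedule as (weight_idx, output_idx).
    """
    n_out = len(inputs) - len(weights) + 1
    outputs = [0] * n_out
    schedule = []
    for t0 in range(0, n_out, tile):
        # Partial sums of the current tile live in L1.
        psum_tile = [0] * min(tile, n_out - t0)
        for k, w in enumerate(weights):
            # The current weight w is reused for every partial sum in the tile.
            for j in range(len(psum_tile)):
                o = t0 + j
                psum_tile[j] += inputs[o + k] * w
                schedule.append((k, o))
        # The finished tile is written back to the next memory level.
        outputs[t0:t0 + len(psum_tile)] = psum_tile
    return outputs, schedule


def max_reuse_distance(schedule, field):
    """Largest step gap between consecutive uses of the same item.

    field=0 tracks weights, field=1 tracks partial sums (assumed definition).
    """
    last_seen = {}
    worst = 0
    for step, access in enumerate(schedule):
        key = access[field]
        if key in last_seen:
            worst = max(worst, step - last_seen[key])
        last_seen[key] = step
    return worst


inputs = list(range(7))   # input indices 0..6 used as values
weights = list(range(4))  # filter indices 0..3 used as values

tiled_out, tiled_sched = conv1d_tiled(inputs, weights, tile=2)
untiled_out, untiled_sched = conv1d_tiled(inputs, weights, tile=4)

print(len(tiled_sched))                      # number of multiply-accumulate steps
print(max_reuse_distance(tiled_sched, 1))    # partial-sum reuse distance, tile of 2
print(max_reuse_distance(untiled_sched, 1))  # partial-sum reuse distance, no tiling
```

Under these assumptions, both schedules take 16 steps and give the same outputs, but with a tile of 2 each partial sum is revisited every 2 steps, which fits the 2 partial-sum slots in L1. Without tiling (a tile of 4), each partial sum is revisited only every 4 steps, so it would not fit.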