Tensor Processing Unit (TPU)

Interactive architecture & systolic visualization

[Figure: TPU block diagram. Blocks: PCIe Gen3 x16 Interface; Host Interface; DDR3 DRAM Chips; DDR3 Interfaces; Weight FIFO (Weight Fetcher); Unified Buffer (Local Activation Storage); Systolic Data Setup; Matrix Multiply Unit (64K per cycle); Accumulators; Activation; Normalize / Pool; Control; Instr. Link bandwidth labels: 10 GiB/s, 14 GiB/s, 30 GiB/s, 167 GiB/s.]

Systolic Array

[Figure: systolic array data flow. Labels: Input Data; Weights are preloaded; Partial Sums; Done. Legend: Data, Control. Cells are drawn as adders (+).]
Software has the illusion that each 256 B input is read at once and instantly updates one location in each of the 256 accumulator RAMs.
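
The data flow behind that illusion can be sketched with a small cycle-by-cycle model. This is a minimal illustration, not the TPU's actual implementation: it assumes a weight-stationary array in which weights are preloaded into the cells, input values enter from the left with a one-cycle skew per row, and partial sums flow downward to the accumulators. The function name `systolic_matmul`, the skew scheme, and the tiny matrix sizes are assumptions made for the example.

```python
def systolic_matmul(X, W):
    """Toy model of a weight-stationary systolic array computing X @ W.

    X is M x K (input data), W is K x N (preloaded weights).
    Cell (k, n) holds W[k][n]. Inputs move right one cell per cycle,
    partial sums move down one cell per cycle.
    Returns (Y, cycles), where Y is the M x N result.
    """
    M, K, N = len(X), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]    # input value latched in each cell
    psum = [[0] * N for _ in range(K)]   # partial sum leaving each cell
    Y = [[0] * N for _ in range(M)]      # stands in for the accumulators
    cycles = M + K + N - 2               # cycles until the last sum drains

    for t in range(cycles):
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Skewed injection: row k of the array sees X[m][k]
                    # at cycle m + k (the "data setup" step).
                    m = t - k
                    a = X[m][k] if 0 <= m < M else 0
                else:
                    a = act[k][n - 1]            # value from the left neighbor
                above = psum[k - 1][n] if k > 0 else 0
                new_act[k][n] = a
                new_psum[k][n] = above + a * W[k][n]   # multiply and add
        act, psum = new_act, new_psum

        # The bottom row emits finished sums; column n lags column 0 by n cycles.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m][n] = psum[K - 1][n]
    return Y, cycles


if __name__ == "__main__":
    Y, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(Y, cycles)  # [[19, 22], [43, 50]] 4
```

Results for one input row reach the bottom of different columns on different cycles, yet the caller only sees the finished matrix. That is the sense in which software can treat each input as read at once. If the 256 in the caption is taken as the array dimension, the same model describes a 256 x 256 grid, which is consistent with the "64K per cycle" label (256 × 256 = 65,536).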