TPU v1 Architecture

CISC & Instruction Pipeline

CISC (Complex Instruction Set Computer): TPU instructions follow the CISC tradition and operate at a much higher level of abstraction than the simple, fine-grained instructions a general-purpose CPU executes. A single TPU instruction can trigger an operation that runs for thousands of clock cycles.

Because each instruction does so much work, the average CPI (Cycles Per Instruction) is high: typically 10 to 20, and much higher for large matrix operations. The chip therefore fetches far fewer instructions per unit of work, which relieves the instruction-fetch bottleneck and leaves most cycles for arithmetic.
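The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: the function names and the per-instruction cycle counts in the trace are hypothetical, chosen to fall in the 10 to 20 range quoted above, and are not measurements from the chip.

```python
def average_cpi(instruction_cycles):
    """Average cycles per instruction over a trace of per-instruction cycle counts."""
    if not instruction_cycles:
        raise ValueError("trace is empty")
    return sum(instruction_cycles) / len(instruction_cycles)


def fetches_per_kilocycle(cpi):
    """Instruction fetches needed to keep the chip busy for 1000 clock cycles."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return 1000 / cpi


# Hypothetical trace: the cycle counts are made up for illustration.
trace = [12, 18, 10, 20, 15]
cpi = average_cpi(trace)

print(f"average CPI: {cpi}")
print(f"fetches per 1000 cycles at CPI {cpi}: {fetches_per_kilocycle(cpi):.1f}")
print(f"fetches per 1000 cycles at CPI 1.0: {fetches_per_kilocycle(1.0):.1f}")
```

At a CPI of 15, the instruction stream has to deliver roughly one fifteenth as many instructions per cycle as a design that retires one instruction every cycle, which is the sense in which a high CPI eases the fetch bottleneck.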

Data Flow Note

Bidirectional Arrows: Double connections in the diagram (e.g., between CPU and PCIe) represent bidirectional data flow. Over these links the TPU receives input data and instructions from the Host and sends finished inference results back to it.

[Figure: TPU v1 block diagram, with panel title "CISC Instruction Format". Blocks shown: CPU; PCIe Gen3 x16; Host Interface; Instr Buffer; Unified Buffer (Local Activation Storage); Systolic Data Setup; DDR3 DRAM; Weight FIFO (Weight Fetcher); Matrix Multiply Unit (64K / cycle); Accumulators; Activate / Pool (ReLU, Sigmoid).]
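
The block labels above suggest a data path in which activations from the Unified Buffer and weights from the Weight FIFO meet in the Matrix Multiply Unit, partial sums collect in the Accumulators, and a nonlinearity such as ReLU is applied at the end. The Python sketch below is a toy model under that assumed ordering. The names run_layer and matmul, the tiling scheme, and the tiny matrices are hypothetical and say nothing about the real hardware's sizes or timing.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def run_layer(activation_tiles, weight_tiles):
    """Toy model of one layer: multiply tile by tile, accumulate, then apply ReLU.

    activation_tiles[t] has shape (batch x k_t) and weight_tiles[t] has shape
    (k_t x n); each pair contributes one partial product to the accumulators.
    """
    accumulators = None
    for acts, weights in zip(activation_tiles, weight_tiles):
        partial = matmul(acts, weights)  # stand-in for the Matrix Multiply Unit
        if accumulators is None:
            accumulators = partial
        else:
            # Add the new partial sums to the running totals (Accumulators).
            accumulators = [
                [s + p for s, p in zip(sum_row, part_row)]
                for sum_row, part_row in zip(accumulators, partial)
            ]
    if accumulators is None:
        raise ValueError("no tiles supplied")
    # Activation stage: ReLU clamps negative sums to zero.
    return [[max(0, value) for value in row] for row in accumulators]


# One input row split into two tiles along the inner dimension.
activation_tiles = [[[1, -2]], [[3]]]
weight_tiles = [[[1, 0], [0, 1]], [[2, -1]]]
print(run_layer(activation_tiles, weight_tiles))  # prints [[7, 0]]
```

Splitting the inner dimension into tiles shows why a separate accumulator stage is useful: each tile yields only a partial sum, and the activation function can run only once every tile has been added in.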