TPU v7 Interactive Deep Dive

Future ASIC Architecture (Speculative)

Legend: Compute Unit, Memory / DMA, High-Radix Switch, Management
Block diagram components:
Host CPU, gBMC, PCIe Gen6 x16, Legacy I/O
TensorCore 0: TCS Dispatcher, XLU L, VPU Unit, XLU R, MXU (Matrix)
TensorCore 1: TCS Dispatcher, XLU L, VPU Unit, XLU R, MXU (Matrix)
Memory & DMA Bisection Interconnect Highway (Terabit/s)
Chip Manager, Sparse Core 0, Sparse Core 1
4x HBM Ctr
HBM4 Stack 0, Stack 1, Stack 2, Stack 3
Inter-Chip Interconnect: ICR Router, 6x 224G SerDes Octals (Scale-out I/O)
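The "6x 224G SerDes Octals" label invites a quick bandwidth estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes an octal is a group of 8 lanes and that "224G" is a raw per-lane rate of 224 Gb/s in one direction. Neither assumption is stated in the diagram, and encoding or FEC overhead is ignored.

```python
# Raw scale-out bandwidth estimate for the "6x 224G SerDes Octals" label.
# Assumptions (not stated in the diagram): an octal is 8 lanes, and
# "224G" is a raw per-lane signaling rate of 224 Gb/s per direction.
OCTALS = 6
LANES_PER_OCTAL = 8   # assumption
GBPS_PER_LANE = 224   # assumption: raw rate, before encoding/FEC overhead


def raw_aggregate_gbps(octals=OCTALS, lanes=LANES_PER_OCTAL, rate=GBPS_PER_LANE):
    """Return the raw one-direction aggregate bandwidth in Gb/s."""
    return octals * lanes * rate


total = raw_aggregate_gbps()
print(f"{total} Gb/s raw (~{total / 1000:.3f} Tb/s) per direction")
```

Under these assumptions the raw figure is 10,752 Gb/s, which is in line with the "Terabit/s" scale used elsewhere in the diagram.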

VLIW TensorCore

Decoupled Matrix-Vector Pipeline

Architecture Peak: 10.5 PFLOPS
TCS ("brain"): Decodes VLIW instructions and dispatches parallel streams to the units below.
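The dispatch idea can be illustrated with a toy model. This is a minimal sketch, not the actual TCS design: the bundle format (a dict with one optional slot per unit), the unit names, and the `dispatch` function are all hypothetical, chosen only to mirror the unit labels in this panel.

```python
from collections import deque

# Hypothetical unit names mirroring the panel labels; the slot layout
# of a real VLIW bundle is not given in the source.
UNITS = ("dma", "xlu_left", "xlu_right", "mxu", "vpu")


def dispatch(bundles):
    """Route each slot of every VLIW bundle to its unit's queue.

    A bundle is a dict mapping unit name -> operation string. Units with
    no slot in a bundle receive an explicit "nop", so every queue has one
    entry per bundle and the queues stay aligned by issue order.
    """
    queues = {unit: deque() for unit in UNITS}
    for bundle in bundles:
        unknown = set(bundle) - set(UNITS)
        if unknown:
            raise ValueError(f"unknown unit(s): {sorted(unknown)}")
        for unit in UNITS:
            queues[unit].append(bundle.get(unit, "nop"))
    return queues


program = [
    {"dma": "load tile A", "mxu": "matmul"},
    {"mxu": "matmul", "vpu": "relu"},
]
queues = dispatch(program)
for unit, ops in queues.items():
    print(unit, list(ops))
```

The point of the model is that one decoded bundle feeds several independent streams at once, which is what lets the matrix and vector units run decoupled from each other.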

DMA Orchestrator
XLU Left
XLU Right
MXU: 256x256 systolic array (matrix unit)
VPU: Pointwise ops (ReLU, GELU). Moves data between MXU results and HBM.
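The matrix-then-vector flow described by these labels can be sketched in plain Python. This is an illustration under stated assumptions rather than a model of the hardware: `tiled_matmul` stands in for MXU work split into array-sized blocks, `pointwise` stands in for the VPU, and the "one multiply-accumulate per cell per cycle" figure is an idealization. All function names are hypothetical.

```python
import math

MXU_DIM = 256  # array dimension from the panel label


def macs_per_cycle(dim=MXU_DIM):
    """Idealized throughput: one multiply-accumulate per cell per cycle."""
    return dim * dim


def tiled_matmul(a, b, tile):
    """Multiply a (m x k) by b (k x n), one tile-sized block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One block of work, sized to fit a tile x tile array.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = out[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] = acc
    return out


def relu(x):
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def pointwise(matrix, fn):
    """Apply fn to every element, as a vector unit would after a matmul."""
    return [[fn(v) for v in row] for row in matrix]


a = [[1.0, -2.0, 3.0], [0.5, 0.0, -1.0]]
b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0]]
result = pointwise(tiled_matmul(a, b, tile=2), relu)
print(result)
print(macs_per_cycle())
```

A tile size of 2 is used here only so the example stays small; with the labeled 256x256 array, the idealized count is 65,536 multiply-accumulates per cycle.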

Vmem: Software-managed scratchpad
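"Software-managed" means the program, not a hardware cache, decides what lives in the scratchpad and when. The sketch below shows that general technique only. The `Scratchpad` class, its capacity in elements, and `sum_in_chunks` are hypothetical and say nothing about the real Vmem size or interface.

```python
class Scratchpad:
    """Fixed-capacity buffer with explicit, program-controlled placement."""

    def __init__(self, capacity):
        self.capacity = capacity  # capacity in elements (hypothetical unit)
        self.used = 0
        self.buffers = {}

    def load(self, name, data):
        """Copy data in; fail loudly rather than evict anything implicitly."""
        if name in self.buffers:
            self.free(name)
        if self.used + len(data) > self.capacity:
            raise MemoryError(f"scratchpad full: cannot fit {len(data)} elements")
        self.buffers[name] = list(data)
        self.used += len(data)

    def free(self, name):
        """Release a named buffer so its space can be reused."""
        self.used -= len(self.buffers.pop(name))


def sum_in_chunks(main_memory, pad, chunk):
    """Stream a large array through the scratchpad one chunk at a time."""
    total = 0
    for start in range(0, len(main_memory), chunk):
        pad.load("tile", main_memory[start:start + chunk])  # explicit copy-in
        total += sum(pad.buffers["tile"])                   # compute on local data
        pad.free("tile")                                    # explicit release
    return total


pad = Scratchpad(capacity=4)
print(sum_in_chunks(list(range(10)), pad, chunk=4))
```

Unlike a cache, nothing is evicted behind the program's back: an oversized load raises an error, so the schedule of copies has to be planned ahead of time.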