Compute Unit
Memory / DMA
High-Radix Switch
Management
Host CPU
gBMC
PCIe Gen6 x16
Legacy I/O
TensorCore 0
TCS Dispatcher
XLU L
VPU Unit
XLU R
MXU (Matrix)
TensorCore 1
TCS Dispatcher
XLU L
VPU Unit
XLU R
MXU (Matrix)
Memory & DMA Bisection Interconnect Highway (Terabit/s)
Chip Manager
Sparse Core 0
Sparse Core 1
HBM Ctr
HBM Ctr
HBM Ctr
HBM Ctr
HBM4 Stack 0
HBM4 Stack 1
HBM4 Stack 2
HBM4 Stack 3
Inter-Chip Interconnect
ICR Router
ICI Link Stack
6x 224G SerDes Octals
(Scale-out I/O)
(Scale-out I/O)
VLIW TensorCore
Decoupled Matrix-Vector Pipeline
Architecture Peak
10.5 PFLOPS
TCS brain
Decodes VLIW instructions and dispatches parallel streams to the units below.
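The dispatch step described above can be sketched as follows. This is a toy model only, not the chip's actual instruction format: the slot names are borrowed from the unit labels in the diagram, and the bundle layout, the `dispatch` function, and the opcode strings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical slot names, borrowed from the unit labels in the diagram.
UNITS = ("XLU_L", "VPU", "XLU_R", "MXU")


def dispatch(bundles):
    """Route each slot of every VLIW bundle to its unit's instruction stream.

    A bundle is a dict mapping a unit name to an opcode string. A unit that
    is missing from a bundle has an empty slot (a no-op) for that cycle.
    """
    streams = defaultdict(list)
    for cycle, bundle in enumerate(bundles):
        unknown = set(bundle) - set(UNITS)
        if unknown:
            raise ValueError(f"unknown unit(s) in bundle {cycle}: {sorted(unknown)}")
        for unit in UNITS:
            # Every unit gets one entry per cycle so the streams stay in lockstep.
            streams[unit].append(bundle.get(unit, "nop"))
    return dict(streams)


# Two bundles: each one issues work to several units in the same cycle.
program = [
    {"MXU": "matmul", "XLU_L": "transpose"},
    {"VPU": "relu", "MXU": "matmul"},
]
streams = dispatch(program)
```

Each bundle is one wide instruction, so the per-unit streams always have the same length and empty slots are padded with `nop`.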
DMA Orchestrator
XLU Left
XLU Right
MXU (Matrix)
256x256 Systolic Array
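A 256x256 array holds 256 x 256 = 65,536 multiply-accumulate cells. How such an array computes a matrix product can be sketched with a small cycle-level simulation. The diagram does not state the dataflow, so the output-stationary scheme below (each cell accumulates one output element while skewed operands stream past) is an assumption, and `systolic_matmul` is a hypothetical helper name.

```python
def systolic_matmul(a, b):
    """Multiply a (m x k) by b (k x n) on a simulated m x n systolic array.

    Assumes an output-stationary dataflow: cell (i, j) keeps the running sum
    for output element (i, j). Returns the product and the cycle count.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    acc = [[0] * n for _ in range(m)]
    # Cycles until the last skewed operand pair reaches cell (m-1, n-1).
    steps = k + m + n - 2
    for t in range(steps):
        for i in range(m):
            for j in range(n):
                # Row i of `a` is delayed by i cycles and column j of `b` by
                # j cycles, so cell (i, j) sees operand index t - i - j.
                s = t - i - j
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, steps


c, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

In this model a full 256x256 tile with inner dimension 256 would take 256 + 256 + 256 - 2 = 766 cycles to drain, which is why real pipelines overlap successive tiles rather than waiting for each one to finish.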
VPU
Runs pointwise ops (ReLU, GELU) and transfers data between MXU results and HBM.
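For reference, the two pointwise operations named above can be written as scalar functions. This is a minimal sketch of the math only: the function names are illustrative, the exact (erf-based) form of GELU is assumed rather than the tanh approximation, and nothing here reflects how the VPU implements these ops in hardware.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def pointwise(op, values):
    """Apply a scalar op to every element independently."""
    return [op(v) for v in values]


activations = pointwise(relu, [-1.0, 0.5, 2.0])
```

Because each output depends on exactly one input element, these ops parallelize trivially across vector lanes.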
Vmem
Software-Managed Scratchpad