TPU Core: Vector Unit & Lanes

Explore how Google's Tensor Processing Unit handles vector operations through SIMD parallelism.

The "Lane" Concept

The Vector Unit consists of 128 parallel lanes; the diagram shows 8 of them stacked ($\times 8$) as a representative slice. Each lane acts as a sub-processor that handles one element of a large vector, and all lanes operate simultaneously.
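The lane model can be sketched with NumPy, which executes elementwise operations across a whole array at once, much like lanes in lockstep. The 128-lane width comes from the text above; the data values here are arbitrary.

```python
import numpy as np

LANES = 128  # vector unit width, per the text

# Each lane holds one element of the vector.
a = np.arange(LANES, dtype=np.int32)
b = np.full(LANES, 2, dtype=np.int32)

# One SIMD "add" instruction = 128 lane-wise additions in lockstep.
result = a + b
```

Lane `i` computes `a[i] + b[i]` independently of every other lane, which is why widening the unit to more lanes adds throughput without adding latency.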

Vector Memory & Reg File

Data is staged in the 32Ki $\times$ 32b Vector Memory (32,768 words of 32 bits each). The DMA Interface handles transfers to and from external memory, while the Register File provides each lane's local storage for active computation.
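A minimal sketch of this staging path, assuming a flat word-addressed memory and a 128-wide vector load (the 32Ki $\times$ 32b size is from the text; the tile width and data are illustrative):

```python
import numpy as np

WORDS = 32 * 1024  # 32Ki words of 32 bits, per the text
LANES = 128        # assumed vector width for this sketch

# Vector Memory modeled as a flat array of 32-bit words.
vmem = np.zeros(WORDS, dtype=np.int32)

# "DMA" step: copy a block from external memory into Vector Memory.
external = np.arange(4096, dtype=np.int32)
vmem[:external.size] = external

# Vector load: move one 128-wide slice into the lane register file.
reg_file = vmem[0:LANES].copy()
```

The copy into `reg_file` mirrors the real pipeline: compute reads from fast per-lane registers, while the DMA engine refills Vector Memory in the background.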

SIMD Units: ALU0 & ALU1

Each lane contains two arithmetic logic units, ALU0 and ALU1. In SIMD mode, every lane's ALUs perform the same operation (such as an addition or an activation function) on that lane's local data, all in lockstep.
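With two ALUs per lane, a common pattern is issuing two dependent elementwise steps back-to-back, e.g. a scale followed by a bias add. A hedged sketch (the pairing of these specific ops to ALU0/ALU1 is illustrative, not from the text):

```python
import numpy as np

LANES = 128
x = np.arange(LANES, dtype=np.float32)
w = np.full(LANES, 0.5, dtype=np.float32)   # per-lane scale
bias = np.ones(LANES, dtype=np.float32)     # per-lane bias

# ALU0 step: elementwise multiply across all lanes in lockstep.
t = x * w
# ALU1 step: elementwise add, also across all lanes.
y = t + bias
```

Every lane runs the identical multiply-then-add sequence on its own element, which is exactly the SIMD contract: one instruction stream, many data elements.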

Matrix Unit Interface

Data flows To and From the Matrix Multiply Units. This bidirectional connection allows the Vector Unit to preprocess data for matrix multiplication and post-process the results (e.g., applying ReLU).
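The round trip can be sketched end to end: the Vector Unit hands operands to the Matrix Unit, then applies an elementwise activation to what comes back. Shapes and values here are toy examples, and `relu` stands in for the post-processing step mentioned above:

```python
import numpy as np

def relu(v):
    # Vector Unit post-processing: elementwise max(0, x) across lanes.
    return np.maximum(v, 0)

activations = np.array([[1.0, -2.0], [3.0, 4.0]], dtype=np.float32)
weights = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=np.float32)

# "To" the Matrix Unit: operands are staged for the multiply.
product = activations @ weights
# "From" the Matrix Unit: results return for elementwise post-processing.
out = relu(product)
```

The matrix multiply and the activation run on different hardware blocks, so the bidirectional link lets them be pipelined rather than serialized through external memory.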