The Matrix Math of Inference

Use the controls to step through the math. Red means the GPU is waiting to load weights from VRAM. Blue means the GPU is doing math. Arithmetic Intensity is the ratio of Blue to Red.

Prefill: Processing Prompt Tokens

Notice how we load the weights ONCE, but perform math across MANY tokens.

Model Weights
×
Input Tokens
=
Predicted
Click "Next" to begin stepping through the calculations.

Parameters

8

Live Hardware Counters

Memory Loads (Weights)
0
Math Ops (Compute)
0
Arithmetic Intensity
0.0
Ops per Memory Load
MEMORY BOUND COMPUTE BOUND
Step through the animation to see hardware utilization.