← Back to Catalog

Weight Quantization Visualizer

Step through how FP32 weights are compressed to low-bit integers — and see exactly where precision is lost.

Weight Tensor (4 × 4)

FP32 original

Original 32-bit float weights.

Quantization Math

Statistics

Quantization Error per Weight

Appears after Step 3 (dequantize). Bar height = |w − w̃|.

Memory Footprint

🎯

Symmetric vs Asymmetric

Symmetric maps [−max, max] with zero-point = 0. Asymmetric adds an offset, useful for ReLU activations that are always ≥ 0.

📊

Per-tensor vs Per-row

Per-tensor uses one scale for all weights. Per-row assigns one scale per output channel, capturing different magnitudes at a small cost.

⚠️

Outliers hurt quality

One large outlier forces a wide scale, wasting most of the INT8 range. SmoothQuant and GPTQ redistribute outliers before quantizing.