Weight Quantization Visualizer

Step through how FP32 weights are compressed to low-bit integers — and see exactly where precision is lost.

Bit-width

Scheme

Granularity

Weight Tensor (4 × 4)

FP32 original

Original 32-bit float weights.

Appears after Step 3 (dequantize). Bar height = |w − w̃|.

🎯

Symmetric maps [−max, max] with zero-point = 0. Asymmetric adds an offset, useful for ReLU activations that are always ≥ 0.

📊

Per-tensor uses one scale for all weights. Per-row assigns one scale per output channel, capturing different magnitudes at a small cost.

⚠️

One large outlier forces a wide scale, wasting most of the INT8 range. SmoothQuant and GPTQ redistribute outliers before quantizing.