Pruning as LLM Scale-Down

Explore how removing low-importance weights shrinks model memory and compute. Compare unstructured, structured, and N:M pruning while tracking parameter count, memory footprint, and quality trade-offs.
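To make the three pruning styles concrete, here is a minimal pure-Python sketch on a small weight matrix. It assumes weight magnitude as the importance score, row L2 norm as the criterion for structured pruning, and a 2:4 pattern for N:M. The page names none of these, so the function names and criteria are illustrative only, not the method behind the panels below.

```python
import random

def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude weights anywhere in the matrix."""
    cols = len(weights[0])
    # Rank every (row, col) position by absolute weight, smallest first.
    order = sorted(
        ((r, c) for r in range(len(weights)) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    k = int(len(order) * sparsity)  # number of weights to remove
    pruned = [row[:] for row in weights]
    for r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned

def structured_prune_rows(weights, sparsity):
    """Zero whole rows (e.g. output neurons) with the smallest L2 norm."""
    norms = [sum(w * w for w in row) for row in weights]
    order = sorted(range(len(weights)), key=lambda r: norms[r])
    drop = set(order[: int(len(weights) * sparsity)])
    return [[0.0] * len(row) if r in drop else row[:]
            for r, row in enumerate(weights)]

def n_m_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m along a row."""
    pruned = []
    for row in weights:
        new_row = row[:]
        for start in range(0, len(row), m):
            group = list(range(start, min(start + m, len(row))))
            keep = set(sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n])
            for i in group:
                if i not in keep:
                    new_row[i] = 0.0
        pruned.append(new_row)
    return pruned

def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total

random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
for name, result in [
    ("unstructured", unstructured_prune(W, 0.5)),
    ("structured", structured_prune_rows(W, 0.5)),
    ("2:4", n_m_prune(W, 2, 4)),
]:
    print(name, sparsity_of(result))
```

All three variants reach 50% sparsity here, but the zeros land differently: scattered anywhere (unstructured), in whole rows (structured), or exactly two per group of four (2:4). Whether any of these saves memory or compute in practice depends on how the zeros are stored and on hardware or kernel support, which this sketch does not model.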

Remaining Parameters: 3.50B (pruned: 3.50B)
Estimated Model Memory: 7.00 GB (saved: 7.00 GB)
Quality Proxy: 96.2 / 100 (mild quality impact)
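The parameter and memory figures above are consistent with a simple back-of-the-envelope calculation, sketched below. It assumes a 7B-parameter dense model (3.50B remaining plus 3.50B pruned), 2 bytes per weight (as with fp16 or bf16), 1 GB = 10^9 bytes, and pruned weights that cost no storage at all (no sparse index overhead). The page states none of these assumptions, so treat this as one plausible reading of the numbers rather than the formula the page uses. The quality proxy is not modeled.

```python
def model_memory_gb(num_params, bytes_per_param=2, bytes_per_gb=1e9):
    """Estimate weight memory, ignoring activations and sparse index overhead."""
    return num_params * bytes_per_param / bytes_per_gb

# Assumed dense size: 3.50B remaining + 3.50B pruned from the panel above.
total_params = 7.0e9
sparsity = 0.5  # fraction of weights removed (assumed from the 50/50 split)

remaining_params = total_params * (1 - sparsity)
pruned_params = total_params - remaining_params

dense_gb = model_memory_gb(total_params)
pruned_gb = model_memory_gb(remaining_params)
saved_gb = dense_gb - pruned_gb

print(f"Remaining parameters: {remaining_params / 1e9:.2f}B")
print(f"Pruned parameters:    {pruned_params / 1e9:.2f}B")
print(f"Estimated memory:     {pruned_gb:.2f} GB")
print(f"Saved:                {saved_gb:.2f} GB")
```

In a real deployment, unstructured sparsity usually needs extra index metadata, so actual savings tend to be smaller than this idealized estimate.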

Sparse Weight Matrix
(Figure: weight matrix after pruning; gray cells mark removed weights.)

Layer-Wise Sparsity Profile


Interpretation