Pruning as LLM Scale-Down

Explore how removing low-importance weights shrinks model memory and compute. Compare unstructured, structured, and N:M pruning while tracking parameter count, memory footprint, and quality trade-offs.

Pruning Controls

Base Model Size

Precision

Pruning Strategy

Unstructured (individual weights removed at irregular positions)

Structured (remove entire channels)

N:M (keep top N in each M group)
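The three strategies can be sketched on a tiny weight matrix. This is a minimal illustration in plain Python, assuming weight magnitude (or row L2 norm for channels) as the importance score; the function names and the example matrix are hypothetical and are not taken from this page's implementation.

```python
def unstructured_prune(matrix, sparsity):
    """Zero the smallest-magnitude weights anywhere in the matrix."""
    ranked = sorted(
        (abs(v), r, c) for r, row in enumerate(matrix) for c, v in enumerate(row)
    )
    n_remove = int(round(sparsity * len(ranked)))
    pruned = [list(row) for row in matrix]
    for _, r, c in ranked[:n_remove]:
        pruned[r][c] = 0.0
    return pruned


def structured_prune(matrix, sparsity):
    """Drop whole rows (treated here as channels) with the smallest L2 norm."""
    n_remove = int(round(sparsity * len(matrix)))
    order = sorted(range(len(matrix)), key=lambda r: sum(v * v for v in matrix[r]))
    removed = set(order[:n_remove])
    return [list(row) for r, row in enumerate(matrix) if r not in removed]


def nm_prune(row, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m consecutive weights."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    out = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


# Hypothetical 2x4 weight matrix, pruned at 50% sparsity three ways.
W = [[0.9, -0.1, 0.05, 0.7],
     [0.2, -0.8, 0.3, 0.01]]

unstructured = unstructured_prune(W, 0.5)   # same shape, zeros scattered
structured = structured_prune(W, 0.5)       # fewer rows, still dense
two_four = [nm_prune(row, 2, 4) for row in W]  # exactly 2 nonzeros per group of 4
```

Note that only the structured variant returns a smaller dense matrix; the other two keep the original shape and rely on sparse storage or hardware support to realize any savings.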

Sparsity: 50% (percent of parameters removed)

Key Formula

remaining = total * (1 - sparsity)

memory = remaining * bytes_per_param
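As a worked example, the two formulas can be evaluated directly. This sketch assumes a 7B-parameter base model stored at 2 bytes per parameter (for example FP16) and 1 GB = 10^9 bytes; those inputs are assumptions chosen to be consistent with the readouts shown on this page, and the function name is hypothetical.

```python
def pruned_size(total_params, sparsity, bytes_per_param):
    """Apply the two key formulas: remaining parameters, then memory in bytes."""
    remaining = total_params * (1 - sparsity)
    memory_bytes = remaining * bytes_per_param
    return remaining, memory_bytes


TOTAL_PARAMS = 7e9      # assumed base model size
SPARSITY = 0.5          # 50% of parameters removed
BYTES_PER_PARAM = 2     # assumed 16-bit precision

remaining, memory_bytes = pruned_size(TOTAL_PARAMS, SPARSITY, BYTES_PER_PARAM)
dense_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
saved_bytes = dense_bytes - memory_bytes

print(f"Remaining parameters: {remaining / 1e9:.2f}B")
print(f"Estimated memory:     {memory_bytes / 1e9:.2f} GB")
print(f"Saved:                {saved_bytes / 1e9:.2f} GB")
```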

Real systems also include index/metadata overhead, especially for unstructured sparse tensors.
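To see why the overhead matters, the sketch below adds an index cost per kept weight to the memory estimate. The per-weight index sizes are illustrative assumptions: 32 bits stands in for a simple coordinate-style index on unstructured sparsity, and 2 bits stands in for the position of a kept weight inside a group of 4 under 2:4 pruning. Real formats differ in detail, and the 7B-parameter, 2-bytes-per-parameter inputs are likewise assumed.

```python
def sparse_memory_bytes(total_params, sparsity, bytes_per_param, index_bits_per_kept):
    """Value storage plus a fixed number of index bits for every kept weight."""
    remaining = total_params * (1 - sparsity)
    value_bytes = remaining * bytes_per_param
    index_bytes = remaining * index_bits_per_kept / 8
    return value_bytes + index_bytes


TOTAL_PARAMS = 7e9     # assumed base model size
BYTES_PER_PARAM = 2    # assumed 16-bit precision
dense_bytes = TOTAL_PARAMS * BYTES_PER_PARAM

# Assumed index costs per kept weight, for illustration only.
unstructured_bytes = sparse_memory_bytes(TOTAL_PARAMS, 0.5, BYTES_PER_PARAM, 32)
two_four_bytes = sparse_memory_bytes(TOTAL_PARAMS, 0.5, BYTES_PER_PARAM, 2)

print(f"Dense:                  {dense_bytes / 1e9:.3f} GB")
print(f"Unstructured, 32b idx:  {unstructured_bytes / 1e9:.3f} GB")
print(f"2:4, 2b idx:            {two_four_bytes / 1e9:.3f} GB")
```

Under these assumptions, a naive 32-bit index at 50% sparsity costs more than the dense model it replaces, while the compact 2:4 metadata adds only a small fraction to the pruned size. This is one reason structured and N:M patterns tend to be easier to turn into real memory savings.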

Remaining Parameters: 3.50B (pruned: 3.50B)

Estimated Model Memory: 7.00 GB (saved: 7.00 GB)

Quality Proxy: 96.2 / 100 (mild quality impact)

Figure: Sparse Weight Matrix (gray cells = removed weights)

Figure: Layer-Wise Sparsity Profile (updates automatically with the controls)


Interpretation