Deep Dive: Decoding Math & The Combinatorial Explosion
1 The Combinatorial Explosion
If an LLM vocabulary has 50,000 tokens, a 5-token sentence has $50{,}000^5$ possible combinations. Evaluating them all is computationally intractable. Beam Search tames this by generating every next-token candidate for each active path, but then ruthlessly pruning everything except the top $k$.
The "Greedy Trap": Greedy Search ($k=1$) falls into traps by picking a word that looks good immediately (like 'a') but leads to terrible continuations. Beam Search survives this by keeping 'runner-up' paths alive.
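The pruning loop and the greedy trap can be seen in a minimal sketch. The toy "model" below (`TOY`, `next_logprobs`) is hypothetical, invented for illustration: the token 'a' looks best immediately, but 'the' leads to a stronger full phrase, so greedy ($k=1$) loses while a beam of 2 wins.

```python
import math

def beam_search(next_logprobs, start, steps, k=3):
    """Toy beam search: expand every active path, keep only the top-k.

    `next_logprobs(path)` is a stub standing in for an LLM: it returns
    {token: log_prob} for the next token given the path so far.
    """
    beams = [([start], 0.0)]  # (token path, cumulative log prob)
    for _ in range(steps):
        candidates = []
        for path, score in beams:
            for tok, lp in next_logprobs(path).items():
                candidates.append((path + [tok], score + lp))
        # Ruthless pruning: keep only the k highest-scoring paths
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Hand-made probabilities illustrating the greedy trap:
# 'a' (0.6) beats 'the' (0.4) at step 1, but 'the cat' (0.4 * 0.9 = 0.36)
# beats any continuation of 'a' (at most 0.6 * 0.1 = 0.06) overall.
TOY = {
    ("<s>",): {"a": math.log(0.6), "the": math.log(0.4)},
    ("<s>", "a"): {"cat": math.log(0.1), "dog": math.log(0.1)},
    ("<s>", "the"): {"cat": math.log(0.9), "dog": math.log(0.05)},
}

best_path, best_score = beam_search(lambda p: TOY[tuple(p)], "<s>", steps=2, k=2)[0]
```

With $k=2$ the runner-up 'the' survives the first pruning round and ends up winning; with $k=1$ it would have been discarded immediately.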
2 Log Probabilities & Length Penalty
Probabilities are decimals between 0 and 1. Multiplying many tiny decimals causes floating-point underflow (the result rounds to 0). Instead, LLMs sum Log Probabilities ($\log P$), since $\log(a \cdot b) = \log a + \log b$. Because probabilities are $< 1$, Log Probs are always negative. Closer to 0 is better.
The Length Problem: Because we keep adding negative numbers, longer sentences will always have lower scores. To fix this, we apply Length Normalization (Score $\div$ Number of tokens) so the model doesn't unfairly favor artificially short sentences.
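Both ideas fit in a few lines. This is a minimal sketch (not any particular library's scoring function); the example probability lists are made up to show that the raw sum favors the short sequence while the per-token score favors the longer, equally confident one.

```python
import math

def length_normalized_score(token_probs):
    """Sum log probs (avoiding underflow), then divide by token count
    so longer sequences aren't unfairly penalized."""
    log_score = sum(math.log(p) for p in token_probs)
    return log_score / len(token_probs)

short = [0.5, 0.5]            # 2 tokens
longer = [0.5, 0.5, 0.9, 0.9]  # 4 tokens, model stays confident

raw_short = sum(math.log(p) for p in short)    # ~ -1.386
raw_longer = sum(math.log(p) for p in longer)  # ~ -1.597: raw sum prefers short
norm_short = length_normalized_score(short)    # ~ -0.693
norm_longer = length_normalized_score(longer)  # ~ -0.399: per-token prefers longer
```

Every extra token adds a negative term, so the raw sum always drifts downward; dividing by length compares sequences on average per-token quality instead.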
3 Temperature Sampling (Stochastic Decoding)
Instead of strictly picking the highest score, Sampling rolls a weighted die. This is how ChatGPT generates creative, varied responses. The Temperature ($T$) divides the model's raw scores (logits) before they are converted to probabilities, reshaping the distribution before the roll:
- ❄️ Low Temp ($T < 1$) Exaggerates differences. The most likely word becomes almost guaranteed (approaching Greedy behavior).
- 🔥 High Temp ($T > 1$) Flattens the distribution. Crazy, less likely words have a much higher chance of being picked.
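The two bullets above can be verified numerically. This is a minimal sketch of temperature scaling (divide logits by $T$, then softmax, then roll the weighted die); the logit values are invented for illustration.

```python
import math
import random

def apply_temperature(logits, T):
    """Divide logits by T, then softmax. T < 1 sharpens the
    distribution; T > 1 flattens it."""
    scaled = [l / T for l in logits]
    m = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng=random):
    """Roll the weighted die: pick an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]                # made-up raw scores for 3 tokens
cold = apply_temperature(logits, 0.5)   # sharper: top token dominates
hot = apply_temperature(logits, 2.0)    # flatter: unlikely tokens gain mass
token = sample(hot)
```

At $T \to 0$ this converges to Greedy Search; at very high $T$ it approaches picking uniformly at random.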