1. Build a tokenization example
This demo uses a small lowercase toy vocabulary. It is not a production tokenizer, but it shows the core idea of byte-pair encoding (BPE) as used in modern subword tokenization: repeatedly merge the highest-priority adjacent pair of symbols until no known pair remains.
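As a rough illustration of that merge loop, here is a minimal sketch. The merge list `toy_merges`, the function name `bpe_encode`, and the `</w>` end-of-word symbol are assumptions made for this example, not the vocabulary this page actually uses. A pair's position in the list is treated as its priority, with earlier entries merged first.

```python
def bpe_encode(word, merges, end_marker="</w>"):
    """Apply ranked merges to one word; a lower rank means higher priority."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word) + [end_marker]
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        # Pick the adjacent pair with the best (lowest) rank; unknown pairs rank last.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no mergeable pair is left
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical toy merge list, highest priority first.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]

print(bpe_encode("lower", toy_merges))  # ['low', 'er</w>']
print(bpe_encode("xyz", toy_merges))    # ['x', 'y', 'z', '</w>']
```

A word with no known pairs, such as `xyz` here, simply stays split into characters, which is the backoff behavior that makes subword tokenizers robust to unseen words.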
2. Step through merges
Step 0 shows the focus word split into characters plus an end-of-word marker.
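The step-by-step view can be approximated with a generator that yields the symbol sequence after every merge. This is only a sketch under assumed inputs: the focus word `lower`, the merge list `toy_merges`, and the `</w>` marker are hypothetical stand-ins for whatever the page uses.

```python
def bpe_steps(word, merges, end_marker="</w>"):
    """Yield the symbol list at step 0 and again after each merge."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word) + [end_marker]
    yield list(symbols)  # step 0: characters plus the end-of-word marker
    while True:
        known = [p for p in zip(symbols, symbols[1:]) if p in ranks]
        if not known:
            return  # nothing left to merge
        left, right = min(known, key=ranks.get)
        merged, i = [], 0
        while i < len(symbols):
            if symbols[i:i + 2] == [left, right]:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
        yield list(symbols)


# Hypothetical toy merge list, highest priority first.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
steps = list(bpe_steps("lower", toy_merges))

for number, state in enumerate(steps):
    print(f"Step {number}: {' '.join(state)}")
# Step 0: l o w e r </w>
# Step 1: lo w e r </w>
# Step 2: low e r </w>
# Step 3: low er </w>
# Step 4: low er</w>
```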
3. Compare tokenization strategies
The exact numbers here come from the toy vocabulary used on this page, but the tradeoff is the same in real tokenizers: BPE usually lands between pure characters and whole words in sequence length.
Character-level: maximum flexibility, but the longest sequences.
Word-level: shorter sequences, but brittle on unseen words and spelling variants.
BPE (subword): reusable subwords shorten sequences while still backing off to smaller pieces for unfamiliar words.
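To make the tradeoff concrete, the sketch below tokenizes one short text three ways and counts the tokens. Everything specific in it is an assumption for illustration: the sample text, the two-word `word_vocab`, the `<unk>` placeholder, the `toy_merges` list, and the `</w>` marker. The counts it prints therefore will not match the numbers shown on this page.

```python
def bpe_tokens(word, merges, end_marker="</w>"):
    """Split one word into subwords by applying ranked merges in priority order."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word) + [end_marker]
    while True:
        known = [p for p in zip(symbols, symbols[1:]) if p in ranks]
        if not known:
            return symbols
        left, right = min(known, key=ranks.get)
        merged, i = [], 0
        while i < len(symbols):
            if symbols[i:i + 2] == [left, right]:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged


# Hypothetical toy data; "lowest" is deliberately missing from the word vocabulary.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
word_vocab = {"low", "lower"}
words = "low lower lowest".split()

char_tokens = [c for w in words for c in list(w) + ["</w>"]]
word_tokens = [w if w in word_vocab else "<unk>" for w in words]
subword_tokens = [t for w in words for t in bpe_tokens(w, toy_merges)]

print(len(char_tokens), len(subword_tokens), len(word_tokens))  # 17 9 3
print(word_tokens)     # ['low', 'lower', '<unk>']
print(subword_tokens)  # ['low', '</w>', 'low', 'er</w>', 'low', 'e', 's', 't', '</w>']
```

In this toy run the word-level tokenizer is shortest but loses `lowest` entirely, while the subword version keeps the shared piece `low` and spells out the unfamiliar ending character by character.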