Pipeline Parallelism Bubble Visualizer

Controls

Change the model split and microbatch count, then play or step through the schedule.

Pipeline stages: 4

Microbatches: 8

Animation speed: 1.0x

Schedule

Timeline step: t0

Utilization: 0%

Bubble ratio: 0%

Useful work: 0

Total slots: 0

Animated Stage Timeline

Legend: Forward, Backward, Bubble

Current Time Step

Same Settings, Two Schedules

Use this comparison to separate the effect of microbatch count from the effect of the schedule itself.

What The Animation Shows

Schedule Model

This model treats each forward or backward pass of one microbatch on one stage as a single unit of work, and all units take the same time. Real systems also include activation communication and uneven layer times, but the bubble behavior comes from the same scheduling issue: a device sits idle whenever the work it depends on has not finished yet.
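The equal-duration model can be sketched in a few lines of Python. This is a minimal illustration for a GPipe-style schedule (all forward units first, then all backward units), not the visualizer's own code; the function names `gpipe_schedule` and `utilization` are hypothetical, and the 4-stage, 8-microbatch values are simply the defaults shown in the controls.

```python
def gpipe_schedule(num_stages: int, num_microbatches: int) -> list[list[str]]:
    """Return grid[stage][t] holding 'F' (forward), 'B' (backward), or '.' (bubble).

    Assumes every forward or backward unit takes exactly one time step.
    """
    p, m = num_stages, num_microbatches
    half = m + p - 1  # time steps in the forward phase, and again in the backward phase
    grid = [["."] * (2 * half) for _ in range(p)]
    for s in range(p):
        for j in range(m):
            # Microbatch j reaches stage s after passing through s earlier stages.
            grid[s][s + j] = "F"
            # Backward work starts at the last stage and flows toward stage 0.
            grid[s][half + (p - 1 - s) + j] = "B"
    return grid


def utilization(grid: list[list[str]]) -> float:
    """Fraction of (stage, time step) slots that hold useful work."""
    total = sum(len(row) for row in grid)
    useful = sum(cell != "." for row in grid for cell in row)
    return useful / total


grid = gpipe_schedule(4, 8)
for s, row in enumerate(grid):
    print(f"stage {s}: {''.join(row)}")
print(f"utilization = {utilization(grid):.3f}")
```

Each printed row is one stage's timeline. The dots at the start and end of a row are the warmup and drain bubbles that the animation colors separately.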

Why Students Should Care

Layer Splitting Is Not Enough

Pipeline parallelism reduces memory per device, but stages sit idle while the pipeline fills and drains. With few microbatches, these idle bubbles can erase much of the throughput gain.

Microbatches Fill The Pipe

More microbatches amortize warmup and drain time, which raises utilization for the same number of stages.
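Under the equal-duration model, and assuming a fill-and-drain schedule in which each stage is idle for p - 1 units in each of the forward and backward phases, the bubble ratio for p stages and m microbatches is (p - 1) / (m + p - 1). The sketch below tabulates this for a fixed stage count; `bubble_ratio` is a hypothetical helper name, and the formula ignores communication and uneven stage times.

```python
def bubble_ratio(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of the schedule under the equal-duration model."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)


# Hold the number of stages fixed and grow the microbatch count.
for m in (1, 2, 4, 8, 16, 32):
    ratio = bubble_ratio(4, m)
    print(f"microbatches={m:2d}  bubble={ratio:.1%}  utilization={1 - ratio:.1%}")
```

With 4 stages, a single microbatch leaves the pipeline idle three quarters of the time, while 8 microbatches bring utilization to 8/11, or roughly 73%.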

Scheduling Changes Memory

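1F1B (one forward, one backward) starts a microbatch's backward work as soon as its forward work has finished on the last stage, instead of waiting for every forward pass to complete as GPipe does. Each stage can free stored activations earlier, which usually lowers activation residency while keeping the pipeline busy.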
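The memory difference can be estimated by counting how many microbatches a stage has run forward but not yet backward. The sketch below assumes a non-interleaved 1F1B schedule in which stage s (0-indexed) runs min(m, p - 1 - s) warmup forward units before alternating; the function names are hypothetical, and real activation memory also depends on layer sizes and recomputation.

```python
def gpipe_order(num_microbatches: int) -> list[str]:
    """Per-stage operation order for GPipe: all forwards, then all backwards."""
    return ["F"] * num_microbatches + ["B"] * num_microbatches


def one_f_one_b_order(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    """Per-stage operation order for non-interleaved 1F1B (assumed variant)."""
    warmup = min(num_microbatches, num_stages - 1 - stage)
    steady = num_microbatches - warmup
    # Warmup forwards, then alternate one forward and one backward, then drain.
    return ["F"] * warmup + ["F", "B"] * steady + ["B"] * warmup


def peak_in_flight(order: list[str]) -> int:
    """Largest number of microbatches whose activations are held at once."""
    live = peak = 0
    for op in order:
        live += 1 if op == "F" else -1
        peak = max(peak, live)
    return peak


stages, microbatches = 4, 8
gpipe_peak = peak_in_flight(gpipe_order(microbatches))
one_f_one_b_peaks = [
    peak_in_flight(one_f_one_b_order(s, stages, microbatches)) for s in range(stages)
]
print("GPipe peak per stage:", gpipe_peak)
print("1F1B peak per stage: ", one_f_one_b_peaks)
```

In this model GPipe holds activations for all 8 microbatches on every stage, while 1F1B holds at most 4 on the first stage and 1 on the last, independent of how many microbatches are added.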