EE508 · Performance Modeling · Distributed Training

Pipeline Parallelism Bubble Visualizer

Split transformer layers across devices, then watch microbatches move through the pipeline. Compare GPipe flush scheduling with 1F1B scheduling and see exactly where idle bubbles waste accelerator time.

Animated Stage Timeline

Legend: Forward, Backward, Bubble

Same Settings, Two Schedules

Use this comparison to separate the effect of microbatch count from the effect of the schedule itself.

What The Animation Shows

Schedule Model

This model treats every forward or backward pass through a stage as one time unit of equal length. Real systems also incur activation communication and uneven layer times, but the bubbles come from the same scheduling issue: devices sit idle while waiting for dependencies.
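
The unit model can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's own code: the function names `gpipe_timeline` and `utilization` are hypothetical, communication is assumed free, and backward passes are assumed to run in the same microbatch order as forward passes.

```python
def gpipe_timeline(stages: int, microbatches: int) -> list[list[str]]:
    """Build a stage-by-time grid for a GPipe-style flush schedule.

    Assumption: every forward or backward unit takes exactly one time step.
    Cells are "F" (forward), "B" (backward), or "." (bubble).
    """
    fwd_steps = microbatches + stages - 1   # steps until the last forward finishes
    total_steps = 2 * fwd_steps             # the backward phase mirrors the forward phase
    grid = [["."] * total_steps for _ in range(stages)]
    for s in range(stages):
        for mb in range(microbatches):
            # Forward of microbatch mb reaches stage s after s earlier stages.
            grid[s][s + mb] = "F"
            # Backward starts at the last stage and flows toward stage 0.
            grid[s][fwd_steps + (stages - 1 - s) + mb] = "B"
    return grid


def utilization(grid: list[list[str]]) -> tuple[float, int, int]:
    """Return (utilization, bubble slots, time steps) for a timeline grid."""
    busy = sum(cell != "." for row in grid for cell in row)
    total = sum(len(row) for row in grid)
    return busy / total, total - busy, len(grid[0])


if __name__ == "__main__":
    grid = gpipe_timeline(stages=4, microbatches=8)
    for s, row in enumerate(grid):
        print(f"stage {s}: {''.join(row)}")
    util, bubbles, steps = utilization(grid)
    print(f"utilization={util:.1%} bubble_slots={bubbles} steps={steps}")
```

With 4 stages and 8 microbatches, this sketch reports 22 steps and 24 bubble slots, for a utilization of about 72.7%.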

Why Students Should Care

Layer Splitting Is Not Enough

Pipeline parallelism reduces memory per device, but idle bubbles can erase much of the throughput gain.

Microbatches Fill The Pipe

More microbatches amortize warmup and drain time, which raises utilization for the same number of stages.
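
The amortization can be written out under the equal-duration unit model used on this page. The symbols p (stages), m (microbatches), T (time steps), and U (utilization) are notation introduced here for illustration, and the formulas assume a flush schedule in which every stage runs m forward and m backward units.

```latex
\begin{aligned}
T &= 2\,(m + p - 1) \\
\text{bubble slots} &= pT - 2pm = 2p\,(p - 1) \\
U &= \frac{2pm}{pT} = \frac{m}{m + p - 1}
\end{aligned}
```

For p = 4, raising m from 4 to 16 lifts U from 4/7 (about 57%) to 16/19 (about 84%), while the bubble slot count stays at 24. The warmup and drain cost is fixed by the stage count, so more microbatches only dilute it.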

Scheduling Changes Memory

1F1B starts backward work sooner than GPipe, so each stage can release stored activations earlier. This usually lowers peak activation memory without adding idle time.
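
A rough way to compare the two schedules is to count how many microbatches each stage must keep activations for at its peak. The sketch below uses the commonly cited closed forms for non-interleaved schedules under the equal-duration unit model. The function name `peak_in_flight` is hypothetical, and real frameworks may differ once recomputation or interleaving is involved.

```python
def peak_in_flight(stages: int, microbatches: int, schedule: str) -> list[int]:
    """Peak number of microbatches with stored activations, per stage.

    Assumptions: equal-duration units, no activation recomputation,
    non-interleaved schedules. Stage 0 is the first pipeline stage.
    """
    if schedule == "gpipe":
        # All forwards finish before any backward, so every stage holds all of them.
        return [microbatches] * stages
    if schedule == "1f1b":
        # Stage s runs (stages - s) forwards before its first backward,
        # then alternates one forward with one backward.
        return [min(microbatches, stages - s) for s in range(stages)]
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    for name in ("gpipe", "1f1b"):
        print(name, peak_in_flight(stages=4, microbatches=8, schedule=name))
```

With 4 stages and 8 microbatches, the GPipe case holds 8 microbatches on every stage, while the 1F1B case holds 4, 3, 2, and 1 from the first stage to the last. Under this model the GPipe figure grows with the microbatch count, whereas the 1F1B figure is capped by the stage count.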