Animated Stage Timeline
Same Settings, Two Schedules
Use this comparison to separate the effect of microbatch count from the effect of the schedule itself.
What The Animation Shows
Schedule Model
This model treats each forward or backward pass of one microbatch on one stage as a single unit of equal duration. Real systems also pay for activation communication and have uneven layer times, but the bubbles come from the same scheduling constraint: a device sits idle while it waits for its dependencies.
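The unit-time model above can be sketched in a few lines. This is a minimal illustration, not the code behind the animation: the function name gpipe_timeline, the grid layout, and the choice to run backward passes in the same microbatch order as forward passes are all assumptions made for the example.

```python
def gpipe_timeline(num_stages, num_microbatches):
    """Return grid[stage][t] holding 'F<j>', 'B<j>', or '.' (idle).

    Assumes the simplified model: every forward or backward pass of one
    microbatch on one stage takes exactly one time step, with no
    communication cost.
    """
    p, m = num_stages, num_microbatches
    total_steps = 2 * (m + p - 1)
    grid = [["." for _ in range(total_steps)] for _ in range(p)]
    forward_end = m + p - 1  # first step after every forward pass is done
    for s in range(p):
        for j in range(m):
            # Forward of microbatch j reaches stage s after s earlier stages.
            grid[s][s + j] = f"F{j}"
            # Backward starts at the last stage and flows toward stage 0.
            grid[s][forward_end + (p - 1 - s) + j] = f"B{j}"
    return grid


for stage, row in enumerate(gpipe_timeline(4, 4)):
    print(f"stage {stage}: " + " ".join(f"{cell:>2}" for cell in row))
```

With 4 stages and 4 microbatches the timeline is 14 steps long and every stage is idle for 6 of them, which is the bubble the animation highlights.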
Why Students Should Care
Layer Splitting Is Not Enough
Splitting a model's layers across devices reduces the memory each device needs, but idle bubbles can erase much of the throughput gain.
Microbatches Fill The Pipe
More microbatches amortize warmup and drain time, which raises utilization for the same number of stages.
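Under the equal-duration model, the amortization can be put into numbers. With p stages and m microbatches, each device is busy for 2m of the 2(m + p - 1) time steps, so the idle fraction is (p - 1) / (m + p - 1). The same fraction applies to GPipe and to non-interleaved 1F1B in this simplified model. The function name bubble_fraction below is made up for the sketch.

```python
from fractions import Fraction


def bubble_fraction(num_stages, num_microbatches):
    """Idle share of each device's timeline under the unit-time model."""
    p, m = num_stages, num_microbatches
    return Fraction(p - 1, m + p - 1)


# Hold the stage count fixed and vary only the microbatch count.
for m in (1, 4, 16, 64):
    idle = bubble_fraction(4, m)
    print(f"stages=4 microbatches={m:>2} idle={idle} utilization={1 - idle}")
```

With 4 stages the idle fraction falls from 3/4 at one microbatch to 3/7 at four and 3/19 at sixteen.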
Scheduling Changes Memory
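1F1B (one forward, one backward) starts backward work sooner than GPipe, so each stage can release stored activations earlier. This usually lowers the peak number of microbatch activations a stage holds at once while keeping the pipeline just as busy.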
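The memory difference can be seen by counting how many microbatches each stage has run forward but not yet backward. The sketch below assumes the common non-interleaved 1F1B ordering (a warmup of forwards, then alternating forward and backward, then a drain of backwards) and counts activations in whole microbatches rather than bytes. The names stage_ops and peak_in_flight are made up for the example.

```python
def stage_ops(schedule, stage, num_stages, num_microbatches):
    """Order of forward ('F') and backward ('B') passes on one stage."""
    p, m = num_stages, num_microbatches
    if schedule == "gpipe":
        # All forwards first, then all backwards.
        return ["F"] * m + ["B"] * m
    if schedule == "1f1b":
        # Earlier stages need more warmup forwards before the first
        # backward pass can arrive from the last stage.
        warmup = min(p - 1 - stage, m)
        steady = m - warmup
        return ["F"] * warmup + ["F", "B"] * steady + ["B"] * warmup
    raise ValueError(f"unknown schedule: {schedule}")


def peak_in_flight(ops):
    """Largest number of microbatches whose activations are held at once."""
    live = peak = 0
    for op in ops:
        live += 1 if op == "F" else -1
        peak = max(peak, live)
    return peak


for schedule in ("gpipe", "1f1b"):
    peaks = [peak_in_flight(stage_ops(schedule, s, 4, 8)) for s in range(4)]
    print(schedule, peaks)
```

With 4 stages and 8 microbatches, every GPipe stage peaks at 8 stored microbatches, while the 1F1B stages peak at 4, 3, 2, and 1. In this model the 1F1B peak depends on the stage count rather than the microbatch count.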