Analyzing Total System Efficiency & Serialization Penalties
[Interactive figure. Panels: Hardware SIMT Mask (Execution State), Thread Data Mapping (Processed IDs), Instruction Stream (PC). Legend: Unified (Compute/Finish), Path A (Do_this), Path B (Do_that). Readouts: Time (Cycles), Instant Active, Total Cumulative Eff., Total Latency.]
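In the Naive case, threads within the same warp take different branches, so the warp executes Path A and Path B one after the other with some of its lanes masked off, wasting clock cycles. In the Optimal case, aligning data to warp boundaries means every thread in a warp takes the same branch, so no warp has to serialize and different warps can work on the two paths in parallel.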
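The difference between the two cases can be approximated with a small host-side model. The sketch below is plain Python, not GPU code. It assumes a warp size of 32, assumes Path A and Path B cost the same, and counts one full-warp pass per distinct path a warp sees. The names `warp_passes`, `branch_efficiency`, `naive`, and `aligned` are hypothetical and chosen only for illustration.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp


def warp_passes(branch_of, num_threads, warp_size=WARP_SIZE):
    """Return, per warp, how many distinct branch paths its threads take.

    A warp that sees k distinct paths runs them one after another, so k is
    used here as a proxy for the number of serialized passes.
    """
    passes = []
    for start in range(0, num_threads, warp_size):
        lanes = range(start, min(start + warp_size, num_threads))
        passes.append(len({branch_of(tid) for tid in lanes}))
    return passes


def branch_efficiency(branch_of, num_threads, warp_size=WARP_SIZE):
    """Useful lane-slots divided by issued lane-slots inside the branch.

    Each thread does one path's worth of useful work, while every pass
    occupies the whole warp (assumption: both paths cost the same).
    """
    issued = sum(warp_passes(branch_of, num_threads, warp_size)) * warp_size
    return num_threads / issued


def naive(tid):
    # Neighbouring threads alternate paths, so every warp sees both.
    return "A" if tid % 2 == 0 else "B"


def aligned(tid):
    # The path is chosen per warp, so all 32 lanes of a warp agree.
    return "A" if (tid // WARP_SIZE) % 2 == 0 else "B"


if __name__ == "__main__":
    n = 128  # four warps
    print(warp_passes(naive, n), branch_efficiency(naive, n))
    print(warp_passes(aligned, n), branch_efficiency(aligned, n))
```

Under these assumptions the naive mapping needs two passes per warp and half of the issued lane-slots are masked off, while the aligned mapping needs one pass per warp. Real hardware behaviour also depends on the cost of each path and on how the compiler handles short branches, which this model ignores.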