Performance Profiler: Measuring Throughput & Utilization
Toggle threads to see how "Wasteful Cycles" increase with divergence.
Throughput: Useful work done per hardware cycle. Divergence lowers it because the warp executes each branch path in turn, and threads that are "waiting" on the other path do no useful work during those cycles.
Utilization: Ratio of active execution lanes to total available lanes in the warp.
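
The two definitions above can be made concrete with a small model. This is only a sketch under simplifying assumptions, not the profiler's actual implementation: it assumes a 32-lane warp, assumes each divergent branch path costs exactly one cycle, and counts one unit of useful work per active lane per cycle. The names `WARP_SIZE` and `profile` are hypothetical and chosen for illustration.

```python
WARP_SIZE = 32  # assumption: a 32-lane warp


def profile(branch_masks):
    """Model a warp that runs each branch path in turn.

    branch_masks: one list of booleans per branch path; True marks a lane
    that is active on that path. Assumption: each path costs one cycle.
    """
    cycles = len(branch_masks)
    lanes = len(branch_masks[0])
    useful = sum(sum(mask) for mask in branch_masks)  # active lane-cycles
    available = cycles * lanes                        # total lane-cycles
    return {
        "cycles": cycles,
        "throughput": useful / cycles,        # useful lane-ops per cycle
        "utilization": useful / available,    # active lanes / total lanes
        "wasted_lane_cycles": available - useful,
    }


# No divergence: all 32 lanes take the same path.
uniform = profile([[True] * WARP_SIZE])

# Divergence: 8 lanes take path A, the other 24 take path B.
path_a = [i < 8 for i in range(WARP_SIZE)]
path_b = [i >= 8 for i in range(WARP_SIZE)]
divergent = profile([path_a, path_b])

print(uniform)
print(divergent)
```

In this model the uniform warp reaches a utilization of 1.0 with no wasted lane-cycles, while the divergent warp needs two cycles for the same 32 units of work, so its throughput and utilization both halve.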