Visualizing Logits vs. Softmax vs. Hard Max
Temperature ($T$) acts as a smoothing factor. Dividing the logits by $T$ before exponentiation rescales the gaps between scores: $T > 1$ flattens the distribution, while $T < 1$ sharpens it.
As $T \to \infty$, $z_i/T \to 0$, so $e^{z_i/T} \to 1$ and every class gets probability $1/K$. Conversely, as $T \to 0^+$, the largest logit dominates the exponentials and softmax collapses to a one-hot distribution, i.e., a hard max.
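The two limits above can be checked numerically with a minimal temperature-scaled softmax; this is a sketch using only the standard library, and the logit values are illustrative, not taken from any real model.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for K = 3 classes

print(softmax(logits, T=1.0))    # moderate spread
print(softmax(logits, T=100.0))  # nearly uniform: each class approaches 1/K
print(softmax(logits, T=0.01))   # nearly one-hot: approaches the hard max
```

Raising $T$ pushes every probability toward $1/K$, while lowering it concentrates almost all mass on the argmax, matching the two limits described above.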
Figure: three distribution shapes. Broad spread = raw logits; intermediate spread = softmax; thin one-hot spike = hard max.