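Hover over the Scores matrix to see the exact dot-product calculation behind each cell. The layout follows the standard matrix multiplication equation $Q \times K^T = \text{Scores}$, so the entry in row $i$, column $j$ is the dot product of row $i$ of $Q$ with row $j$ of $K$.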
Hover over the Scores matrix to see the mathematical breakdown.
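The breakdown for a single cell can be reproduced by hand. The snippet below is a minimal sketch with made-up toy values (3 tokens, head dimension 2); the names `Q`, `K`, and `scores` mirror the symbols above, and no scaling factor is applied because the equation above does not include one.

```python
import numpy as np

# Hypothetical toy inputs: 3 tokens, head dimension 2.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
K = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Full score matrix: one row per query token, one column per key token.
scores = Q @ K.T  # shape (3, 3)

# A single cell is the dot product of one query row with one key row.
i, j = 2, 1
single = float(np.dot(Q[i], K[j]))  # 1*3 + 1*4

print(scores)
print(single)
```

Here `scores[i, j]` and `single` are the same number, which is the value a hover on that cell would break down term by term.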
Scores for future tokens (key position $j$ later than query position $i$, i.e. $j > i$) are set to $-\infty$ so that their weights become exactly $0$ after Softmax.
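One way to build this mask is sketched below with NumPy, assuming a square score matrix whose rows are query positions $i$ and whose columns are key positions $j$. The toy values and the names `future` and `masked` are illustrative only.

```python
import numpy as np

# Hypothetical 3x3 score matrix with distinct values for readability.
scores = np.arange(9, dtype=float).reshape(3, 3)

# True strictly above the diagonal, i.e. wherever j > i.
future = np.triu(np.ones((3, 3), dtype=bool), k=1)

# Replace future positions with -inf; keep everything else unchanged.
masked = np.where(future, -np.inf, scores)

print(masked)
```

The diagonal is kept, so every row retains at least one finite score.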
Softmax converts each row of scores into probability weights that sum to $1.0$.
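A row-wise Softmax can be sketched as follows. This is a minimal NumPy version, not necessarily how the visualization computes it; `softmax_rows` is a hypothetical helper name, and the input is a toy matrix that already has $-\infty$ in the future positions.

```python
import numpy as np

def softmax_rows(x):
    """Apply softmax independently to each row of a 2-D array."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)  # exp(-inf) evaluates to 0.0
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical masked scores: -inf marks future tokens (j > i).
masked = np.array([[0.0, -np.inf, -np.inf],
                   [1.0, 2.0, -np.inf],
                   [1.0, 1.0, 1.0]])

weights = softmax_rows(masked)
print(weights)
print(weights.sum(axis=1))
```

Each row of `weights` sums to 1, and every masked position receives a weight of exactly 0. This relies on each row having at least one finite score, which holds for a causal mask because a token can always attend to itself.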