Interactive Attention Matrix Multiplication

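Hover over any cell of the Scores matrix to see the exact dot product that produces it. The layout follows the standard matrix multiplication equation $Q \times K^T = \text{Scores}$: the entry in row $i$, column $j$ of Scores is the dot product of row $i$ of $Q$ with row $j$ of $K$.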
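The calculation shown on hover can be sketched in a few lines of plain Python. This is a minimal illustration, not the page's own code: the helper names `dot` and `attention_scores` and the values in `Q` and `K` are assumptions chosen for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_scores(Q, K):
    """Compute Q x K^T: entry (i, j) is the dot product of Q[i] and K[j]."""
    return [[dot(q, k) for k in K] for q in Q]


# Illustrative values only: 3 tokens with 2-dimensional queries and keys.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

scores = attention_scores(Q, K)

# One cell written out by hand, as the hover breakdown would show it:
# scores[2][1] = Q[2][0]*K[1][0] + Q[2][1]*K[1][1] = 1*3 + 1*4 = 7.0
```

Iterating over the rows of `K` rather than building $K^T$ explicitly gives the same result, because column $j$ of $K^T$ is row $j$ of $K$.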


Apply Attention Mask

Scores for future tokens (entries where the column index $j$ is greater than the row index $i$) are set to $-\infty$, so their weights become exactly $0$ after Softmax.
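The masking step can be sketched as follows, assuming the scores are stored as a list of rows. The function name `apply_causal_mask` and the sample values are hypothetical, used only for illustration.

```python
import math


def apply_causal_mask(scores):
    """Return a copy of scores where every entry with j > i is -inf."""
    return [
        [s if j <= i else -math.inf for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]


# Illustrative 3x3 scores matrix (values are assumptions for this example).
scores = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0], [3.0, 7.0, 11.0]]
masked = apply_causal_mask(scores)

# Row 0 keeps only its first entry, row 1 its first two, row 2 all three.
```

The diagonal ($j = i$) is kept, so every row retains at least one finite score.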

Softmax (Along Rows)

Softmax converts each row of scores into non-negative probability weights that sum to $1.0$ across that row.
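A minimal sketch of the row-wise Softmax, assuming masked scores like those described above. The names `softmax_row` and `softmax_rows` and the input values are assumptions for illustration. Subtracting the row maximum before exponentiating is a common numerical-stability step and is not something the page itself states.

```python
import math


def softmax_row(row):
    """Softmax over one row; -inf entries get weight exactly 0."""
    # Subtract the row maximum for numerical stability. This assumes the row
    # has at least one finite entry, which holds when the diagonal is unmasked.
    m = max(row)
    exps = [math.exp(s - m) for s in row]  # math.exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


def softmax_rows(matrix):
    """Apply softmax independently to each row of the matrix."""
    return [softmax_row(row) for row in matrix]


# Illustrative masked scores (values are assumptions for this example).
masked = [
    [1.0, -math.inf, -math.inf],
    [2.0, 4.0, -math.inf],
    [3.0, 7.0, 11.0],
]
weights = softmax_rows(masked)

# Each row of weights sums to 1.0 (up to floating-point rounding), and every
# masked position has weight 0.0.
```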