Understand the math behind tiling and how it compares to Standard Attention.
The dotted line shows the data flow. In Step 2C, we normalize the matrix A(1) by dividing each row by that row's sum, which is stored in l(1).
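The normalization in Step 2C can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the score values in `S1` are made up, the names `S1`, `m1`, `A1`, `l1`, and `P1` are hypothetical, and A(1) is assumed to hold exponentiated scores for the first key block with the row maximum subtracted for numerical stability. It checks that dividing each row of A(1) by l(1) gives the same result as a standard softmax over that block.

```python
import math

def normalize_rows(A):
    """Divide each row of A by its row sum; return (normalized rows, row sums)."""
    l = [sum(row) for row in A]
    P = [[a / s for a in row] for row, s in zip(A, l)]
    return P, l

def standard_softmax(S):
    """Reference row-wise softmax without any max subtraction."""
    out = []
    for row in S:
        exps = [math.exp(s) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical scores for the first key block (2 queries x 2 keys).
S1 = [[1.0, 2.0], [0.5, 0.5]]

# Assumption: A(1) = exp(S(1) - m(1)), where m(1) is the row-wise maximum.
m1 = [max(row) for row in S1]
A1 = [[math.exp(s - m) for s in row] for row, m in zip(S1, m1)]

# Step 2C: divide each row of A(1) by its row sum l(1).
P1, l1 = normalize_rows(A1)

# Reference result from standard softmax over the same block.
reference = standard_softmax(S1)
```

Each row of `P1` sums to 1, and `P1` agrees with `reference` up to floating-point rounding, because the factor exp(-m(1)) cancels between A(1) and l(1). This only covers a single block; combining several blocks needs the rescaling steps that the rest of the tiling walkthrough describes.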