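The Transformer encodes position using pairs of sine and cosine functions at different frequencies. Each pair is a point on the unit circle, so the encoding at pos + k can be obtained from the encoding at pos by rotating each pair by an angle that depends only on k and on that pair's frequency.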
Pairs at higher dimension indices have lower frequencies, so they rotate more slowly as the position increases.
The full vector consists of 64 pairs of coordinates (128 dimensions in total) bundled together. The highlighted box shows the single pair (2 dimensions) currently plotted on the right.
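To make the layout of the vector concrete, here is a minimal sketch of how the 64 pairs could be computed. It assumes the frequency schedule from the original Transformer paper (base 10000) and a (sin, cos) ordering within each pair. The names pair_frequency and positional_encoding are hypothetical, chosen only for this illustration.

```python
import math

D_MODEL = 128   # 64 (sin, cos) pairs, matching the text
BASE = 10000.0  # assumption: base from the original Transformer paper

def pair_frequency(i, d_model=D_MODEL, base=BASE):
    """Angular frequency of pair i, for i = 0 .. d_model/2 - 1."""
    return base ** (-2.0 * i / d_model)

def positional_encoding(pos, d_model=D_MODEL, base=BASE):
    """Return the encoding of one position as a list of (sin, cos) pairs."""
    pairs = []
    for i in range(d_model // 2):
        angle = pos * pair_frequency(i, d_model, base)
        pairs.append((math.sin(angle), math.cos(angle)))
    return pairs

pe = positional_encoding(pos=5)
freqs = [pair_frequency(i) for i in range(D_MODEL // 2)]
```

Under these assumptions the first pair has the highest frequency and each later pair is slower, which is why the pairs at higher indices turn more slowly in the plot.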
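Notice how changing the offset (k) only changes the angle of rotation: for a pair with frequency ω, that angle is k·ω, whatever the value of pos. Because rotation by a fixed angle is a linear map, the model can in principle learn a weight matrix that applies this exact rotation to detect relative distances!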
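The rotation claim can be checked numerically. The sketch below takes a single pair, assumes the (sin, cos) ordering and the base-10000 frequency schedule, and confirms that a fixed 2x2 rotation by k·ω maps the encoding at pos to the encoding at pos + k for several positions. The helpers encode_pair and rotate are hypothetical names for this illustration.

```python
import math

def encode_pair(pos, omega):
    """One (sin, cos) pair of the encoding at a given position."""
    return (math.sin(pos * omega), math.cos(pos * omega))

def rotate(pair, k, omega):
    """Apply the 2x2 rotation for offset k to a (sin, cos) pair.

    [sin(x + a)]   [ cos a  sin a] [sin x]
    [cos(x + a)] = [-sin a  cos a] [cos x]     with a = k * omega
    """
    s, c = pair
    a = k * omega
    return (s * math.cos(a) + c * math.sin(a),
            -s * math.sin(a) + c * math.cos(a))

# Assumption: pair index 3 of a 128-dimensional encoding, base 10000.
omega = 10000.0 ** (-2.0 * 3 / 128)
k = 7

# The same rotation works at every position; it depends only on k and omega.
for pos in (0, 10, 250):
    shifted = encode_pair(pos + k, omega)
    rotated = rotate(encode_pair(pos, omega), k, omega)
    assert all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(shifted, rotated))
```

The matrix entries contain only k and ω, never pos, so a single learned linear map per offset is enough in principle.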