Operational Logic (Standard):
In this setup, the output register holds the partial sum for o[q] stationary while the inner loop streams i[w] and f[s] with w = q + s. Each o[q] is accumulated in place across all s and written out once, which maximizes reuse of the output accumulator.
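
As a rough illustration of the loop order described above, the following minimal sketch keeps one accumulator per output while the inner loop walks the filter. The function name conv1d_output_stationary, the list-based arguments, and the choice of stride 1 with no padding (so the output length is len(i) - len(f) + 1) are assumptions made for this example, not details taken from the text.

```python
def conv1d_output_stationary(i, f):
    """1D convolution with an output-stationary loop order (illustrative sketch).

    Assumes stride 1 and no padding; these choices are not stated in the source.
    """
    S = len(f)                   # filter length
    Q = max(len(i) - S + 1, 0)   # number of outputs under the assumptions above
    o = [0] * Q

    for q in range(Q):           # outer loop: one output position at a time
        acc = 0                  # models the output register holding o[q]
        for s in range(S):       # inner loop: stream filter weights and inputs
            w = q + s            # input index paired with f[s] for this output
            acc += i[w] * f[s]   # accumulate in place; o[q] is not written yet
        o[q] = acc               # single write-back once all S terms are summed

    return o


if __name__ == "__main__":
    print(conv1d_output_stationary([1, 2, 3, 4], [1, 0, -1]))
```

In this sketch the accumulator acc is read and updated S times per output but o[q] is stored only once, which is the reuse pattern the paragraph refers to; the inputs i[w] and weights f[s] are re-fetched on every inner iteration.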