In this 1D toy setup, tiled Input-Stationary preserves almost the same access order as the standard (untiled) loop, so reuse distances stay similar. In larger real workloads, tiling can still help by shrinking the working set so it fits in cache, improving cache/TLB locality, and exposing independent tiles for parallel scheduling.
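The claim about similar reuse distances can be checked on a toy size with the sketch below. It records the access trace of the untiled loop and of a tiled variant, then computes each access's reuse distance (the number of distinct addresses touched since the previous access to the same address). The tiling scheme here is an assumption, because the text does not specify one: it tiles over the output index q in tiles of T outputs, which forces each tile to reload its S - 1 halo inputs. The function names and the ("i", w) / ("f", s) / ("o", q) address labels are hypothetical.

```python
def trace_standard(W, S):
    """Access trace of untiled Input-Stationary 1D valid convolution."""
    Q = W - S + 1                         # number of valid outputs
    trace = []
    for w in range(W):
        trace.append(("i", w))            # load i[w] into the stationary register
        for s in range(S):
            q = w - s
            if 0 <= q < Q:
                trace.append(("f", s))    # read filter tap
                trace.append(("o", q))    # read-modify-write output
    return trace


def trace_tiled(W, S, T):
    """Same loop nest, tiled over the output index q (assumed scheme)."""
    Q = W - S + 1
    trace = []
    for q0 in range(0, Q, T):
        q_end = min(q0 + T, Q)
        # inputs feeding this output tile, including the S - 1 halo elements
        for w in range(q0, q_end + S - 1):
            trace.append(("i", w))
            for s in range(S):
                q = w - s
                if q0 <= q < q_end:       # only outputs owned by this tile
                    trace.append(("f", s))
                    trace.append(("o", q))
    return trace


def reuse_distances(trace):
    """Distinct addresses between consecutive accesses to the same address.
    None marks a cold (first) access. O(n^2), fine for toy traces."""
    last_seen = {}
    dists = []
    for t, addr in enumerate(trace):
        if addr in last_seen:
            dists.append(len(set(trace[last_seen[addr] + 1 : t])))
        else:
            dists.append(None)
        last_seen[addr] = t
    return dists


def mean_reuse(trace):
    finite = [d for d in reuse_distances(trace) if d is not None]
    return sum(finite) / len(finite)


W, S, T = 16, 3, 4
print("standard:", mean_reuse(trace_standard(W, S)))
print("tiled   :", mean_reuse(trace_tiled(W, S, T)))
```

When T covers all outputs the tiled trace is identical to the standard one; with smaller tiles the only difference is the repeated halo input loads at tile boundaries, which is why the order stays "almost the same" in 1D.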
Operational Logic (Standard):
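In the standard (untiled) loop, the input register holds i[w] constant while the inner loop sweeps the filter taps f[s] and accumulates into the destination outputs o[q], where q = w - s. Taps that would map to an index outside the output range are skipped.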
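A minimal Python sketch of this loop, assuming a 1D valid (unpadded) cross-correlation o[q] = sum over s of i[q + s] * f[s], with Q = W - S + 1 outputs. Both function names are hypothetical. The very first update it performs (w = 0, s = 0) is o[0] += i[0] * f[0].

```python
def conv1d_input_stationary(i, f):
    """Standard Input-Stationary 1D valid cross-correlation (sketch)."""
    W, S = len(i), len(f)
    Q = W - S + 1                 # number of valid outputs
    o = [0] * Q
    for w in range(W):            # outer loop: one input element at a time
        reg = i[w]                # i[w] stays "stationary" in a register
        for s in range(S):        # inner loop sweeps the filter taps
            q = w - s             # destination output index
            if 0 <= q < Q:        # skip taps that land outside the output
                o[q] += reg * f[s]
    return o


def conv1d_reference(i, f):
    """Direct output-by-output formula, used to cross-check the loop above."""
    S = len(f)
    return [sum(i[q + s] * f[s] for s in range(S)) for q in range(len(i) - S + 1)]


print(conv1d_input_stationary([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
```

Each output receives its S partial products spread across S different outer iterations, so o[q] is read and written repeatedly while each input is loaded only once.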