Hi all, In a computer architecture class, we happened across a strange compilation choice by GCC that neither I nor my professor can make much sense of. The source is as follows: void foo(int *a, const int *__restrict b, const int *__restrict c) { for (int i = 0; i < 16; i++) { a[i] = b[i] + c[i]; } } I won't reproduce the full compiled output here, as it's rather long, but when compiled with -O3 -mno-avx -mno-sse, GCC 12.2 for x86-64 (via Compiler Explorer: https://godbolt.org/z/o9e4o7cj4) produces an unrolled loop that appears to write each sum into an array on the stack before copying it into the provided pointer a. This seems hugely inefficient - it's doing quite a few memory accesses - and I can't see why it would be necessary. Am I missing some reason why this is more efficient than the naive approach (computing the each sum into an intermediate register, then writing it directly into a)? Thanks, Gaelan