I tested with gcc version 4.9.2 (Debian 4.9.2-10) targeting x86_64. The snippet you posted compiles into a single 'ret' instruction under most optimization levels, since sum has no visible effect. When I change the return type of sum to int and return s, most optimization levels (even just -O) seem to do the hoisting you desire. The loop became

    .L3:
        addl    (%rdx), %eax
        addq    $4, %rdx
        cmpq    %rcx, %rdx
        jne     .L3
        rep ret

which I read roughly as

    do { eax += *rdx; rdx += 4; } while (rcx != rdx);

I do not have an aarch64 compiler handy, but I do have an arm compiler, so I checked it. Again under -Os,

    .L3:
        ldr     r2, [r3], #4
        add     r0, r0, r2
        cmp     r3, r1
        bne     .L3
        bx      lr

which I read the same as the x86_64 code above.

Jeff
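
P.S. In case it helps, here is roughly how I would write both loops back out as C. This is only a sketch of my reading of the assembly, not your original snippet: the function name, the signature, and the int element type are my own assumptions. The end address is computed once before the loop, and the loop itself just walks a pointer up to it.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the compiled loop. The name, the
   signature, and the element type are assumptions for illustration. */
int sum_by_pointer(const int *a, size_t n)
{
    int s = 0;
    const int *p = a;        /* plays the role of rdx (x86_64) or r3 (arm) */
    const int *end = a + n;  /* computed once, before the loop: rcx or r1 */

    /* The compiled code tests for an empty range before entering its
       do/while loop; a plain while loop expresses the same thing. */
    while (p != end) {
        s += *p;             /* addl (%rdx), %eax  /  ldr + add */
        p++;                 /* addq $4, %rdx  /  post-indexed #4 */
    }
    return s;
}
```

Calling sum_by_pointer on an array holding 1, 2, 3, 4 with n = 4 gives 10, and n = 0 gives 0 without reading the array.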