Nate Denmark skrev:
I have a question about the optimiser in GCC. Take this bit of code:
for(x = 0; x != 10; x++)
puts("Hello");
When compiled with full optimisations (-O3, -fexpensive-optimizations
etc.) it generates the following loop in assembly:
.L2:
incl %ebx
movl $.LC0, (%esp)
call puts
cmpl $10, %ebx
jne .L2
.LC0 points to the "Hello" string. I'm wondering why GCC puts that
'movl' line inside the loop, so that it's executed each time, when it
could go before the loop? As I understand it, 'puts' shouldn't do
anything to the stack, and the return value is passed back in eax, so
I'm not sure why it's doing the 'movl' each time. If I force loop
unrolling the same thing happens -- the 'movl' each iteration.
From man:puts I see that it is declared "int puts(const char *)". This
means that puts does not promise to leave its argument unchanged.
Therefore the caller must push the argument anew before each call. If it
had been declared "int puts(const char * const)" instead, the push
should be moved outside the loop. Unfortunately this does not seem to
work. I tried with the following program:
void q(const unsigned int);
void f() {for (unsigned int x = 0; x != 10; x++) q(77);}
and built it with "gcc -std=c99 -Os -Wall -Wextra -Werror -S":
.L2:
subl $12, %esp
incl %ebx
pushl $77
call q
addl $16, %esp
cmpl $10, %ebx
jne .L2
As you can see, "pushl $77" is still inside the loop even though q
promises to not change its argument. This must be a bug. I tried to
reproduce the bug with Ada and although the loop generated from Ada
source code is indeed 2 instruction shorter (5 instructions instead of 7
instructions for C), the bug seems to be there too. I have this program
in f.adb:
with Q;
procedure F is begin for i in 1 .. 10 loop Q(77); end loop; end F;
and q.ads:
procedure Q(N : in Natural);
and built with "gnatgcc -Os -Wall -Wextra -Wextra -Werror -S f.adb":
.L5:
pushl $77
.LCFI3:
call _ada_q
popl %eax
decl %ebx
jns .L5
So if you need this loop to be tight, write it in Ada. But something,
probably GCC, still needs to be fixed to get the "pushl $77" out of the
loop (for any language). Unless someone explains that I have misses
something, I will report this as a bug to gcc.
Note 1: Tested with gcc 4.1.1 (Gentoo 4.1.1-r3) and gnatgcc 3.4.5 (from
Gentoo dev-lang/gnat-3.45).
Note 2: If you change the loop in C to "for (unsigned int x = 10; x;
--x)" you will get it down to 6 instructions. Still not as tight as the
Ada version, but one instruction less than your version. That gcc does
not optimize your version to the 6 instruction version seems to be
another bug. (I will report that too unless someone objects.)