Re: Optimisation puzzle

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nate Denmark skrev:
I have a question about the optimiser in GCC. Take this bit of code:

       for(x = 0; x != 10; x++)
               puts("Hello");

When compiled with full optimisations (-O3, -fexpensive-optimizations
etc.) it generates the following loop in assembly:

.L2:
       incl    %ebx
       movl    $.LC0, (%esp)
       call    puts
       cmpl    $10, %ebx
       jne     .L2

.LC0 points to the "Hello" string. I'm wondering why GCC puts that
'movl' line inside the loop, so that it's executed each time, when it
could go before the loop? As I understand it, 'puts' shouldn't do
anything to the stack, and the return value is passed back in eax, so
I'm not sure why it's doing the 'movl' each time. If I force loop
unrolling the same thing happens -- the 'movl' each iteration.
From man:puts I see that it is declared "int puts(const char *)". This means that puts does not promise to leave its argument unchanged. Therefore the caller must push the argument anew before each call. If it had been declared "int puts(const char * const)" instead, the push should be moved outside the loop. Unfortunately this does not seem to work. I tried with the following program:
void q(const unsigned int);
void f() {for (unsigned int x = 0; x != 10; x++) q(77);}

and built it with "gcc -std=c99 -Os -Wall -Wextra -Werror -S":
.L2:
       subl    $12, %esp
       incl    %ebx
       pushl   $77
       call    q
       addl    $16, %esp
       cmpl    $10, %ebx
       jne     .L2

As you can see, "pushl $77" is still inside the loop even though q promises to not change its argument. This must be a bug. I tried to reproduce the bug with Ada and although the loop generated from Ada source code is indeed 2 instruction shorter (5 instructions instead of 7 instructions for C), the bug seems to be there too. I have this program in f.adb:
with Q;
procedure F is begin for i in 1 .. 10 loop Q(77); end loop; end F;

and q.ads:
procedure Q(N : in Natural);

and built with "gnatgcc -Os -Wall -Wextra -Wextra -Werror -S f.adb":
.L5:
       pushl   $77
.LCFI3:
       call    _ada_q
       popl    %eax
       decl    %ebx
       jns     .L5

So if you need this loop to be tight, write it in Ada. But something, probably GCC, still needs to be fixed to get the "pushl $77" out of the loop (for any language). Unless someone explains that I have misses something, I will report this as a bug to gcc.

Note 1: Tested with gcc 4.1.1 (Gentoo 4.1.1-r3) and gnatgcc 3.4.5 (from Gentoo dev-lang/gnat-3.45).

Note 2: If you change the loop in C to "for (unsigned int x = 10; x; --x)" you will get it down to 6 instructions. Still not as tight as the Ada version, but one instruction less than your version. That gcc does not optimize your version to the 6 instruction version seems to be another bug. (I will report that too unless someone objects.)

[Index of Archives]     [Linux C Programming]     [Linux Kernel]     [eCos]     [Fedora Development]     [Fedora Announce]     [Autoconf]     [The DWARVES Debugging Tools]     [Yosemite Campsites]     [Yosemite News]     [Linux GCC]

  Powered by Linux