Program fails when optimizing for speed under gcc 4.6

Wander Lairson Costa <wander.lairson@xxxxxxxxx> · Tue, 20 Dec 2011 18:13:34 -0200

Dear all,

I have a home-made alpha blend routine that worked up to gcc 4.5
but fails on gcc 4.6 (tested on gcc 4.6.1 [ubuntu] and gcc 4.6.2
[archlinux]) when I optimize for speed (-O1). If I optimize for
size (-Os) it works fine. To make a long story short, the problem is
that when optimizing for speed, gcc generates code that accesses local
variables through the esp register, which causes trouble in a part of
my code that is written in assembly:

        __asm__ __volatile__ (
            /* Initialize the counter and skip */
            /* if the latter is equal to zero. */
            "movl   %0,%%ecx\n\t"
            "cmpl   $0,%%ecx\n\t"
            "jz     not_blend\n\t"

            /* Load the frame buffer pointers into the registers. */

            "pushl      %%ebx\n\t"      /* <-- HERE IS THE ROOT OF THE PROBLEM */
            "movl       %1,%%edi\n\t"   /* <-- In these three lines gcc    */
            "movl       %2,%%esi\n\t"   /* <-- accesses %1, %2, and %3     */
            "movl       %3,%%ebx\n\t"   /* <-- through the esp register    */

The problem is that inside the assembly code I execute a "pushl %%ebx"
instruction, which changes the esp register, and right after it I
access local variables through the "%n" operands. When optimizing
for speed, gcc emits code that addresses those variables relative to
esp, so after the push the offsets are off by 4 bytes and no longer
point at the variables. When no optimization is applied, or when
optimizing for size, the local variables are accessed through the ebp
register, and everything runs fine.
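
For comparison, here is a minimal sketch of a pattern that does not depend
on how gcc addresses the locals. It assumes the operands can be passed with
"r" (register) constraints instead of being read from the stack, and that
the scratch register is declared as an output so gcc picks and preserves it
itself, with no pushl inside the asm. The function add_pixels and what it
computes are hypothetical, made up for illustration, and are not part of
the blend routine above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original blend code): returns
 * *dst + *src. Both pointers are passed in registers ("r"), so the asm
 * body never uses an esp-relative memory operand, and the stack pointer
 * is left untouched inside the asm.
 */
static uint32_t add_pixels(const uint32_t *dst, const uint32_t *src)
{
    uint32_t sum;

    assert(dst && src);

#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ (
        "movl   (%1),%0\n\t"    /* sum  = *dst */
        "addl   (%2),%0\n\t"    /* sum += *src */
        : "=&r" (sum)           /* early-clobber: written before %2 is read */
        : "r" (dst), "r" (src)  /* inputs live in registers, not on the stack */
        : "cc", "memory"        /* flags change; memory is read via pointers */
    );
#else
    /* Plain C fallback for non-x86 targets. */
    sum = *dst + *src;
#endif

    return sum;
}
```

Because every operand is in a register, it no longer matters whether gcc
addresses the locals through esp or ebp at a given optimization level.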

Now I am unsure whether I am missing some detail of the IA-32
specification that prohibits me from pushing things onto the stack, or
whether gcc is emitting invalid code. Any ideas?

Thanks in advance.

-- 
Best Regards,
Wander Lairson Costa