Re: A puzzle: different optimization for compound-expressions

Marc Glisse <marc.glisse@xxxxxxxx> · Sat, 31 Oct 2015 21:21:06 +0100 (CET)

On Sat, 31 Oct 2015, Bruno Loff wrote:

I am always impressed by the power of the GCC optimizer. Today I found
a somewhat surprising abnormality when using compound-expressions.
Look at the two definitions for the function f(a) = a*a + a:

int64_t f1( int64_t a ) {
   return a * a + a;
}

int64_t f2( int64_t a ) {
   return ({
       int64_t b;
       b = a * a;
       ({
           int64_t c;
           c = b + a;
           c;
       });
   });
}

I expected that GCC would either make a mess with the second
definition, or would smartly produce the same code for both
definitions. I was wrong. Here is the (simplified) x86-64 output
with -O3:

f1:
      leaq    1(%rdi), %rax
      imulq   %rdi, %rax
      ret

f2:
      movq    %rdi, %rax
      imulq   %rdi, %rax
      addq    %rdi, %rax
      ret

The code for f2 is what I expected, but if I were a little smarter (and
knew more asm) I might have expected f1 instead. The code for f1
basically does

b := a + 1
b := b * a

Whereas the code for f2 does:

b := a
b := b * a
b := b + a

The code for f1 is clearly better, saving one instruction. The two
are, of course, completely equivalent.
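
The equivalence can be checked directly by comparing the two instruction
shapes. The sketch below is only an illustration: the names f1_shape,
f2_shape and shapes_agree are made up for this example, and it uses
uint64_t instead of int64_t so that wraparound is well defined in C (the
machine instructions wrap modulo 2^64 either way).

```c
#include <stddef.h>
#include <stdint.h>

/* Shape of the f1 output: add one, then multiply (leaq + imulq). */
static uint64_t f1_shape(uint64_t a) { return (a + 1) * a; }

/* Shape of the f2 output: multiply, then add (imulq + addq). */
static uint64_t f2_shape(uint64_t a) { return a * a + a; }

/* Returns 1 if both shapes agree on a handful of sample inputs,
   including values where the product wraps around. */
static int shapes_agree(void) {
    const uint64_t samples[] = {
        0, 1, 2, 3, 0x7fffffffffffffffULL,
        0xffffffffffffffffULL, 0x123456789abcdef0ULL
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (f1_shape(samples[i]) != f2_shape(samples[i]))
            return 0;
    }
    return 1;
}
```

This is a spot check, not a proof. The identity (a + 1) * a = a * a + a
holds modulo 2^64 by distributivity, which is why either sequence is a
valid translation.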

It isn't that obvious to me which version is better, but I agree that both 
should generate the same code.

So why is GCC failing to optimize the compound expressions all the
way? My guess would be that it has to do with the order in which some
optimization passes are happening. Anyone?

A number of optimizations happen, for historical reasons, during parsing, 
when the front-end calls functions from fold-const.c on expressions. We 
are currently moving many such optimizations to a later stage (using 
match.pd). If this transformation is moved, it will also apply to f2.
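
If the folding really is applied to a single expression as it is parsed,
then any form that splits the computation across statements should hide
it from the front end, not only statement expressions. A minimal sketch
for testing that guess follows. f3 is a hypothetical variant, not part of
the original report, and what a given GCC version emits for it is
something to check rather than assume.

```c
#include <stdint.h>

/* Same computation as f1, but split over two statements, so the
   front end never sees "a * a + a" as one expression. */
int64_t f3(int64_t a) {
    int64_t b = a * a;   /* multiplication in its own statement */
    return b + a;        /* the addition only sees the variable b */
}
```

Comparing the assembly for f3 with that of f1 and f2 should show whether
the difference comes from statement expressions specifically or simply
from the expression being broken up.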

-fdump-tree-all can give you a lot of information about the various stages 
of optimization.
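
For anyone wanting to reproduce this, here is a self-contained version of
the two functions with the invocations in a comment. The file name
puzzle.c is just a placeholder, and the exact dump file names depend on
the GCC version, so treat the details as a sketch.

```c
/* puzzle.c (placeholder name)
 *
 * Assembly output:    gcc -O3 -S puzzle.c
 * Per-pass dumps:     gcc -O3 -c -fdump-tree-all puzzle.c
 *
 * The second command writes one dump file per tree pass next to the
 * source. The dumps whose names end in ".original" and ".optimized"
 * are a reasonable place to start comparing f1 and f2.
 */
#include <stdint.h>

int64_t f1(int64_t a) {
    return a * a + a;
}

/* Uses statement expressions, a GNU C extension. */
int64_t f2(int64_t a) {
    return ({
        int64_t b;
        b = a * a;
        ({
            int64_t c;
            c = b + a;
            c;
        });
    });
}
```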

--
Marc Glisse