On 07/09/2013 11:29 AM, Tom Bachmann wrote:
> ...the optimizer has to eliminate many temporaries, inline calls,
> track pointers etc. It seems to me that, for no apparent reason,
> this goes wrong sometimes. For example, in g++-4.6.4 or g++-4.8.1,
> both of the above functions yield essentially equal machine code,
> with a stack frame size of about 56 bytes. On the other hand,
> g++-4.7.3 produces the attached code [NB: this is compiled without
> exception support, to simplify comparison to the pure C code]. (I
> obtained this via objdump, since I did not find the extra labels etc
> produced by g++ -S helpful.) Notice that the stack frame size has
> grown to 376 bytes! I have been trying to understand the produced
> code, but could not make much sense of it.

It's hard to be precise without analysing your code in detail, but:

As a general rule, x86-64 is very sensitive to register pressure. It
often happens that what appears to be a minor inlining decision tips
the register allocator over the edge, and we start to need a lot of
spill slots.

But it is time for the idea that a programmer can write arbitrarily
awful code and just expect the optimizer to sort it all out to die.
Sometimes GCC can do amazing things, and sometimes the tiniest tweak
will mean that your beautifully optimized routine no longer optimizes
so well. No matter what we do, this will always be so if you push a
compiler to the edge.

Andrew.