On 07/09/2013 02:33 PM, Tom Bachmann wrote:
> On 09.07.2013 13:37, Andrew Haley wrote:
>> On 07/09/2013 11:29 AM, Tom Bachmann wrote:
>>
>>> ...the optimizer has to eliminate many temporaries, inline calls,
>>> track pointers etc. It seems to me that, for no apparent reason,
>>> this goes wrong sometimes. For example, in g++-4.6.4 or g++-4.8.1,
>>> both of the above functions yield essentially equal machine code,
>>> with a stack frame size of about 56 bytes. On the other hand,
>>> g++-4.7.3 produces the attached code [NB: this is compiled without
>>> exception support, to simplify comparison to the pure C code]. (I
>>> obtained this via objdump, since I did not find the extra labels etc
>>> produced by g++ -S helpful.) Notice that the stack frame size has
>>> grown to 376 bytes! I have been trying to understand the produced
>>> code, but could not make much sense of it.
>>
>> It's hard to be precise without analysing your code in detail, but:
>>
>> As a general rule, x86-64 is very sensitive to register pressure. It
>> happens often that what appears to be a minor inlining decision tips
>> the register allocator over the edge, and we start to need a lot of
>> spill slots.
>>
>
> I do not think this is what is happening here. Most of the stack frame is
> never even initialized. Instead the frame holds a local temporary object
> which is not properly "decomposed" into aggregates for some reason.

Right, so GCC probably used all those slots at some point, but later
on discovered that they could be eliminated. By then, however, all of
the offsets had been calculated.

Andrew.