On Wed, 2016-05-04 at 12:57 +0200, Aurelien Buhrig wrote:
> I'm trying to generate LTO optimized code (with -O2 -flto).
> The code is very simple:
>
> int *SP;
> int popInt(void) {return *--SP;}
> void pushInt(int v) {*SP++ = v;}
> void add (void) { pushInt(popInt()+popInt()); }
>
> When the 3 global functions popInt, pushInt and add are in the same
> compilation unit, the lto optimization works as expected and generates
> something like
> -*(SP -2) += *(SP-1);
> SP--;
>
> But when add is in a different compilation unit, gcc cannot succeed in
> doing such an optimization at link time. Worse, it does not inline the
> popInt function (but it inlines pushInt, so lto are performed).
>
> I tried using the gcse options which are not enabled with O2, O3,
> adding -finline-functions and tuning inline limits without success.
> Any hint how I could get the same behavior with separated compilation
> unit ?
>
> This is a gcc 4.6 on a private target.

I'd recommend trying a newer version. Many improvements have been made
in the past 4 or 5 years.

Other than that, make sure that each compilation unit is compiled with
-flto and the linking is done with the gcc driver program (also
specifying -flto) and not by invoking LD directly.

Cheers,
Oleg
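
For illustration only, here is a rough sketch of the split described in the
quoted message, together with the kind of build commands the advice above
points at. The file names stack.c, add.c and main.o, the output name demo,
and the helper demo_add are made up for this example and do not come from
the original post. On a private target the driver would be the
corresponding cross gcc rather than plain "gcc". The two parts are shown in
one listing so that it compiles as is.

```c
/*
 * Assumed build, with every unit compiled and linked through the gcc driver:
 *
 *   gcc -O2 -flto -c stack.c
 *   gcc -O2 -flto -c add.c
 *   gcc -O2 -flto stack.o add.o main.o -o demo
 *
 * main.o stands for whatever object provides main(); it is not shown here.
 */
#include <assert.h>

/* ---- would live in stack.c (hypothetical file name) ---- */
int *SP;                                  /* points one past the top element */
int popInt(void) { return *--SP; }
void pushInt(int v) { *SP++ = v; }

/* ---- would live in add.c (hypothetical file name) ---- */
extern int popInt(void);
extern void pushInt(int v);
void add(void) { pushInt(popInt() + popInt()); }

/* Hypothetical helper, not in the original post: pushes two values,
 * calls add(), and returns the single value left on the stack. */
int demo_add(int a, int b)
{
    int stack[4];

    SP = stack;
    pushInt(a);
    pushInt(b);
    add();
    assert(SP == stack + 1);              /* two pops, one push */
    return popInt();
}
```

Whether the link-time inliner then folds add() down to the single-unit
form still depends on the compiler version, so this only shows the build
setup, not a guaranteed result.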