Hi,

I'm trying to generate LTO-optimized code (with -O2 -flto). The code is very simple:

    int *SP;
    int popInt(void) { return *--SP; }
    void pushInt(int v) { *SP++ = v; }
    void add(void) { pushInt(popInt() + popInt()); }

When the three global functions popInt, pushInt and add are in the same compilation unit, the optimization works as expected and generates something like:

    *(SP - 2) += *(SP - 1);
    SP--;

But when add is in a different compilation unit, gcc does not manage to do this optimization at link time. Worse, it does not inline popInt (although it does inline pushInt, so LTO is being performed).

I tried enabling the gcse options that are not enabled by -O2 or -O3, adding -finline-functions, and tuning the inline limits, all without success.

Any hint on how I could get the same behavior with separate compilation units?

This is gcc 4.6 on a private target.

Thanks,
Aurélien
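
To make the expected transformation concrete, the sketch below puts the original functions next to a hand-written version of the fused form and checks that both leave the stack in the same state. The backing array `stack`, the function `add_fused` and the checker `check_add_equivalence` are assumptions added for illustration only; they are not part of the original code, and the sketch says nothing about what gcc 4.6 actually emits.

```c
#include <assert.h>

/* Hypothetical backing store, added only so the sketch can run. */
static int stack[16];
int *SP = stack;

/* The three functions from the post, unchanged in behavior. */
int popInt(void) { return *--SP; }
void pushInt(int v) { *SP++ = v; }
void add(void) { pushInt(popInt() + popInt()); }

/* Hand-written equivalent of add() once both pops and the push
 * are inlined and combined. The name add_fused is hypothetical. */
void add_fused(void) {
    *(SP - 2) += *(SP - 1);
    SP--;
}

/* Runs both versions on the same input and asserts that they
 * leave one element, the sum, on the stack. */
void check_add_equivalence(void) {
    SP = stack;
    pushInt(3);
    pushInt(4);
    add();
    assert(SP == stack + 1);
    assert(stack[0] == 7);

    SP = stack;
    pushInt(3);
    pushInt(4);
    add_fused();
    assert(SP == stack + 1);
    assert(stack[0] == 7);
}
```

Calling `check_add_equivalence()` from a small `main` is enough to exercise it. In the failing setup described above, `add` would live in its own source file with `SP`, `popInt` and `pushInt` declared `extern`, and both files would be compiled and linked with -O2 -flto.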