2016-05-04 13:31 GMT+02:00 Oleg Endo <oleg.endo@xxxxxxxxxxx>:
> On Wed, 2016-05-04 at 12:57 +0200, Aurelien Buhrig wrote:
>
>> I'm trying to generate LTO optimized code (with -O2 -flto).
>> The code is very simple:
>>
>> int *SP;
>> int popInt(void) {return *--SP;}
>> void pushInt(int v) {*SP++ = v;}
>> void add (void) { pushInt(popInt()+popInt()); }
>>
>> When the 3 global functions popInt, pushInt and add are in the same
>> compilation unit, the lto optimization works as expected and
>> generates something like
>> -*(SP -2) += *(SP-1);
>> SP--;
>>
>> But when add is in a different compilation unit, gcc cannot succeed
>> in doing such an optimization at link time. Worse, it does not
>> inline the popInt function (but it inlines pushInt, so lto are
>> performed).
>>
>> I tried using the gcse options which are not enabled with O2, O3,
>> adding -finline-functions and tuning inline limits without success.
>> Any hint how I could get the same behavior with separated compilation
>> unit ?
>>
>> This is a gcc 4.6 on a private target.
>
> I'd recommend trying a newer version. Many improvements have been made
> in the past 4 or 5 years.
>
> Other than that, make sure that each compilation unit is compiled with
> -flto and the linking is done with the gcc driver program (also
> specifying -flto) and not by invoking LD directly.
>
> Cheers,
> Oleg

Thanks for your reply.
The LTO pass itself runs correctly: every unit is built with -flto and
the link goes through the gcc driver.
I may try upgrading the backend to a newer GCC version, but that is
quite a significant amount of work...

Thanks,
Aurélien
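
For reference, here is a minimal single-file sketch of the code discussed above, next to a hand-written version of the fused form that the optimizer produces in the single-unit case. The names add_fused and apply_binop are hypothetical helpers added only for illustration; they are not part of the original code. This only shows that the two forms are equivalent on a small stack, not what any particular GCC version will emit.

```c
#include <assert.h>

int *SP;                                  /* points one past the top element */

int  popInt(void)   { return *--SP; }     /* pre-decrement, then load */
void pushInt(int v) { *SP++ = v; }        /* store, then post-increment */

/* Original form: two pops followed by one push. */
void add(void) { pushInt(popInt() + popInt()); }

/* Hand-written equivalent of the expected optimized form (hypothetical
 * name): add the top element into the one below it, then drop the top. */
void add_fused(void)
{
    *(SP - 2) += *(SP - 1);
    SP--;
}

/* Hypothetical helper: push a and b onto a fresh local stack, apply op,
 * check that exactly one element remains, and return it. */
int apply_binop(void (*op)(void), int a, int b)
{
    int stack[4];

    SP = stack;
    pushInt(a);
    pushInt(b);
    op();
    assert(SP - stack == 1);              /* two operands in, one result out */
    return popInt();
}
```

Both apply_binop(add, 2, 3) and apply_binop(add_fused, 2, 3) return 5. To reproduce the two-unit case, add() would move to its own file with extern declarations for popInt and pushInt, and both files would be built as Oleg describes, for example with "gcc -O2 -flto -c stack.c add.c" followed by "gcc -O2 -flto stack.o add.o -o prog" (the file names here are assumptions).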