Hello, When compiling for SPARC using GCC 4.5.1 and the option -fwhole-program, it seems that sometimes the decision to inline is not well taken since the resulting code is bigger (I assume that when inline is applied the objective is to reduce the size of the code and the number of executed instructions). The following example should help clarifying the point: short f1(char a, short b){ int i; short c=0; for (i=0; i<b; i++){ c+=a; } return a+c; } int main(){ volatile short b=1; volatile char a=1, c=1; b=f1(a,b); b=f1(c,b); b=f1(a,b); b=f1(c,b); b=f1(a,b); //NC=5 /* b=f1(c,b); */ //NC=6 return 0; } The reasoning is described below (before hand I beg you apologies if this is not rigorous or precise enough): * function f1 takes 10 assembler instructions when it is not inlined * compiling using the -fwhole-program allows to inline the function f1 ------- Number of calls * it is NOT advantageous to inline if there are more than 2 calls to f1 * f1 is inlined when there are less than 5 calls to f1 * f1 is NOT inlined when there are 6 or more calls to f1 ------- Number of function parameters * function f1 has two input parameters * when f1 is modified to have more input parameters, it does the inline more easily and the code size is even bigger * when f1 is modified to have less input parameters, it does not the inline so easily and code size is acceptable ------- Hypothesis about the inline: NC=Number of Calls=5 or 6 for this test case b=f1(a,b); b=f1(c,b); b=f1(a,b); b=f1(c,b); b=f1(a,b); //NC=5 /* b=f1(c,b); */ //Uncomment for NC=6 CC=Cost of each call=1(call itself) +1 (delay slot)=2 CP=Cost of each parameter (i.e. instructions required for preparing the inputs, a bit re-arranged)=ldub+sll+sra=3 40000018: c6 17 bf fc lduh [ %fp + -4 ], %g3 40000020: da 0f bf fe ldub [ %fp + -2 ], %o5 40000024: 87 28 e0 18 sll %g3, 0x18, %g3 40000028: 9b 2b 60 18 sll %o5, 0x18, %o5 4000002c: 89 38 e0 18 sra %g3, 0x18, %g4 40000034: 9b 3b 60 18 sra %o5, 0x18, %o5 FT=Total size of the function=10 (not inlined) 40000000 <f1>: 40000000: 82 10 20 00 clr %g1 40000004: 84 10 20 00 clr %g2 40000008: 10 80 00 03 b 40000014 <f1+0x14> 4000000c: 86 10 00 08 mov %o0, %g3 40000010: 84 00 a0 01 inc %g2 40000014: 80 a0 80 09 cmp %g2, %o1 40000018: 26 bf ff fe bl,a 40000010 <f1+0x10> 4000001c: 82 00 c0 01 add %g3, %g1, %g1 40000020: 81 c3 e0 08 retl 40000024: 90 02 00 01 add %o0, %g1, %o0 FC=Size of the function's core=10 (when inlined) 4000001c: 82 10 20 00 clr %g1 40000014: 84 10 20 00 clr %g2 40000030: 10 80 00 03 b 4000003c <main+0x3c> 40000038: 84 00 a0 01 inc %g2 4000003c: 80 a0 80 0d cmp %g2, %o5 40000040: 26 bf ff fe bl,a 40000038 <main+0x38> 40000044: 82 01 00 01 add %g4, %g1, %g1 40000048: 87 38 e0 18 sra %g3, 0x18, %g3 4000004c: 82 00 40 03 add %g1, %g3, %g1 40000050: 84 10 20 00 clr %g2 NP=Number of function parameters=2 short f1(char a, short b){ $$$$$$$ (Apparently) Actual Inlining condition $$$$$$$ NC*(CC+NP*CP) + FT>=NC*FC so basically it should do the inline only if the resulting code size is smaller than the code size without inlining. It is true when NC=5 and false when NC=6: CC+NP*CP=2+2*3=8 Test NC=5 5*8+10 >= 5*10 (Condition is true, therefore DO INLINE) Test NC=6 6*8+10 >= 6*10 (Condition is false, therefore DO NOT INLINE) However, it should be false when NC=5 since the code with f1 inlined is a lot bigger than when inline is not done (using -fno-inline). The whole point seems to be the cost of the function parameters CP (eventually the casts done in the caller) which are taken into account in the size estimation only before the inlining and it seems that the compilator believes that after the inline such casts are not going to be done and therefore are not taken into account for the decision. However, after the code compilation, such parameters are included anyway along with the inlined code. $$$$$$$ Correct Inlining condition $$$$$$$ NC*(CC+NP*CP) + FT>NC*(FC+CP) If such condition is applied for the inline decision: CC+NP*CP=8 FC+CP=10+3=13 Test NC=5 5*8+10 >= 5*13 (Condition is false, therefore DO NOT INLINE) Test NC=6 6*8+10 >= 6*13 (Condition is false, therefore DO NOT INLINE) Using the latter condition it is easily found out that inline is not worth in either case (it would only be worth for 1 or 2 calls to f1). Surely the actual method used for taking such decision is much more evolved and complicated that this simple hypothesis, however I strongly believe something similar might be happening in whatever model GCC uses. I appreciate any feedback or suggestions you have about this, maybe I'm doing it all wrong from the begining, but the fact that inline increases the size of the code was weird to me. Thanks in advance, Jorge