Re: double argument casting


 



Hello Ian,


If the casting is done in the callee function (and not in the caller
function), the code size of my program is reduced by 5%.

I think that all users compiling with the -Os option would be pleased to
have the casting done inside the callee function.
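
To make the callee-side convention concrete, here is a minimal C sketch. It
assumes each argument arrives as a raw 32-bit register value whose upper bits
are not guaranteed to be clean. The names sext8, sext16 and somme_callee_side
are hypothetical and only illustrate the idea; this is not what GCC or LLVM
actually emits.

```c
#include <stdint.h>

/* Hypothetical helpers: sign-extend the low 8 or 16 bits of a 32-bit
   register value, playing the role of the sll/sra pairs in the disassembly. */
static int32_t sext8(uint32_t reg)
{
    return (int32_t)((reg & 0xffu) ^ 0x80u) - 0x80;
}

static int32_t sext16(uint32_t reg)
{
    return (int32_t)((reg & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Callee-side convention (assumed): the caller passes registers untouched
   and the callee narrows each argument exactly once on entry.
   Models: short somme(char a, short b), with char taken as signed. */
static int32_t somme_callee_side(uint32_t reg_a, uint32_t reg_b)
{
    int32_t a = sext8(reg_a);   /* narrow to char width */
    int32_t b = sext16(reg_b);  /* narrow to short width */
    int32_t i;

    for (i = 0; i < b; i++) {
        a = sext8((uint32_t)(a + a));  /* a += a, truncated back to char */
    }
    return sext16((uint32_t)(a + b));  /* result truncated to short */
}
```

With this convention a call site only has to load the values and branch: for
example, somme_callee_side(1, 1) gives 3 whatever sits in the upper register
bits, so the caller needs no sll/sra pair of its own.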


What is your feeling?


Laurent





On 25/10/2010 14:59, Jorge PEREZ wrote:
> Hello Laurent & Ian
>
> I'm actually very interested in the point you're making here.
>
> I checked the example in Laurent's first email and I agree that, for
> embedded systems where code size is critical, the cast on the caller's
> side (rather than the callee's) does not seem very helpful.
>
> The compilation of the test using LLVM (not sure it's a good enough
> reference but that's all I got) gives the following results
>
> ------------------------> C code:
>
> short somme(char a, short b){
>   int i;
>   for (i=0; i<b; i++){
>     a+=a;
>   }
>   return a+b;
> }
>
> short somme2(char a, short b){
>   int i;
>   for (i=0; i<b; i++){
>     a+=b;
>   }
>   return a+2*b;
> }
>
> int main(){
>   volatile short b=1;
>   volatile char a=1, c=1;
>   b=somme(a,b);
>   b=somme(c,b);
>   b=somme(a,c);
>   b=somme2(a,b);
>   b=somme2(c,b);
>   b=somme2(a,c);
>   b=somme2(a,c*2);
>   b=somme2(a,c*3);
>   b=somme2(a,c*4);      
>   return 0;
> }
>
>
> ------------------------> GCC 3.4.4 partial disassembly:
>
> 40001168 <somme>:
> 40001168:    83 2a 60 10     sll  %o1, 0x10, %g1
> 4000116c:    83 38 60 10     sra  %g1, 0x10, %g1
> 40001170:    80 a0 60 00     cmp  %g1, 0
> 40001174:    24 80 00 06     ble,a   4000118c <somme+0x24>
> 40001178:    91 2a 20 18     sll  %o0, 0x18, %o0
> 4000117c:    82 80 7f ff     addcc  %g1, -1, %g1
> 40001180:    12 bf ff ff     bne  4000117c <somme+0x14>
> 40001184:    90 02 00 08     add  %o0, %o0, %o0
> 40001188:    91 2a 20 18     sll  %o0, 0x18, %o0
> 4000118c:    91 3a 20 18     sra  %o0, 0x18, %o0
> 40001190:    90 02 00 09     add  %o0, %o1, %o0
> 40001194:    91 2a 20 10     sll  %o0, 0x10, %o0
> 40001198:    81 c3 e0 08     retl
> 4000119c:    91 3a 20 10     sra  %o0, 0x10, %o0
>
> 400011e4 <main>:
> 400011e4:    9d e3 bf 90     save  %sp, -112, %sp
> 400011e8:    82 10 20 01     mov  1, %g1
> 400011ec:    c2 37 bf f6     sth  %g1, [ %fp + -10 ]
> 400011f0:    c2 2f bf f5     stb  %g1, [ %fp + -11 ]
> 400011f4:    c2 2f bf f4     stb  %g1, [ %fp + -12 ]
> 400011f8:    d0 0f bf f5     ldub  [ %fp + -11 ], %o0   ******************** 1
> 400011fc:    d2 17 bf f6     lduh  [ %fp + -10 ], %o1   ******************** 2
> 40001200:    91 2a 20 18     sll  %o0, 0x18, %o0        ******************** 3
> 40001204:    93 2a 60 10     sll  %o1, 0x10, %o1        ******************** 4
> 40001208:    93 3a 60 10     sra  %o1, 0x10, %o1        ******************** 5
> 4000120c:    7f ff ff d7     call  40001168 <somme>     ******************** 6
> 40001210:    91 3a 20 18     sra  %o0, 0x18, %o0        ******************** 7
> 40001214:    d0 37 bf f6     sth  %o0, [ %fp + -10 ]    ******************** 8
> 40001218:    d0 0f bf f4     ldub  [ %fp + -12 ], %o0
> 4000121c:    d2 17 bf f6     lduh  [ %fp + -10 ], %o1
> 40001220:    91 2a 20 18     sll  %o0, 0x18, %o0
> 40001224:    93 2a 60 10     sll  %o1, 0x10, %o1
> 40001228:    93 3a 60 10     sra  %o1, 0x10, %o1
> 4000122c:    7f ff ff cf     call  40001168 <somme>
> 40001230:    91 3a 20 18     sra  %o0, 0x18, %o0
> 40001234:    d0 37 bf f6     sth  %o0, [ %fp + -10 ]
> 40001238:    d0 0f bf f5     ldub  [ %fp + -11 ], %o0
> 4000123c:    d2 0f bf f4     ldub  [ %fp + -12 ], %o1
> ...
>
> ------------------------> LLVM partial disassembly:
>
> Disassembly of section .text:
>
> 00000000 :                    --> This would be the "somme" function
>    0: 9d e3 bf a0  save  %sp, -96, %sp
>    4: a0 a6 60 01  subcc  %i1, 1, %l0
>    8: 06 80 00 10  bl  48
>    c: 01 00 00 00  nop
>   10: 10 80 00 02  b  18
>   14: 01 00 00 00  nop
>   18: a0 10 00 19  mov  %i1, %l0
>   1c: a2 10 00 18  mov  %i0, %l1
>   20: a2 0c 60 ff  and  %l1, 0xff, %l1
>   24: a2 04 40 18  add  %l1, %i0, %l1
>   28: a5 2c 60 18  sll  %l1, 0x18, %l2
>   2c: b1 3c a0 18  sra  %l2, 0x18, %i0
>   30: a0 04 3f ff  add  %l0, -1, %l0
>   34: a4 a4 20 00  subcc  %l0, 0, %l2
>   38: 12 bf ff fa  bne  20
>   3c: 01 00 00 00  nop
>   40: 10 80 00 02  b  48
>   44: 01 00 00 00  nop
>   48: a0 06 00 19  add  %i0, %i1, %l0
>   4c: a1 2c 20 10  sll  %l0, 0x10, %l0
>   50: b1 3c 20 10  sra  %l0, 0x10, %i0
>   54: 81 e8 00 00  restore
>   58: 81 c3 e0 08  retl
>   5c: 01 00 00 00  nop
>
> 000000cc :
>   cc: 9d e3 bf 98  save  %sp, -104, %sp
>   d0: a0 10 20 01  mov  1, %l0
>   d4: e0 37 bf fe  sth  %l0, [ %fp + -2 ]
>   d8: e0 2f bf fd  stb  %l0, [ %fp + -3 ]
>   dc: e0 2f bf fc  stb  %l0, [ %fp + -4 ]
>   e0: d0 4f bf fd  ldsb  [ %fp + -3 ], %o0            ******************** 1
>   e4: d2 57 bf fe  ldsh  [ %fp + -2 ], %o1            ******************** 2
>   e8: 40 00 00 00  call  e8                           ******************** 3
>   ec: 01 00 00 00  nop                                ******************** 4
>   f0: d0 37 bf fe  sth  %o0, [ %fp + -2 ]            
>   f4: d0 4f bf fc  ldsb  [ %fp + -4 ], %o0
>   f8: d2 57 bf fe  ldsh  [ %fp + -2 ], %o1
>   fc: 40 00 00 00  call  fc
>  100: 01 00 00 00  nop
>  104: d0 37 bf fe  sth  %o0, [ %fp + -2 ]
>  108: d0 4f bf fd  ldsb  [ %fp + -3 ], %o0
>  10c: d2 4f bf fc  ldsb  [ %fp + -4 ], %o1
>  110: 40 00 00 00  call  110
>  114: 01 00 00 00  nop
>  118: d0 37 bf fe  sth  %o0, [ %fp + -2 ]
>  ...
>
>
> A couple observations from this:
>
> - in the case of LLVM only 4 instructions (the NOP is a waste since the
> delay slot is not correctly implemented) are required per call (since
> the cast is on the callee side).
> - in the case of GCC there are 8 instructions required due to the
> duplication of the cast in the caller and the callee.
>
> Based on this, it seems quite interesting to KEEP the cast only on the
> CALLEE's side rather than the caller's. Since there are 9 calls in
> main, this requires 9*8=72 instructions with GCC, whereas it only
> requires 9*4=36 instructions using LLVM. This is a huge difference when
> code size matters. I guess we can assume that on the callee's side the
> code size is similar in both cases, since a cast is always performed.
>
> In conclusion, the SPARC code size could be reduced by approx. (4
> instructions) x (number of calls) if the cast is done exclusively on the
> CALLEE's side. So, is it really necessary to keep it on the caller's
> side or can we try to do it only on the callee's side?
>
>
> PS: since I work with LEON, I took the liberty of CC'ing the people
> concerned by this thread http://gcc.gnu.org/ml/gcc/2010-09/msg00014.html
> I hope it doesn't bother anyone.
>
> Have a good day,
>
>
> George
>
>
>
> Ian Lance Taylor wrote:
>   
>> laurent <laurent.poche@xxxxxxxxx> writes:
>>
>>   
>>     
>>> When a caller function calls a callee function with short or char
>>> arguments, the arguments are cast twice: inside the caller function
>>> and inside the callee function (see the example). It is a waste of
>>> both code density and speed!
>>>
>>> I don't understand why there is a double casting.  Is there any
>>> optimization I could activate in GCC to remove it?
>>>     
>>>       
>> It's basically a bug.  gcc should only do it on the caller side.  Doing
>> it on the callee side is a holdover from the good pre-C90 days, when
>> code like
>>
>> int f(i)
>>     char i;
>> {
>>   ...
>> }
>>
>> had to be treated as equivalent to
>>
>> int f(int passed_i)
>> {
>>   char i = (char) passed_i;
>>   ...
>> }
>>
>> These days I think we can just drop the cast on the callee side.  As I
>> recall that was done for x86 a while back, somebody just needs to do it
>> for SPARC.
>>
>> Please file a bug report according to the instructions at
>> http://gcc.gnu.org/bugs/ (unless there is already a bug report for
>> this).  Thanks.
>>
>> Ian
>>
>>   
>>     
>   


