Re: double argument casting

Hello Laurent & Ian

I'm very interested in the point you're making here.

I checked the example in Laurent's first email, and I agree that for
embedded systems, where code size is critical, casting on the caller's
side (rather than the callee's) does not seem very helpful.

Compiling the test with LLVM (not sure it's a good enough reference, but
that's all I've got) gives the following results:

------------------------> C code:

short somme(char a, short b){
  int i;
  for (i=0; i<b; i++){
    a+=a;
  }
  return a+b;
}

short somme2(char a, short b){
  int i;
  for (i=0; i<b; i++){
    a+=b;
  }
  return a+2*b;
}

int main(){
  volatile short b=1;
  volatile char a=1, c=1;
  b=somme(a,b);
  b=somme(c,b);
  b=somme(a,c);
  b=somme2(a,b);
  b=somme2(c,b);
  b=somme2(a,c);
  b=somme2(a,c*2);
  b=somme2(a,c*3);
  b=somme2(a,c*4);      
  return 0;
}


------------------------> GCC 3.4.4 partial disassembly:

40001168 <somme>:
40001168:    83 2a 60 10     sll  %o1, 0x10, %g1
4000116c:    83 38 60 10     sra  %g1, 0x10, %g1
40001170:    80 a0 60 00     cmp  %g1, 0
40001174:    24 80 00 06     ble,a   4000118c <somme+0x24>
40001178:    91 2a 20 18     sll  %o0, 0x18, %o0
4000117c:    82 80 7f ff     addcc  %g1, -1, %g1
40001180:    12 bf ff ff     bne  4000117c <somme+0x14>
40001184:    90 02 00 08     add  %o0, %o0, %o0
40001188:    91 2a 20 18     sll  %o0, 0x18, %o0
4000118c:    91 3a 20 18     sra  %o0, 0x18, %o0
40001190:    90 02 00 09     add  %o0, %o1, %o0
40001194:    91 2a 20 10     sll  %o0, 0x10, %o0
40001198:    81 c3 e0 08     retl
4000119c:    91 3a 20 10     sra  %o0, 0x10, %o0

400011e4 <main>:
400011e4:    9d e3 bf 90     save  %sp, -112, %sp
400011e8:    82 10 20 01     mov  1, %g1
400011ec:    c2 37 bf f6     sth  %g1, [ %fp + -10 ]
400011f0:    c2 2f bf f5     stb  %g1, [ %fp + -11 ]
400011f4:    c2 2f bf f4     stb  %g1, [ %fp + -12 ]
400011f8:    d0 0f bf f5     ldub  [ %fp + -11 ], %o0     ******************** 1
400011fc:    d2 17 bf f6     lduh  [ %fp + -10 ], %o1     ******************** 2
40001200:    91 2a 20 18     sll  %o0, 0x18, %o0          ******************** 3
40001204:    93 2a 60 10     sll  %o1, 0x10, %o1          ******************** 4
40001208:    93 3a 60 10     sra  %o1, 0x10, %o1          ******************** 5
4000120c:    7f ff ff d7     call  40001168 <somme>       ******************** 6
40001210:    91 3a 20 18     sra  %o0, 0x18, %o0          ******************** 7
40001214:    d0 37 bf f6     sth  %o0, [ %fp + -10 ]      ******************** 8
40001218:    d0 0f bf f4     ldub  [ %fp + -12 ], %o0
4000121c:    d2 17 bf f6     lduh  [ %fp + -10 ], %o1
40001220:    91 2a 20 18     sll  %o0, 0x18, %o0
40001224:    93 2a 60 10     sll  %o1, 0x10, %o1
40001228:    93 3a 60 10     sra  %o1, 0x10, %o1
4000122c:    7f ff ff cf     call  40001168 <somme>
40001230:    91 3a 20 18     sra  %o0, 0x18, %o0
40001234:    d0 37 bf f6     sth  %o0, [ %fp + -10 ]
40001238:    d0 0f bf f5     ldub  [ %fp + -11 ], %o0
4000123c:    d2 0f bf f4     ldub  [ %fp + -12 ], %o1
...

------------------------> LLVM partial disassembly:

Disassembly of section .text:

00000000 :                  --> This would be the "somme" function
   0: 9d e3 bf a0  save  %sp, -96, %sp
   4: a0 a6 60 01  subcc  %i1, 1, %l0
   8: 06 80 00 10  bl  48
   c: 01 00 00 00  nop
  10: 10 80 00 02  b  18
  14: 01 00 00 00  nop
  18: a0 10 00 19  mov  %i1, %l0
  1c: a2 10 00 18  mov  %i0, %l1
  20: a2 0c 60 ff  and  %l1, 0xff, %l1
  24: a2 04 40 18  add  %l1, %i0, %l1
  28: a5 2c 60 18  sll  %l1, 0x18, %l2
  2c: b1 3c a0 18  sra  %l2, 0x18, %i0
  30: a0 04 3f ff  add  %l0, -1, %l0
  34: a4 a4 20 00  subcc  %l0, 0, %l2
  38: 12 bf ff fa  bne  20
  3c: 01 00 00 00  nop
  40: 10 80 00 02  b  48
  44: 01 00 00 00  nop
  48: a0 06 00 19  add  %i0, %i1, %l0
  4c: a1 2c 20 10  sll  %l0, 0x10, %l0
  50: b1 3c 20 10  sra  %l0, 0x10, %i0
  54: 81 e8 00 00  restore
  58: 81 c3 e0 08  retl
  5c: 01 00 00 00  nop

000000cc :
  cc: 9d e3 bf 98  save  %sp, -104, %sp
  d0: a0 10 20 01  mov  1, %l0
  d4: e0 37 bf fe  sth  %l0, [ %fp + -2 ]
  d8: e0 2f bf fd  stb  %l0, [ %fp + -3 ]
  dc: e0 2f bf fc  stb  %l0, [ %fp + -4 ]
  e0: d0 4f bf fd  ldsb  [ %fp + -3 ], %o0            ******************** 1
  e4: d2 57 bf fe  ldsh  [ %fp + -2 ], %o1            ******************** 2
  e8: 40 00 00 00  call  e8                           ******************** 3
  ec: 01 00 00 00  nop                                ******************** 4
  f0: d0 37 bf fe  sth  %o0, [ %fp + -2 ]            
  f4: d0 4f bf fc  ldsb  [ %fp + -4 ], %o0
  f8: d2 57 bf fe  ldsh  [ %fp + -2 ], %o1
  fc: 40 00 00 00  call  fc
 100: 01 00 00 00  nop
 104: d0 37 bf fe  sth  %o0, [ %fp + -2 ]
 108: d0 4f bf fd  ldsb  [ %fp + -3 ], %o0
 10c: d2 4f bf fc  ldsb  [ %fp + -4 ], %o1
 110: 40 00 00 00  call  110
 114: 01 00 00 00  nop
 118: d0 37 bf fe  sth  %o0, [ %fp + -2 ]
 ...


A couple of observations:

- With LLVM, only 4 instructions are required per call, since the cast
is done on the callee's side. (The NOP is wasted because the delay slot
is not filled.)
- With GCC, 8 instructions are required, because the cast is duplicated
in the caller and the callee.
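
For anyone less familiar with SPARC: each sll/sra pair in the GCC
listing sign-extends the low 8 or 16 bits of a register. Here is a
minimal C sketch of what those pairs compute, assuming 32-bit
registers. The helper names sext8 and sext16 are made up for this
illustration; they are not part of GCC or LLVM.

```c
#include <stdint.h>

/* Hypothetical helpers (illustrative names, not from GCC or LLVM).
 * Each one models a single sll/sra pair on a 32-bit register. */

/* Models: sll reg, 0x18, reg ; sra reg, 0x18, reg */
static int32_t sext8(uint32_t reg)
{
    /* Keep the low 8 bits, then propagate bit 7 into the upper bits. */
    return (int32_t)((reg & 0xFFu) ^ 0x80u) - 0x80;
}

/* Models: sll reg, 0x10, reg ; sra reg, 0x10, reg */
static int32_t sext16(uint32_t reg)
{
    /* Keep the low 16 bits, then propagate bit 15 into the upper bits. */
    return (int32_t)((reg & 0xFFFFu) ^ 0x8000u) - 0x8000;
}
```

The extension is idempotent: sext8((uint32_t)sext8(x)) equals sext8(x)
for any x. Applying it once in the caller and again in the callee
therefore changes nothing in the result and only costs instructions.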

Based on this, it seems worthwhile to KEEP the cast only on the
CALLEE's side rather than the caller's. Since there are 9 calls in
main, this requires 9*8=72 instructions with GCC, whereas it only
requires 9*4=36 instructions with LLVM. That is a huge difference when
code size matters. I guess we can assume that the code size on the
callee's side is similar in both cases, since a cast is always
performed there.

In conclusion, the SPARC code size could be reduced by approx. (4
instructions) x (number of calls) if the cast is done exclusively on the
CALLEE's side. So, is it really necessary to keep it on the caller's
side or can we try to do it only on the callee's side?
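
To put the estimate in bytes, here is a rough sketch. It uses the
per-call counts from the listings (8 with GCC, 4 with LLVM) and the
fact that every SPARC instruction is 4 bytes wide. The function name
call_site_bytes is only illustrative.

```c
/* Every SPARC instruction is encoded in 32 bits. */
enum { SPARC_INSN_BYTES = 4 };

/* Hypothetical helper: bytes of call-site code for a given number of
 * calls and a given number of instructions per call. */
static unsigned call_site_bytes(unsigned calls, unsigned insns_per_call)
{
    return calls * insns_per_call * SPARC_INSN_BYTES;
}
```

For the 9 calls in main, call_site_bytes(9, 8) gives 288 bytes and
call_site_bytes(9, 4) gives 144 bytes, so the difference is 144 bytes
for this small test alone.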


PS: since I work with LEON, I took the liberty of CC'ing the people
involved in this thread: http://gcc.gnu.org/ml/gcc/2010-09/msg00014.html
I hope it doesn't bother anyone.

Have a good day,


George



Ian Lance Taylor wrote:
> laurent <laurent.poche@xxxxxxxxx> writes:
>
>   
>> When a caller function calls a callee function with short or char
>> arguments, the arguments are casted twice: inside the caller function
>> and inside the callee function, see the example. It is a waste of
>> performance in code density and speed!
>>
>> I don't understand why there is a double casting.  Is there any
>> optimization I could activate in GCC to remove it?
>>     
>
> It's basically a bug.  gcc should only do it on the caller side.  Doing
> it on the callee side is a holdover from the good pre-C90 days, when
> code like
>
> int f(i)
>     char i;
> {
>   ...
> }
>
> had to be treated as equivalent to
>
> int f(int passed_i)
> {
>   char i = (char) passed_i;
>   ...
> }
>
> These days I think we can just drop the cast on the callee side.  As I
> recall that was done for x86 a while back, somebody just needs to do it
> for SPARC.
>
> Please file a bug report according to the instructions at
> http://gcc.gnu.org/bugs/ (unless there is already a bug report for
> this).  Thanks.
>
> Ian
>
>   

