Hello Laurent & Ian,

I'm very interested in the point you're making here. I checked the
example in Laurent's first email, and I agree that for embedded systems,
where code size is critical, doing the cast on the caller's side (rather
than the callee's) does not seem very helpful. Compiling the test with
LLVM (not sure it's a good enough reference, but that's all I have) gives
the following results.

------------------------> C code:

short somme(char a, short b){
    int i;
    for (i=0; i<b; i++){
        a+=a;
    }
    return a+b;
}

short somme2(char a, short b){
    int i;
    for (i=0; i<b; i++){
        a+=b;
    }
    return a+2*b;
}

int main(){
    volatile short b=1;
    volatile char a=1, c=1;
    b=somme(a,b);
    b=somme(c,b);
    b=somme(a,c);
    b=somme2(a,b);
    b=somme2(c,b);
    b=somme2(a,c);
    b=somme2(a,c*2);
    b=somme2(a,c*3);
    b=somme2(a,c*4);
    return 0;
}

------------------------> GCC 3.4.4 partial disassembly:

40001168 <somme>:
40001168:  83 2a 60 10  sll  %o1, 0x10, %g1
4000116c:  83 38 60 10  sra  %g1, 0x10, %g1
40001170:  80 a0 60 00  cmp  %g1, 0
40001174:  24 80 00 06  ble,a  4000118c <somme+0x24>
40001178:  91 2a 20 18  sll  %o0, 0x18, %o0
4000117c:  82 80 7f ff  addcc  %g1, -1, %g1
40001180:  12 bf ff ff  bne  4000117c <somme+0x14>
40001184:  90 02 00 08  add  %o0, %o0, %o0
40001188:  91 2a 20 18  sll  %o0, 0x18, %o0
4000118c:  91 3a 20 18  sra  %o0, 0x18, %o0
40001190:  90 02 00 09  add  %o0, %o1, %o0
40001194:  91 2a 20 10  sll  %o0, 0x10, %o0
40001198:  81 c3 e0 08  retl
4000119c:  91 3a 20 10  sra  %o0, 0x10, %o0

400011e4 <main>:
400011e4:  9d e3 bf 90  save  %sp, -112, %sp
400011e8:  82 10 20 01  mov  1, %g1
400011ec:  c2 37 bf f6  sth  %g1, [ %fp + -10 ]
400011f0:  c2 2f bf f5  stb  %g1, [ %fp + -11 ]
400011f4:  c2 2f bf f4  stb  %g1, [ %fp + -12 ]
400011f8:  d0 0f bf f5  ldub  [ %fp + -11 ], %o0   ******************** 1
400011fc:  d2 17 bf f6  lduh  [ %fp + -10 ], %o1   ******************** 2
40001200:  91 2a 20 18  sll  %o0, 0x18, %o0        ******************** 3
40001204:  93 2a 60 10  sll  %o1, 0x10, %o1        ******************** 4
40001208:  93 3a 60 10  sra  %o1, 0x10, %o1        ******************** 5
4000120c:  7f ff ff d7  call  40001168 <somme>     ******************** 6
40001210:  91 3a 20 18  sra  %o0, 0x18, %o0        ******************** 7
40001214:  d0 37 bf f6  sth  %o0, [ %fp + -10 ]    ******************** 8
40001218:  d0 0f bf f4  ldub  [ %fp + -12 ], %o0
4000121c:  d2 17 bf f6  lduh  [ %fp + -10 ], %o1
40001220:  91 2a 20 18  sll  %o0, 0x18, %o0
40001224:  93 2a 60 10  sll  %o1, 0x10, %o1
40001228:  93 3a 60 10  sra  %o1, 0x10, %o1
4000122c:  7f ff ff cf  call  40001168 <somme>
40001230:  91 3a 20 18  sra  %o0, 0x18, %o0
40001234:  d0 37 bf f6  sth  %o0, [ %fp + -10 ]
40001238:  d0 0f bf f5  ldub  [ %fp + -11 ], %o0
4000123c:  d2 0f bf f4  ldub  [ %fp + -12 ], %o1
...

------------------------> LLVM partial disassembly:

Disassembly of section .text:

00000000 :   --> This would be the "somme" function
   0:  9d e3 bf a0  save  %sp, -96, %sp
   4:  a0 a6 60 01  subcc  %i1, 1, %l0
   8:  06 80 00 10  bl  48
   c:  01 00 00 00  nop
  10:  10 80 00 02  b  18
  14:  01 00 00 00  nop
  18:  a0 10 00 19  mov  %i1, %l0
  1c:  a2 10 00 18  mov  %i0, %l1
  20:  a2 0c 60 ff  and  %l1, 0xff, %l1
  24:  a2 04 40 18  add  %l1, %i0, %l1
  28:  a5 2c 60 18  sll  %l1, 0x18, %l2
  2c:  b1 3c a0 18  sra  %l2, 0x18, %i0
  30:  a0 04 3f ff  add  %l0, -1, %l0
  34:  a4 a4 20 00  subcc  %l0, 0, %l2
  38:  12 bf ff fa  bne  20
  3c:  01 00 00 00  nop
  40:  10 80 00 02  b  48
  44:  01 00 00 00  nop
  48:  a0 06 00 19  add  %i0, %i1, %l0
  4c:  a1 2c 20 10  sll  %l0, 0x10, %l0
  50:  b1 3c 20 10  sra  %l0, 0x10, %i0
  54:  81 e8 00 00  restore
  58:  81 c3 e0 08  retl
  5c:  01 00 00 00  nop

000000cc :
  cc:  9d e3 bf 98  save  %sp, -104, %sp
  d0:  a0 10 20 01  mov  1, %l0
  d4:  e0 37 bf fe  sth  %l0, [ %fp + -2 ]
  d8:  e0 2f bf fd  stb  %l0, [ %fp + -3 ]
  dc:  e0 2f bf fc  stb  %l0, [ %fp + -4 ]
  e0:  d0 4f bf fd  ldsb  [ %fp + -3 ], %o0   ******************** 1
  e4:  d2 57 bf fe  ldsh  [ %fp + -2 ], %o1   ******************** 2
  e8:  40 00 00 00  call  e8                  ******************** 3
  ec:  01 00 00 00  nop                       ******************** 4
  f0:  d0 37 bf fe  sth  %o0, [ %fp + -2 ]
  f4:  d0 4f bf fc  ldsb  [ %fp + -4 ], %o0
  f8:  d2 57 bf fe  ldsh  [ %fp + -2 ], %o1
  fc:  40 00 00 00  call  fc
 100:  01 00 00 00  nop
 104:  d0 37 bf fe  sth  %o0, [ %fp + -2 ]
 108:  d0 4f bf fd  ldsb  [ %fp + -3 ], %o0
 10c:  d2 4f bf fc  ldsb  [ %fp + -4 ], %o1
 110:  40 00 00 00  call  110
 114:  01 00 00 00  nop
 118:  d0 37 bf fe  sth  %o0, [ %fp + -2 ]
...

A couple of observations from this:

- With LLVM, only 4 instructions are required per call, since the cast
  is done on the callee's side (and the NOP is wasted, since the delay
  slot is left unfilled).
- With GCC, 8 instructions are required, because the cast is duplicated
  in the caller and the callee.

Based on this, it seems worthwhile to KEEP the cast only on the CALLEE's
side rather than the caller's. Since there are 9 calls in main, this
takes 9*8=72 instructions with GCC, whereas it takes only 9*4=36
instructions with LLVM. That is a huge difference when code size
matters. I guess we can assume that the callee's code size is similar in
both cases, since a cast is always performed there.

In conclusion, the SPARC code size could be reduced by approx.
(4 instructions) x (number of calls) if the cast were done exclusively
on the CALLEE's side.

So, is it really necessary to keep it on the caller's side, or can we
try to do it only on the callee's side?

PS: since I work with LEON, I took the liberty of CC'ing the people
involved in this thread:
http://gcc.gnu.org/ml/gcc/2010-09/msg00014.html
I hope that doesn't bother anyone.

Have a good day,
George

Ian Lance Taylor wrote:
> laurent <laurent.poche@xxxxxxxxx> writes:
>
>> When a caller function calls a callee function with short or char
>> arguments, the arguments are casted twice: inside the caller function
>> and inside the callee function, see the example. It is a waste of
>> performance in code density and speed!
>>
>> I don't understand why there is a double casting. Is there any
>> optimization I could activate in GCC to remove it?
>
> It's basically a bug. gcc should only do it on the caller side. Doing
> it on the callee side is a holdover from the good pre-C90 days, when
> code like
>
> int f(i)
>      char i;
> {
>   ...
> }
>
> had to be treated as equivalent to
>
> int f(int passed_i)
> {
>   char i = (char) passed_i;
>   ...
> }
>
> These days I think we can just drop the cast on the callee side. As I
> recall that was done for x86 a while back, somebody just needs to do it
> for SPARC.
>
> Please file a bug report according to the instructions at
> http://gcc.gnu.org/bugs/ (unless there is already a bug report for
> this). Thanks.
>
> Ian