Hello Ian,

If the casting is done in the callee function (and not in the caller function), the code size of my program is reduced by 5%. I think all users compiling with the -Os option would be pleased if the casting were implemented inside the callee function.

What is your feeling?

Laurent

On 25/10/2010 14:59, Jorge PEREZ wrote:
> Hello Laurent & Ian,
>
> I'm actually very interested in the point you're making here.
>
> I checked the example in Laurent's first email, and I agree that for embedded systems, where code size is critical, the cast on the caller's side (rather than the callee's) does not seem very helpful.
>
> Compiling the test with LLVM (not sure it's a good enough reference, but that's all I have) gives the following results:
>
> ------------------------> C code:
>
> short somme(char a, short b){
>     int i;
>     for (i=0; i<b; i++){
>         a+=a;
>     }
>     return a+b;
> }
>
> short somme2(char a, short b){
>     int i;
>     for (i=0; i<b; i++){
>         a+=b;
>     }
>     return a+2*b;
> }
>
> int main(){
>     volatile short b=1;
>     volatile char a=1, c=1;
>     b=somme(a,b);
>     b=somme(c,b);
>     b=somme(a,c);
>     b=somme2(a,b);
>     b=somme2(c,b);
>     b=somme2(a,c);
>     b=somme2(a,c*2);
>     b=somme2(a,c*3);
>     b=somme2(a,c*4);
>     return 0;
> }
>
>
> ------------------------> GCC 3.4.4 partial disassembly:
>
> 40001168 <somme>:
> 40001168:  83 2a 60 10   sll    %o1, 0x10, %g1
> 4000116c:  83 38 60 10   sra    %g1, 0x10, %g1
> 40001170:  80 a0 60 00   cmp    %g1, 0
> 40001174:  24 80 00 06   ble,a  4000118c <somme+0x24>
> 40001178:  91 2a 20 18   sll    %o0, 0x18, %o0
> 4000117c:  82 80 7f ff   addcc  %g1, -1, %g1
> 40001180:  12 bf ff ff   bne    4000117c <somme+0x14>
> 40001184:  90 02 00 08   add    %o0, %o0, %o0
> 40001188:  91 2a 20 18   sll    %o0, 0x18, %o0
> 4000118c:  91 3a 20 18   sra    %o0, 0x18, %o0
> 40001190:  90 02 00 09   add    %o0, %o1, %o0
> 40001194:  91 2a 20 10   sll    %o0, 0x10, %o0
> 40001198:  81 c3 e0 08   retl
> 4000119c:  91 3a 20 10   sra    %o0, 0x10, %o0
>
> 400011e4 <main>:
> 400011e4:  9d e3 bf 90   save   %sp, -112, %sp
> 400011e8:  82 10 20 01   mov    1, %g1
> 400011ec:  c2 37 bf f6   sth    %g1, [ %fp + -10 ]
> 400011f0:  c2 2f bf f5   stb    %g1, [ %fp + -11 ]
> 400011f4:  c2 2f bf f4   stb    %g1, [ %fp + -12 ]
> 400011f8:  d0 0f bf f5   ldub   [ %fp + -11 ], %o0     ******************** 1
> 400011fc:  d2 17 bf f6   lduh   [ %fp + -10 ], %o1     ******************** 2
> 40001200:  91 2a 20 18   sll    %o0, 0x18, %o0         ******************** 3
> 40001204:  93 2a 60 10   sll    %o1, 0x10, %o1         ******************** 4
> 40001208:  93 3a 60 10   sra    %o1, 0x10, %o1         ******************** 5
> 4000120c:  7f ff ff d7   call   40001168 <somme>       ******************** 6
> 40001210:  91 3a 20 18   sra    %o0, 0x18, %o0         ******************** 7
> 40001214:  d0 37 bf f6   sth    %o0, [ %fp + -10 ]     ******************** 8
> 40001218:  d0 0f bf f4   ldub   [ %fp + -12 ], %o0
> 4000121c:  d2 17 bf f6   lduh   [ %fp + -10 ], %o1
> 40001220:  91 2a 20 18   sll    %o0, 0x18, %o0
> 40001224:  93 2a 60 10   sll    %o1, 0x10, %o1
> 40001228:  93 3a 60 10   sra    %o1, 0x10, %o1
> 4000122c:  7f ff ff cf   call   40001168 <somme>
> 40001230:  91 3a 20 18   sra    %o0, 0x18, %o0
> 40001234:  d0 37 bf f6   sth    %o0, [ %fp + -10 ]
> 40001238:  d0 0f bf f5   ldub   [ %fp + -11 ], %o0
> 4000123c:  d2 0f bf f4   ldub   [ %fp + -12 ], %o1
> ...
>
> ------------------------> LLVM partial disassembly:
>
> Disassembly of section .text:
>
> 00000000 :                          --> This would be the "somme" function
>    0:  9d e3 bf a0   save   %sp, -96, %sp
>    4:  a0 a6 60 01   subcc  %i1, 1, %l0
>    8:  06 80 00 10   bl     48
>    c:  01 00 00 00   nop
>   10:  10 80 00 02   b      18
>   14:  01 00 00 00   nop
>   18:  a0 10 00 19   mov    %i1, %l0
>   1c:  a2 10 00 18   mov    %i0, %l1
>   20:  a2 0c 60 ff   and    %l1, 0xff, %l1
>   24:  a2 04 40 18   add    %l1, %i0, %l1
>   28:  a5 2c 60 18   sll    %l1, 0x18, %l2
>   2c:  b1 3c a0 18   sra    %l2, 0x18, %i0
>   30:  a0 04 3f ff   add    %l0, -1, %l0
>   34:  a4 a4 20 00   subcc  %l0, 0, %l2
>   38:  12 bf ff fa   bne    20
>   3c:  01 00 00 00   nop
>   40:  10 80 00 02   b      48
>   44:  01 00 00 00   nop
>   48:  a0 06 00 19   add    %i0, %i1, %l0
>   4c:  a1 2c 20 10   sll    %l0, 0x10, %l0
>   50:  b1 3c 20 10   sra    %l0, 0x10, %i0
>   54:  81 e8 00 00   restore
>   58:  81 c3 e0 08   retl
>   5c:  01 00 00 00   nop
>
> 000000cc :
>   cc:  9d e3 bf 98   save   %sp, -104, %sp
>   d0:  a0 10 20 01   mov    1, %l0
>   d4:  e0 37 bf fe   sth    %l0, [ %fp + -2 ]
>   d8:  e0 2f bf fd   stb    %l0, [ %fp + -3 ]
>   dc:  e0 2f bf fc   stb    %l0, [ %fp + -4 ]
>   e0:  d0 4f bf fd   ldsb   [ %fp + -3 ], %o0     ******************** 1
>   e4:  d2 57 bf fe   ldsh   [ %fp + -2 ], %o1     ******************** 2
>   e8:  40 00 00 00   call   e8                    ******************** 3
>   ec:  01 00 00 00   nop                          ******************** 4
>   f0:  d0 37 bf fe   sth    %o0, [ %fp + -2 ]
>   f4:  d0 4f bf fc   ldsb   [ %fp + -4 ], %o0
>   f8:  d2 57 bf fe   ldsh   [ %fp + -2 ], %o1
>   fc:  40 00 00 00   call   fc
>  100:  01 00 00 00   nop
>  104:  d0 37 bf fe   sth    %o0, [ %fp + -2 ]
>  108:  d0 4f bf fd   ldsb   [ %fp + -3 ], %o0
>  10c:  d2 4f bf fc   ldsb   [ %fp + -4 ], %o1
>  110:  40 00 00 00   call   110
>  114:  01 00 00 00   nop
>  118:  d0 37 bf fe   sth    %o0, [ %fp + -2 ]
> ...
>
>
> A couple of observations:
>
> - With LLVM, only 4 instructions are required per call, since the cast is done on the callee side (and the NOP is wasted, because the delay slot is not filled).
> - With GCC, 8 instructions are required per call, because the cast is duplicated in the caller and the callee.
>
> Based on this, it seems worthwhile to KEEP the cast only on the CALLEE's side rather than the caller's. Since there are 9 calls in main, GCC needs 9*8 = 72 instructions for them, whereas LLVM needs only 9*4 = 36; that is a huge difference when code size matters. I assume the code size on the callee's side is similar in both cases, since a cast is always performed there.
>
> In conclusion, SPARC code size could be reduced by approximately (4 instructions) x (number of calls) if the cast were done exclusively on the CALLEE's side. So, is it really necessary to keep it on the caller's side, or can we try to do it only on the callee's side?
>
>
> PS: Since I work with LEON, I took the liberty of CCing the people involved in this thread: http://gcc.gnu.org/ml/gcc/2010-09/msg00014.html
> I hope nobody minds.
>
> Have a good day,
>
>
> George
>
>
>
> Ian Lance Taylor wrote:
>
>> laurent <laurent.poche@xxxxxxxxx> writes:
>>
>>> When a caller function calls a callee function with short or char arguments, the arguments are cast twice: once inside the caller function and again inside the callee function (see the example). This wastes both code density and speed!
>>>
>>> I don't understand why the casting is done twice. Is there an optimization I could enable in GCC to remove it?
>>>
>> It's basically a bug. gcc should only do it on the caller side. Doing it on the callee side is a holdover from the good old pre-C90 days, when code like
>>
>>     int f(i)
>>       char i;
>>     {
>>       ...
>>     }
>>
>> had to be treated as equivalent to
>>
>>     int f(int passed_i)
>>     {
>>       char i = (char) passed_i;
>>       ...
>>     }
>>
>> These days I think we can just drop the cast on the callee side.
>> As I recall, that was done for x86 a while back; somebody just needs to do it for SPARC.
>>
>> Please file a bug report according to the instructions at http://gcc.gnu.org/bugs/ (unless there is already a bug report for this). Thanks.
>>
>> Ian