Hi David,

I've noticed the same problem with GCC 4.1.3 (in Thumb mode for ARM). When a simple test file is compiled with -S to get only the "pseudo" assembly output, the quality of the generated code is quite poor. I've seen an equivalent inquiry on the CodeSourcery mailing list, noting that even though the GCC 4.x series performs good optimization overall, simple loops are compiled poorly. I'm quite surprised that the report drew no replies...

PS: Happy new year everybody!

______________________________

Hi,

We were using GCC 3.4.0 to generate Thumb code for an ARM processor. Switching to GCC 4.1.1 has improved our code size (we always use the -Os switch), but has severely degraded execution speed. After further investigation, we isolated one of the problems in the following example:

Source code:

void foo(int *a)
{
    int i;
    for (i = 0; i < 1000000; i++)
        a[0] += a[1];
}

The result with GCC 3.4.0 with -mthumb -Os was:

00000000 <foo>:
   0: b500   push  {lr}
   2: 6803   ldr   r3, [r0, #0]
   4: 4a03   ldr   r2, [pc, #12]  (14 <.text+0x14>)
   6: 6841   ldr   r1, [r0, #4]
   8: 3a01   sub   r2, #1
   a: 185b   add   r3, r3, r1
   c: 2a00   cmp   r2, #0
   e: d1fb   bne   8 <foo+0x8>
  10: 6003   str   r3, [r0, #0]
  12: bd00   pop   {pc}
  14: 4240   neg   r0, r0
  16: 000f   lsl   r7, r1, #0

When compiled for ARM with GCC 4.1.1 (and mainline too) with -mthumb -O1, we get:

00000000 <foo>:
   0: b510   push  {r4, lr}
   2: 1c04   adds  r4, r0, #0
   4: 2200   movs  r2, #0
   6: 6841   ldr   r1, [r0, #4]
   8: 4803   ldr   r0, [pc, #12]  (18 <.text+0x18>)
   a: 6823   ldr   r3, [r4, #0]
   c: 185b   adds  r3, r3, r1
   e: 3201   adds  r2, #1
  10: 4282   cmp   r2, r0
  12: d1fb   bne.n c <foo+0xc>
  14: 6023   str   r3, [r4, #0]
  16: bd10   pop   {r4, pc}
  18: 4240   negs  r0, r0
  1a: 000f   lsls  r7, r1, #0

-> Not so bad, but slower than 3.4.0.

When compiled with -mthumb -Os, we get:

00000000 <foo>:
   0: b510   push  {r4, lr}
   2: 6802   ldr   r2, [r0, #0]
   4: 6844   ldr   r4, [r0, #4]
   6: 2100   movs  r1, #0
   8: 4b03   ldr   r3, [pc, #12]  (18 <.text+0x18>)
   a: 3101   adds  r1, #1
   c: 1912   adds  r2, r2, r4
   e: 4299   cmp   r1, r3
  10: d1fa   bne.n 8 <foo+0x8>
  12: 6002   str   r2, [r0, #0]
  14: bd10   pop   {r4, pc}
  16: 0000   lsls  r0, r0, #0
  18: 4240   negs  r0, r0
  1a: 000f   lsls  r7, r1, #0

-> The load of the loop end value is performed inside the loop!

When compiled with -mthumb -O3, we get:

00000000 <foo>:
   0: b530   push  {r4, r5, lr}
   2: 6802   ldr   r2, [r0, #0]
   4: 4d05   ldr   r5, [pc, #20]  (1c <.text+0x1c>)
   6: 1d04   adds  r4, r0, #4
   8: 2100   movs  r1, #0
   a: 6823   ldr   r3, [r4, #0]
   c: 3101   adds  r1, #1
   e: 18d3   adds  r3, r2, r3
  10: 1c1a   adds  r2, r3, #0
  12: 6003   str   r3, [r0, #0]
  14: 42a9   cmp   r1, r5
  16: d1f8   bne.n a <foo+0xa>
  18: bd30   pop   {r4, r5, pc}
  1a: 0000   lsls  r0, r0, #0
  1c: 4240   negs  r0, r0
  1e: 000f   lsls  r7, r1, #0

-> Amazingly slow!

Does anybody have a magic set of options to generate code as efficient and small as 3.4.0 did?

Thanks in advance for any hints on this problem.

David
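For anyone who wants to experiment at the source level rather than with compiler options, here is a minimal sketch of the loop rewritten in the shape that the GCC 3.4.0 output above already has: both array elements are read once into locals, the counter runs down to zero, and the result is stored once after the loop. The name foo_countdown is hypothetical and not from the original message, and whether GCC 4.1.x actually emits a tighter Thumb loop for this form is an assumption that has not been checked here.

```c
/* Original loop from the message, kept for comparison. */
void foo(int *a)
{
    int i;
    for (i = 0; i < 1000000; i++)
        a[0] += a[1];
}

/* Hypothetical hand-written variant (name is illustrative only).
 * It computes the same result as foo(), but:
 *   - a[0] and a[1] are loaded once, before the loop;
 *   - the counter counts down and is compared against zero,
 *     so no loop-end constant is needed inside the loop;
 *   - a[0] is written back once, after the loop.
 */
void foo_countdown(int *a)
{
    int sum = a[0];
    int step = a[1];
    int n = 1000000;

    do {
        sum += step;
    } while (--n != 0);

    a[0] = sum;
}
```

Both functions should leave the same value in a[0] for the same input; comparing the disassembly of the two with the same -mthumb -Os switches would show whether the source form makes any difference on a given compiler version.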