Hi,

We were using GCC 3.4.0 to generate Thumb code for an ARM processor. Switching to GCC 4.1.1 has improved our code size (we always use the -Os switch), but has severely degraded execution speed. After further investigation, we isolated one of the problems in the following example.

Source code:

    void foo(int *a)
    {
        int i;
        for (i = 0; i < 1000000; i++)
            a[0] += a[1];
    }

The result with GCC 3.4.0 with -mthumb -Os was:

    00000000 <foo>:
       0:   b500    push    {lr}
       2:   6803    ldr     r3, [r0, #0]
       4:   4a03    ldr     r2, [pc, #12]   (14 <.text+0x14>)
       6:   6841    ldr     r1, [r0, #4]
       8:   3a01    sub     r2, #1
       a:   185b    add     r3, r3, r1
       c:   2a00    cmp     r2, #0
       e:   d1fb    bne     8 <foo+0x8>
      10:   6003    str     r3, [r0, #0]
      12:   bd00    pop     {pc}
      14:   4240    neg     r0, r0
      16:   000f    lsl     r7, r1, #0

When compiled for ARM with GCC 4.1.1 (and mainline too) with -mthumb -O1, we get:

    00000000 <foo>:
       0:   b510    push    {r4, lr}
       2:   1c04    adds    r4, r0, #0
       4:   2200    movs    r2, #0
       6:   6841    ldr     r1, [r0, #4]
       8:   4803    ldr     r0, [pc, #12]   (18 <.text+0x18>)
       a:   6823    ldr     r3, [r4, #0]
       c:   185b    adds    r3, r3, r1
       e:   3201    adds    r2, #1
      10:   4282    cmp     r2, r0
      12:   d1fb    bne.n   c <foo+0xc>
      14:   6023    str     r3, [r4, #0]
      16:   bd10    pop     {r4, pc}
      18:   4240    negs    r0, r0
      1a:   000f    lsls    r7, r1, #0

-> Not so bad, but slower than 3.4.0.

When compiled with -mthumb -Os, we get:

    00000000 <foo>:
       0:   b510    push    {r4, lr}
       2:   6802    ldr     r2, [r0, #0]
       4:   6844    ldr     r4, [r0, #4]
       6:   2100    movs    r1, #0
       8:   4b03    ldr     r3, [pc, #12]   (18 <.text+0x18>)
       a:   3101    adds    r1, #1
       c:   1912    adds    r2, r2, r4
       e:   4299    cmp     r1, r3
      10:   d1fa    bne.n   8 <foo+0x8>
      12:   6002    str     r2, [r0, #0]
      14:   bd10    pop     {r4, pc}
      16:   0000    lsls    r0, r0, #0
      18:   4240    negs    r0, r0
      1a:   000f    lsls    r7, r1, #0

-> The load of the loop end value is performed within the loop!

When compiled with -mthumb -O3, we get:

    00000000 <foo>:
       0:   b530    push    {r4, r5, lr}
       2:   6802    ldr     r2, [r0, #0]
       4:   4d05    ldr     r5, [pc, #20]   (1c <.text+0x1c>)
       6:   1d04    adds    r4, r0, #4
       8:   2100    movs    r1, #0
       a:   6823    ldr     r3, [r4, #0]
       c:   3101    adds    r1, #1
       e:   18d3    adds    r3, r2, r3
      10:   1c1a    adds    r2, r3, #0
      12:   6003    str     r3, [r0, #0]
      14:   42a9    cmp     r1, r5
      16:   d1f8    bne.n   a <foo+0xa>
      18:   bd30    pop     {r4, r5, pc}
      1a:   0000    lsls    r0, r0, #0
      1c:   4240    negs    r0, r0
      1e:   000f    lsls    r7, r1, #0

-> Amazingly slow!

Does anybody have a magic set of options to generate code as efficient and small as 3.4.0 did?

Thanks in advance for any hints on this problem.

David
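
For experimentation, here is a source-level variant that spells out by hand the loop shape 3.4.0 produced: both loads hoisted out of the loop, a down-counting index compared against zero, and a single store at the end. It is only a sketch. The name foo_countdown is hypothetical, and whether GCC 4.1.1 actually emits the tighter loop for it has not been checked here.

```c
/* Hypothetical hand-tuned variant of foo(); the name foo_countdown is
 * illustrative only and does not appear in the original test case. */
void foo_countdown(int *a)
{
    int sum = a[0];         /* accumulate in a local, not through memory */
    const int inc = a[1];   /* a[1] is loop-invariant: load it once */
    int n;

    /* Count down to zero so the exit test needs no end-value register. */
    for (n = 1000000; n != 0; n--)
        sum += inc;

    a[0] = sum;             /* single store after the loop */
}
```

It computes the same result as foo() above, so comparing the disassembly of both functions under each option set should show whether the regression comes from the loop transformation or from something else.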