On 11/03/2022 09:57, Gabriele Favalessa via Gcc-help wrote:
Hi,
up to gcc 9 this function
#include <stdint.h>
#include <stdbool.h>
bool f() {
    return *(volatile uint32_t*)0x42143fa8 == 0;
}
compiles (arm-none-eabi-gcc -mcpu=cortex-m4 -Os) to:
0: 4b02 ldr r3, [pc, #8] ; (c <f+0xc>)
2: 6818 ldr r0, [r3, #0]
4: fab0 f080 clz r0, r0
8: 0940 lsrs r0, r0, #5
a: 4770 bx lr
c: 42143fa8 .word 0x42143fa8
Starting with gcc 10 it compiles to:
0: 4b03 ldr r3, [pc, #12] ; (10 <f+0x10>)
2: f8d3 0fa8 ldr.w r0, [r3, #4008] ; 0xfa8
6: fab0 f080 clz r0, r0
a: 0940 lsrs r0, r0, #5
c: 4770 bx lr
e: bf00 nop
10: 42143000 .word 0x42143000
Questions:
1) why don't newer gcc versions generate the smallest possible code in
spite of -Os?
The compiler is trying to identify opportunities to generate even better
code for more common cases. For example, if your testcase is changed to:
int f() {
    return (*(volatile unsigned*)0x42143fa8
            + *(volatile unsigned*)0x42143e00) == 0;
}
Then we see:
ldr r3, .L2
ldr r2, [r3, #4008]
ldr r3, [r3, #3584]
cmn r2, r3
ite eq
moveq r0, #1
movne r0, #0
bx lr
.L3:
.align 2
.L2:
.word 1108619264
being generated, which needs only one literal-pool entry as a shared base
and is clearly better than loading two completely different constants
from the literal pool to use as bases:
(gcc-9):
ldr r3, .L2
ldr r2, .L2+4
ldr r3, [r3]
ldr r2, [r2]
cmn r3, r2
ite eq
moveq r0, #1
movne r0, #0
bx lr
.L3:
.align 2
.L2:
.word 1108623272
.word 1108622848
Unfortunately, the code that does this has limited visibility of what
other operations may be accessing nearby memory, so is not able to work
out the optimal situation for every case.
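
To make the arithmetic behind the listings above concrete, here is a minimal sketch of the base/offset split. It assumes the split simply masks off the low 12 bits, which matches the constants shown and the 0..4095 immediate range of the Thumb-2 "ldr.w Rt, [Rn, #imm12]" encoding. The helper names are hypothetical, and this is not a reproduction of GCC's actual logic:

```c
#include <assert.h>
#include <stdint.h>

/* Thumb-2 "ldr.w Rt, [Rn, #imm12]" takes an unsigned 12-bit offset. */
#define OFFSET_MASK ((UINT32_C(1) << 12) - 1)

/* Hypothetical helper: the part of the address kept in the literal pool. */
uint32_t split_base(uint32_t addr)
{
    return addr & ~OFFSET_MASK;
}

/* Hypothetical helper: the part folded into the load's immediate. */
uint32_t split_offset(uint32_t addr)
{
    return addr & OFFSET_MASK;
}

/* Checks the constants that appear in the gcc 10 and gcc 9 listings. */
void check_listing_constants(void)
{
    /* gcc 10: one shared base (.word 1108619264) plus two immediates. */
    assert(split_base(UINT32_C(0x42143fa8)) == UINT32_C(1108619264));
    assert(split_base(UINT32_C(0x42143e00)) == UINT32_C(1108619264));
    assert(split_offset(UINT32_C(0x42143fa8)) == 4008);
    assert(split_offset(UINT32_C(0x42143e00)) == 3584);

    /* gcc 9: the two full addresses as separate literal-pool words. */
    assert(UINT32_C(0x42143fa8) == UINT32_C(1108623272));
    assert(UINT32_C(0x42143e00) == UINT32_C(1108622848));
}
```

With two accesses in the same 4 KiB window the shared base saves a literal-pool word and a load. With a single access, as in the original f(), the same split only trades a 2-byte ldr for a 4-byte ldr.w, and the nop in the gcc 10 listing pads the literal pool back to word alignment, so the function grows from 16 to 20 bytes.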
2) is there a way to get the smaller code with newer gcc versions?
Unfortunately, no. At least not at present.
R.
Thanks
Gabriele