Re: ARM: code size increase starting from gcc 10

Richard Earnshaw via Gcc-help <gcc-help@xxxxxxxxxxx> · Fri, 11 Mar 2022 15:20:25 +0000

On 11/03/2022 09:57, Gabriele Favalessa via Gcc-help wrote:
Hi,

up to gcc 9 this function

#include <stdint.h>
#include <stdbool.h>

bool f() {
     return *(volatile uint32_t*)0x42143fa8 == 0;
}

compiles (arm-none-eabi-gcc -mcpu=cortex-m4 -Os) to:

    0: 4b02       ldr r3, [pc, #8] ; (c <f+0xc>)
    2: 6818       ldr r0, [r3, #0]
    4: fab0 f080 clz r0, r0
    8: 0940       lsrs r0, r0, #5
    a: 4770       bx lr
    c: 42143fa8 .word 0x42143fa8

Starting with gcc 10 it compiles to:

    0: 4b03       ldr r3, [pc, #12] ; (10 <f+0x10>)
    2: f8d3 0fa8 ldr.w r0, [r3, #4008] ; 0xfa8
    6: fab0 f080 clz r0, r0
    a: 0940       lsrs r0, r0, #5
    c: 4770       bx lr
    e: bf00       nop
   10: 42143000 .word 0x42143000

Questions:

1) why don't newer gcc versions generate the smallest possible code in
spite of -Os?

The compiler is trying to identify opportunities to generate even better 
code for more common cases.  For example, if your testcase is changed to:

int f() {
  return (*(volatile unsigned*)0x42143fa8
          + *(volatile unsigned*)0x42143e00) == 0;
}

Then we see:

        ldr     r3, .L2
        ldr     r2, [r3, #4008]
        ldr     r3, [r3, #3584]
        cmn     r2, r3
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108619264

being generated, which is clearly better than loading two completely
different constants from the literal pool to use as bases:

(gcc-9):
        ldr     r3, .L2
        ldr     r2, .L2+4
        ldr     r3, [r3]
        ldr     r2, [r2]
        cmn     r3, r2
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108623272
        .word   1108622848

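Unfortunately, the code that does this has limited visibility of what
other operations may be accessing nearby memory, so it cannot work out
the optimal choice in every case.  In your single-load testcase there is
nothing to share the base with, so the split only costs space: the #4008
offset needs the 32-bit ldr.w encoding, and a nop is then needed to keep
the literal pool word-aligned.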
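
As a rough illustration of the address split visible in the listings
above, here is a minimal C sketch. It is not GCC's implementation: the
helper names (split_base, split_offset, fits_ldr16, ldr_size) are
hypothetical, and the 12-bit split point is an assumption inferred from
the base 0x42143000 and the offsets #4008 and #3584 in the output. The
only architectural facts relied on are that the 16-bit Thumb LDR
(immediate) encoding takes word-aligned offsets from 0 to 124, and that
the 32-bit ldr.w form takes offsets from 0 to 4095.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the address is split at a 12-bit boundary, matching the
   0..4095 immediate range of the 32-bit Thumb-2 ldr.w encoding. */
#define OFFSET_MASK 0xfffu

/* Shared base that would be placed in the literal pool. */
static inline uint32_t split_base(uint32_t addr)
{
    return addr & ~OFFSET_MASK;
}

/* Immediate offset that would be applied in the load instruction. */
static inline uint32_t split_offset(uint32_t addr)
{
    return addr & OFFSET_MASK;
}

/* The 16-bit Thumb LDR (immediate) form only encodes word-aligned
   offsets in the range 0..124. */
static inline bool fits_ldr16(uint32_t offset)
{
    return offset <= 124u && (offset & 3u) == 0u;
}

/* Size in bytes of the load instruction for a given offset. */
static inline unsigned ldr_size(uint32_t offset)
{
    return fits_ldr16(offset) ? 2u : 4u;
}
```

With these helpers, 0x42143fa8 and 0x42143e00 both split to the base
0x42143000 (decimal 1108619264, the single .word in the gcc 10 output),
with offsets 4008 and 3584. For the original one-load function,
ldr_size(0) is 2 while ldr_size(4008) is 4; together with the alignment
nop that accounts for the growth from 16 to 20 bytes.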

2) is there a way to get the smaller code with newer gcc versions?

Unfortunately, no. At least not at present.

R.

Thanks

Gabriele