Re: ARM: code size increase starting from gcc 10


On 11/03/2022 09:57, Gabriele Favalessa via Gcc-help wrote:
Hi,

up to gcc 9 this function

#include <stdint.h>
#include <stdbool.h>

bool f() {
     return *(volatile uint32_t*)0x42143fa8 == 0;
}

compiles (arm-none-eabi-gcc -mcpu=cortex-m4 -Os) to:

    0: 4b02       ldr r3, [pc, #8] ; (c <f+0xc>)
    2: 6818       ldr r0, [r3, #0]
    4: fab0 f080 clz r0, r0
    8: 0940       lsrs r0, r0, #5
    a: 4770       bx lr
    c: 42143fa8 .word 0x42143fa8

Starting with gcc 10 it compiles to:

    0: 4b03       ldr r3, [pc, #12] ; (10 <f+0x10>)
    2: f8d3 0fa8 ldr.w r0, [r3, #4008] ; 0xfa8
    6: fab0 f080 clz r0, r0
    a: 0940       lsrs r0, r0, #5
    c: 4770       bx lr
    e: bf00       nop
   10: 42143000 .word 0x42143000

Questions:

1) why don't newer gcc versions generate the smallest possible code in
spite of -Os?

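The compiler is trying to identify opportunities to generate even better code for more common cases. In the gcc 10 output above it has split the address into a base (0x42143000, loaded from the literal pool) plus an immediate offset (#4008), so that accesses to nearby addresses can share one base register. With a single access this costs 4 bytes: the offset needs the 32-bit ldr.w encoding, and a 2-byte nop is then needed to keep the literal pool word-aligned. With more than one nearby access it pays off. For example, if your testcase is changed to: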

int f() {
  return (*(volatile unsigned*)0x42143fa8
          + *(volatile unsigned*)0x42143e00) == 0;
}

Then we see:

        ldr     r3, .L2
        ldr     r2, [r3, #4008]
        ldr     r3, [r3, #3584]
        cmn     r2, r3
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108619264

This is clearly better than what gcc-9 generates, which loads two completely different constants from the literal pool to use as bases:

(gcc-9):
        ldr     r3, .L2
        ldr     r2, .L2+4
        ldr     r3, [r3]
        ldr     r2, [r2]
        cmn     r3, r2
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108623272
        .word   1108622848

Unfortunately, the code that does this has limited visibility of what other operations may be accessing nearby memory, so is not able to work out the optimal situation for every case.
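
To make the arithmetic in the listings above concrete, here is a minimal sketch of the split. It is not GCC's actual implementation; the helper names (split_addr, split_address, share_base, fits_narrow_ldr) are hypothetical, and the offset ranges are assumptions taken from the Thumb-2 encodings: an unsigned 12-bit immediate for the 32-bit ldr.w, and word-aligned offsets 0..124 for the 16-bit ldr.

```c
#include <stdint.h>

/* Hypothetical helper type: an absolute address split into the part
 * that goes in the literal pool and the part encoded in the load. */
struct split_addr {
    uint32_t base;    /* .word value in the literal pool */
    uint32_t offset;  /* immediate in ldr rX, [rBase, #offset] */
};

/* Assumption: ldr.w accepts an unsigned 12-bit immediate (0..4095),
 * so the low 12 bits of the address become the offset. */
static struct split_addr split_address(uint32_t addr)
{
    struct split_addr s;
    s.offset = addr & 0xfffu;
    s.base = addr - s.offset;
    return s;
}

/* Two accesses can reuse one literal-pool entry if their bases match. */
static int share_base(uint32_t a, uint32_t b)
{
    return split_address(a).base == split_address(b).base;
}

/* Assumption: the 16-bit ldr encoding only reaches word-aligned
 * offsets 0..124; anything else needs the 32-bit ldr.w. */
static int fits_narrow_ldr(uint32_t offset)
{
    return offset <= 124u && (offset & 3u) == 0;
}
```

For the addresses in this thread, split_address(0x42143fa8) gives base 0x42143000 (1108619264) and offset 4008, and split_address(0x42143e00) gives the same base with offset 3584, which matches the single `.word 1108619264` and the `#4008` / `#3584` immediates above. fits_narrow_ldr(4008) is false, which is why the single-load case ends up with the wider ldr.w.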

2) is there a way to get the smaller code with newer gcc versions?

Unfortunately, no. At least not at present.

R.


Thanks

Gabriele


