On 31/10/2019 08:10, David Brown wrote:
> I have one bug report filed for an "-O2 slower than -Os" bug, the other
> direction. This was for ARM Cortex-M targets, which I filed at the "GNU
> Arm Embedded Toolchain" tracker. They are the main source of gcc
> toolchains for small-system ARM embedded development - in my business,
> it's important to have specific "toolchain releases" covering the
> compiler, library, and other tools. The folks running the project are
> quite good at fixing things or punting them upstream to the main gcc
> project, but like every other project bugs are reported faster than they
> can be handled!
> <https://bugs.launchpad.net/gcc-arm-embedded/+bug/1646883>
I don't tend to look at launchpad, so I hadn't seen this. But the fix
is quite simple and will now be in GCC-10.
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg02234.html
If you encounter problems with the Arm embedded tools and it can easily
be reproduced on a standard FSF build, then it's a good idea to report
the bug in GCC's own bugzilla, even if you also report it elsewhere to
the vendor of your toolchain.
GCC-10 will produce the following at -Os for your test1 case:
	ldr	r3, .L2
	movs	r2, #1
	str	r2, [r3, #2060]
	movs	r2, #2
	str	r2, [r3, #2064]
	movs	r2, #3
	str	r2, [r3, #2052]
	movs	r2, #4
	str	r2, [r3, #2076]
	movs	r2, #6
	str	r2, [r3, #2048]
	bx	lr
This isn't perfect: we could use a different base and then we'd be able
to make use of the 16-bit variants of the STR instruction. But it's
about balance here - the code that tries to form the base address
doesn't know whether it will be re-used and, if so, what other offsets
will be wanted, so we have to guess. The more bits we put in the base
address, the lower the probability of finding CSEs (common
subexpressions), as we get more distinct bases. If we put fewer bits in
the base address and more in the offset, we're more likely to find CSEs
but less likely to be able to use the smaller instructions.
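To put rough numbers on that trade-off, here is a minimal sketch in C. It
is not how GCC makes the decision; it only models the two Thumb-2
"STR (immediate)" encodings with low registers: the 16-bit form takes a
word-aligned offset of 0..124, the 32-bit form an offset of 0..4095. The
helper names str_size and total_store_bytes and the base_bias parameter
are hypothetical, made up for this illustration, and negative offsets are
deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of "str Rt, [Rn, #offset]" in Thumb-2,
   assuming Rt and Rn are both low registers (r0-r7).
   Returns 0 when the offset fits neither encoding modelled here. */
static unsigned str_size(uint32_t offset)
{
    if (offset <= 124 && offset % 4 == 0)
        return 2;   /* 16-bit encoding: 5-bit immediate scaled by 4 */
    if (offset <= 4095)
        return 4;   /* 32-bit encoding: 12-bit immediate */
    return 0;
}

/* Hypothetical helper: total size of the stores when base_bias bytes of
   every address are folded into the base register instead of the offset.
   Returns 0 if any store cannot be encoded under this simplified model. */
static unsigned total_store_bytes(const uint32_t *offsets, size_t n,
                                  uint32_t base_bias)
{
    unsigned total = 0;

    assert(offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        if (offsets[i] < base_bias)
            return 0;   /* negative offsets are not modelled */
        unsigned size = str_size(offsets[i] - base_bias);
        if (size == 0)
            return 0;
        total += size;
    }
    return total;
}
```

For the five offsets in the listing above (2060, 2064, 2052, 2076, 2048),
a bias of 0 gives 20 bytes of STR instructions, while a bias of 2048
leaves offsets 12, 16, 4, 28 and 0 and gives 10 bytes. The cost is that
a base of .L2+2048 is less likely to be shared with other accesses.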
R.