On 30/10/2019 01:34, Segher Boessenkool wrote:
And on other targets it does do its job fine, say riscv32, or m68k -O1
(the -O1 to prevent the two stores from being optimised into one).
I haven't managed to find another target where multiplication by 13 is
done with a libcall though. Maybe I should look harder.
Just to be clear: the call to __mulhi3 by itself is reasonable, since the
H8 has quite weak shift instructions and no 32-bit by 32-bit
multiplication instruction. It is the repeated re-calculation of the
pointer that is the big problem. The code gets both big and slow.
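
For illustration only, here is a minimal sketch of the kind of code being
discussed. It is not the test case from the original report: the 13-byte
struct, the names rec, table, store_indexed and store_hoisted are all
hypothetical, chosen so that indexing the array needs a multiplication by 13
and so that there are two stores through the same element address. Whether a
given GCC version and target really recomputes the address for the second
store depends on the backend and the optimisation level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: 13 bytes, so &table[i] needs i * 13. */
struct rec {
    uint8_t a;
    uint8_t b;
    uint8_t pad[11];
};

/* All members are bytes, so no padding is expected; check it. */
static_assert(sizeof(struct rec) == 13, "rec is expected to be 13 bytes");

static struct rec table[8];

/* Plain form: both stores index table[i]. A compiler that does not share
   the address computation may scale the index (possibly via a libcall
   on targets without a suitable multiply) once per store. */
void store_indexed(unsigned i, uint8_t x, uint8_t y)
{
    table[i].a = x;
    table[i].b = y;
}

/* Hand-hoisted form: the element address is computed once and reused. */
void store_hoisted(unsigned i, uint8_t x, uint8_t y)
{
    struct rec *p = &table[i];
    p->a = x;
    p->b = y;
}
```

Both functions have the same effect; the second only spells out the common
subexpression that one would expect the optimiser to find by itself.
Comparing the assembly of the two (for example with -S at -O1, -O2 and -Os)
is one way to see whether the pointer is being re-calculated.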
On 10/30/19 10:06 AM, David Brown wrote:
But this also brings up another idea. Is the OP using "-Os"
optimisation? My experience (especially with AVR, msp430, and ARM
Cortex-M targets) is that "-Os" optimisation is often quite poor
compared to "-O2". It can result in very significantly slower code for
a saving of a couple of bytes, and in some cases the code can be
significantly /bigger/ than with -O2. I don't know whether this is a
backend issue, or a general problem with "-Os", but I no longer use or
recommend "-Os" even for tiny embedded systems.
Yes, I tried all of -O1, -O2 and -Os and probably a few more.
My experience with -Os on ARM Thumb2 is much more positive than yours.
"-Os -mslow-flash-data" gives small and quite fast code most of the time.
/Mikael