Re: Non-optimal code generated for H8

Mikael Tillenius <mti-1@xxxxxxxxxxxxx> · Wed, 30 Oct 2019 17:32:49 +0100

On 30/10/2019 01:34, Segher Boessenkool wrote:
And on other targets it does do its job fine, say riscv32, or m68k -O1
(the -O1 to prevent the two stores from being optimised into one).

I haven't managed to find another target where multiplication by 13 is
done with a libcall though.  Maybe I should look harder.

Just to be clear: the call to __mulhi3 by itself is reasonable, since
the H8 has quite weak shift instructions and no 32-bit by 32-bit
multiplication instruction. It is the repeated recalculation of the
pointer that is the big problem: the code gets both big and slow.
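
To illustrate the pattern, here is a minimal sketch. It is not the
original test case: struct rec, store_indexed, store_cached and mul13
are hypothetical names, and the 13-byte layout is only assumed so that
indexing needs a multiplication by 13. The first function indexes the
array once per store, which is where a compiler may compute the element
address twice. The second takes the address once and reuses it. Whether
the two actually produce different code depends on the target and the
optimisation level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 13-byte record (assumption: not the struct from the
 * original report). All members are bytes, so there is no padding. */
struct rec {
    uint8_t tag;
    uint8_t data[12];
};

static_assert(sizeof(struct rec) == 13, "element size must be 13 bytes");

/* Indexes the array once per store. The address tab + i * 13 is needed
 * for both stores, so a compiler that does not share the computation
 * may multiply (or call the multiply helper) twice. */
void store_indexed(struct rec *tab, unsigned i, uint8_t a, uint8_t b)
{
    tab[i].tag = a;
    tab[i].data[0] = b;
}

/* Takes the element address once and reuses it for both stores. */
void store_cached(struct rec *tab, unsigned i, uint8_t a, uint8_t b)
{
    struct rec *p = &tab[i];
    p->tag = a;
    p->data[0] = b;
}

/* Multiplication by 13 written as shifts and adds: 13 = 8 + 4 + 1. */
unsigned mul13(unsigned x)
{
    return (x << 3) + (x << 2) + x;
}
```

Both store functions write the same bytes, so any difference is only in
the generated code. mul13 shows the shift-and-add form; on a target with
weak shift instructions this sequence can itself be costly, which is why
the libcall is not unreasonable.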

On 10/30/19 10:06 AM, David Brown wrote:
But this also brings up another idea.  Is the OP using "-Os"
optimisation?  My experience (especially with AVR, msp430, and ARM
Cortex-M targets) is that "-Os" optimisation is often quite poor
compared to "-O2".  It can result in very significantly slower code for
a saving of a couple of bytes, and in some cases the code can be
significantly /bigger/ than with -O2.  I don't know whether this is a
backend issue, or a general problem with "-Os", but I no longer use or
recommend "-Os" even for tiny embedded systems.

Yes, I tried all of -O1, -O2 and -Os and probably a few more.

My experience with -Os on ARM Thumb2 is much more positive than yours.
"-Os -mslow-flash-data" gives small and quite fast code most of the time.

/Mikael