Re: Non-optimal code generated for H8

On 30/10/2019 01:34, Segher Boessenkool wrote:
And on other targets it does do its job fine, say riscv32, or m68k -O1
(the -O1 to prevent the two stores from being optimised into one).

I haven't managed to find another target where multiplication by 13 is
done with a libcall though.  Maybe I should look harder.

Just to be clear: the call to __mulhi3 is reasonable in itself, since the H8 has quite weak shift instructions and no 32-bit by 32-bit multiplication instruction. The big problem is the repeated recalculation of the pointer, which makes the code both big and slow.
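
To illustrate the point, here is a minimal sketch. It is not the original test case, which is not included in this message. It assumes a table of 13-byte records and two stores into one element; the names rec, table, mul13, store_indexed and store_hoisted are made up for the example. mul13 shows the shift-and-add form of the multiplication (13 = 8 + 4 + 1), and store_hoisted shows the address being computed once and reused, which is what one would hope the compiler generates for store_indexed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 13-byte record; the layout is an assumption for illustration. */
struct rec {
    unsigned char a;
    unsigned char b;
    unsigned char pad[11];
};
static_assert(sizeof(struct rec) == 13, "record must be 13 bytes");

struct rec table[8];

/* 13 * x written as shifts and adds: 13 = 8 + 4 + 1.
   On a target with weak shifts this may well be slower than a libcall. */
uint16_t mul13(uint16_t x)
{
    return (uint16_t)((x << 3) + (x << 2) + x);
}

/* Straightforward form: both stores index the table. The element address
   (table + 13 * i) only needs to be computed once. */
void store_indexed(uint16_t i, unsigned char a, unsigned char b)
{
    table[i].a = a;
    table[i].b = b;
}

/* Hand-hoisted form: the address is computed once and reused for both stores. */
void store_hoisted(uint16_t i, unsigned char a, unsigned char b)
{
    struct rec *p = &table[i];
    p->a = a;
    p->b = b;
}
```

Both store functions have the same effect; the only difference is whether the single address computation is spelled out in the source.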

On 10/30/19 10:06 AM, David Brown wrote:
But this also brings up another idea.  Is the OP using "-Os"
optimisation?  My experience (especially with AVR, msp430, and ARM
Cortex-M targets) is that "-Os" optimisation is often quite poor
compared to "-O2".  It can result in very significantly slower code for
a saving of a couple of bytes, and in some cases the code can be
significantly /bigger/ than with -O2.  I don't know whether this is a
backend issue, or a general problem with "-Os", but I no longer use or
recommend "-Os" even for tiny embedded systems.

Yes, I tried all of -O1, -O2 and -Os and probably a few more.

My experience with -Os on ARM Thumb2 is much more positive than yours. "-Os -mslow-flash-data" gives small and quite fast code most of the time.

/Mikael



