On 30/10/2019 01:34, Segher Boessenkool wrote:
> On Tue, Oct 29, 2019 at 02:19:25PM -0600, Jeff Law wrote:
>> On 10/29/19 2:03 PM, Mikael Tillenius wrote:
>>> I am using a cross compiler for Renesas H8S. In a few places it
>>> generates really bad code. Given the following program:
>>>
>>> struct s {
>>>     char a, b;
>>>     char c[11];
>>> } x[2];
>>>
>>> void test(int n)
>>> {
>>>     struct s *sp = &x[n];
>>>
>>>     sp->a = 1;
>>>     sp->b = 1;
>>> }
>
>> As we leave gimple the code looks like:
>>
>>   MEM <struct s[2]> [(struct s *)&x][n_1(D)].a = 1;
>>   MEM <struct s[2]> [(struct s *)&x][n_1(D)].b = 1;
>>
>> One might argue that DOM or FRE should have created a common
>> subexpression for the address arithmetic here. Even so it's not bad.
>>
>> CSE doesn't do its job though. There's clearly a REG_EQUAL note which
>> should have allowed it to at least cleanup the redundant multiplication
>> for the address calculation.
>
> And on other targets it does do its job fine, say riscv32, or m68k -O1
> (the -O1 to prevent the two stores from being optimised into one).
>
> I haven't managed to find another target where multiplication by 13 is
> done with a libcall though. Maybe I should look harder.
>

I checked on the 8-bit AVR, which is (I think) the smallest and simplest
device targeted by gcc, and where only some devices have multiplication
instructions.
If you optimise for size (-Os), it uses a library call __mulhi3 for the
multiplication:

test:
        ldi r22,lo8(13)
        ldi r23,0
        rcall __mulhi3
        subi r24,lo8(-(x))
        sbci r25,hi8(-(x))
        ldi r18,lo8(1)
        mov r26,r24
        mov r27,r25
        st X,r18
        mov r30,r26
        mov r31,r27
        std Z+1,r18
        ret

With -O1 or -O2, it uses shifts and adds for the multiply:

test:
        mov r30,r24
        mov r31,r25
        lsl r30
        rol r31
        add r30,r24
        adc r31,r25
        lsl r30
        rol r31
        lsl r30
        rol r31
        add r24,r30
        adc r25,r31
        mov r30,r24
        mov r31,r25
        subi r30,lo8(-(x))
        sbci r31,hi8(-(x))
        ldi r24,lo8(1)
        st Z,r24
        std Z+1,r24
        ret

(I used an old version of avr-gcc here, version 5.4.0, just because it
is on the extremely useful <https://godbolt.org> online compiler site.
Things may have changed for later versions, but usually the AVR port is
fairly stable.)

One thing to note here is that with -O1, the compiler calculates "sp" in
the "Z" register, then stores 1 into [sp] and [sp+1]. With the __mulhi3
library call (-Os), sp is calculated in a non-pointer register pair and
the compiler generates sub-optimal code for storing in [sp] and [sp+1]:
it copies the address into X for the first store and then into Z for the
second. If that is still the case for modern gcc, it could be filed as a
missed optimisation bug for the AVR backend.

But this also brings up another idea. Is the OP using "-Os"
optimisation? My experience (especially with AVR, msp430, and ARM
Cortex-M targets) is that "-Os" optimisation is often quite poor
compared to "-O2". It can result in very significantly slower code for a
saving of a couple of bytes, and in some cases the code can be
significantly /bigger/ than with -O2. I don't know whether this is a
backend issue, or a general problem with "-Os", but I no longer use or
recommend "-Os" even for tiny embedded systems.

So perhaps simply changing from "-Os" to "-O2" will fix the OP's
problems.
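
P.S. For anyone wondering where the 13 comes from: sizeof(struct s) is
13 (two chars plus an 11-char array, no padding needed), so &x[n] needs
n * 13. The shift-and-add sequence in the -O1/-O2 listing can be
sketched in C roughly as below. This is only my reading of the assembly,
not anything gcc emits as source; the names mul13_shift_add and
check_mul13 are made up for illustration, and I use uint16_t to mimic
the 16-bit register pairs.

```c
#include <assert.h>
#include <stdint.h>

/* n * 13 computed the way the -O1/-O2 AVR listing does it:
   t = 2n, t = 3n, t = 6n, t = 12n, result = n + 12n.
   All arithmetic wraps modulo 2^16, like the 16-bit register pair. */
uint16_t mul13_shift_add(uint16_t n)
{
    uint16_t t = n;              /* mov r30,r24 / mov r31,r25 */
    t = (uint16_t)(t << 1);      /* lsl r30 / rol r31   ->  2n */
    t = (uint16_t)(t + n);       /* add / adc           ->  3n */
    t = (uint16_t)(t << 1);      /* lsl / rol           ->  6n */
    t = (uint16_t)(t << 1);      /* lsl / rol           -> 12n */
    return (uint16_t)(n + t);    /* add r24,r30 / adc   -> 13n */
}

/* Hypothetical self-check: compare against a plain multiply
   (truncated to 16 bits) for every possible 16-bit input. */
void check_mul13(void)
{
    for (uint32_t n = 0; n <= 0xFFFFu; ++n)
        assert(mul13_shift_add((uint16_t)n) == (uint16_t)(n * 13u));
}
```

That is five 16-bit operations plus the final add of the address of x,
which is why the compiler prefers it to the __mulhi3 call when it is not
optimising for size.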