Re: setmemsi, movmemsi and post_inc

"Stefan Franke" <stefan@xxxxxxxxx> · Fri, 26 Mar 2021 15:33:32 +0100

> From: Jeff Law <jeffreyalaw@xxxxxxxxx>
> Sent: Friday, 26 March 2021 15:08
>
> On 3/26/2021 12:47 AM, Stefan Franke wrote:
> >> From: Gcc-help <gcc-help-bounces@xxxxxxxxxxx> On Behalf Of Jeff Law
> >> via On 3/25/2021 2:16 PM, Stefan Franke wrote:
> >>> At least it seems possible to use auto_inc inside an emitted loop,
> >>> since that yields a separate bb...
> >>
> >> I wouldn't rely on that.
> >>
> > What about using (parallel ) as envelope for the loop body?
> 
> If you mean a parallel where you've got the memory reference as well as an
> update of the base register as distinct sets, yes that will work as the RTL
> pattern is fully describing the operation. This is documented in the manual:
> 
> An instruction that can be represented with an embedded side effect could
> also be represented using @code{parallel} containing an additional
> @code{set} to describe how the address register is altered.  This is not done
> because machines that allow these operations at all typically allow them
> wherever a memory address is called for.  Describing them as additional
> parallel stores would require doubling the number of entries in the machine
> description.
> 
> > Or even create a pseudo insn which yields the loop body as an asm template?
> 
> I wouldn't recommend this.
> 

Thank you for your insights.

Since using a base register with constant offsets seems to yield the best optimizations, I added another conversion to auto-inc-dec that converts matching constant offsets into a post-increment ladder:
from: 
    9: [r31:SI+0x4]=[r32:SI+0x4]
   10: [r31:SI+0x8]=[r32:SI+0x8]
   11: [r31:SI+0xc]=[r32:SI+0xc]
intermediate:
   19: r34:SI=r32:SI+0x4
    9: [r31:SI+0x4]=[r34:SI]
   10: [r31:SI+0x8]=[r34:SI+0x4]
   11: [r31:SI+0xc]=[r34:SI+0x8]
      REG_DEAD r34:SI
converted:
   19: r34:SI=r32:SI+0x4
    9: [r31:SI+0x4]=[r34:SI]
   20: r34:SI=r34:SI+0x4
   10: [r31:SI+0x8]=[r34:SI]
   21: r34:SI=r34:SI+0x4
   11: [r31:SI+0xc]=[r34:SI]
      REG_DEAD r34:SI

The same conversion is applied to the destination too => nice post increments.
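
To illustrate the idea, here is a small Python sketch of the rewrite on one side of the copy. It is not the code in auto-inc-dec, only a model of the transformation shown in the dumps. The function names, the "use [reg]" pseudo-syntax and the default temporary r34 are made up for illustration, and it assumes that "matching" means each offset advances by exactly the access size.

```python
def post_inc_ladder(base, offsets, size=4, tmp="r34"):
    """Rewrite accesses [base+off] as accesses through tmp plus explicit adds.

    Returns a list of pseudo-insn strings, or None when the offsets do not
    advance by exactly `size` (the only case this sketch handles).
    """
    if not offsets:
        return []
    if any(b - a != size for a, b in zip(offsets, offsets[1:])):
        return None
    insns = [f"{tmp}={base}+{offsets[0]:#x}"]        # like insn 19
    for i in range(len(offsets)):
        if i > 0:
            insns.append(f"{tmp}={tmp}+{size:#x}")   # like insns 20 and 21
        insns.append(f"use [{tmp}]")                 # offset-free access
    return insns


def ladder_addresses(base_value, offsets, size=4):
    """Addresses the ladder touches, to compare against base_value + offset."""
    addrs = []
    ptr = base_value + offsets[0]
    for i in range(len(offsets)):
        if i > 0:
            ptr += size
        addrs.append(ptr)
    return addrs


if __name__ == "__main__":
    print("\n".join(post_inc_ladder("r32", [0x4, 0x8, 0xC])))
```

Each access followed by an add of the access size is the shape that can then be folded into a post increment; the last access needs no add because the temporary is dead afterwards.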

Stefan