> From: Jeff Law <jeffreyalaw@xxxxxxxxx>
> Sent: Friday, 26 March 2021 15:08
>
> On 3/26/2021 12:47 AM, Stefan Franke wrote:
> >> From: Gcc-help <gcc-help-bounces@xxxxxxxxxxx> On Behalf Of Jeff Law via
> >> On 3/25/2021 2:16 PM, Stefan Franke wrote:
> >>> At least it seems possible to use auto_inc inside an emitted loop,
> >>> since that yields a separate bb...
> >>
> >> I wouldn't rely on that.
> >>
> > What about using (parallel ) as envelope for the loop body?
>
> If you mean a parallel where you've got the memory reference as well as an
> update of the base register as distinct sets, yes that will work as the RTL
> pattern is fully describing the operation. This is documented in the manual:
>
> An instruction that can be represented with an embedded side effect could
> also be represented using @code{parallel} containing an additional
> @code{set} to describe how the address register is altered. This is not done
> because machines that allow these operations at all typically allow them
> wherever a memory address is called for. Describing them as additional
> parallel stores would require doubling the number of entries in the machine
> description.
>
> > Or even create a pseudo insn which yield the loop body as asm template?
>
> I wouldn't recommend this.
>

Thank you for your insights.

Since using a base register with constant offsets seems to yield the best
optimizations, I added another conversion to auto-inc-dec which converts
matching constant offsets into a post-inc ladder:

from:
    9: [r31:SI+0x4]=[r32:SI+0x4]
   10: [r31:SI+0x8]=[r32:SI+0x8]
   11: [r31:SI+0xc]=[r32:SI+0xc]

intermediate:
   19: r34:SI=r32:SI+0x4
    9: [r31:SI+0x4]=[r34:SI]
   10: [r31:SI+0x8]=[r34:SI+0x4]
   11: [r31:SI+0xc]=[r34:SI+0x8]
      REG_DEAD r34:SI

converted:
   19: r34:SI=r32:SI+0x4
    9: [r31:SI+0x4]=[r34:SI]
   20: r34:SI=r34:SI+0x4
   10: [r31:SI+0x8]=[r34:SI]
   21: r34:SI=r34:SI+0x4
   11: [r31:SI+0xc]=[r34:SI]
      REG_DEAD r34:SI

The same is applied to the destination too => nice post increments.

Stefan
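
For readers who want to trace the source-side conversion by hand, here is a toy model of it in Python. It is only a sketch of the idea and not the code added to auto-inc-dec: the function name `source_ladder`, the tuple layout, and the printed insn format (mode suffixes such as `:SI` are left out) are all assumptions made for illustration. It goes straight from the "from" form to the "converted" form, skipping the intermediate step.

```python
from typing import List, Tuple

# Hypothetical model of a memory-to-memory move:
# (insn id, dst base reg, dst offset, src base reg, src offset)
Move = Tuple[int, str, int, str, int]


def source_ladder(moves: List[Move], new_reg: str, first_new_id: int) -> List[str]:
    """Rewrite the source operands of `moves` as a ladder on `new_reg`.

    Every source address becomes a plain [new_reg]; the offset
    differences are emitted as separate add insns in between.
    """
    if not moves:
        return []
    src_base = moves[0][3]
    if any(m[3] != src_base for m in moves):
        raise ValueError("all moves must share one source base register")

    out = []
    next_id = first_new_id
    # Point the new register at the first source address.
    out.append(f"{next_id}: {new_reg}={src_base}+{moves[0][4]:#x}")
    next_id += 1
    current = moves[0][4]

    for insn_id, dst_base, dst_off, _, src_off in moves:
        step = src_off - current
        if step != 0:
            # Advance the new register to the next source address.
            out.append(f"{next_id}: {new_reg}={new_reg}+{step:#x}")
            next_id += 1
            current = src_off
        out.append(f"{insn_id}: [{dst_base}+{dst_off:#x}]=[{new_reg}]")
    return out


moves = [
    (9, "r31", 0x4, "r32", 0x4),
    (10, "r31", 0x8, "r32", 0x8),
    (11, "r31", 0xC, "r32", 0xC),
]
for line in source_ladder(moves, "r34", 19):
    print(line)
```

Each `[r34]` use followed by `r34=r34+0x4` is then the shape that a later step can fold into a post-increment address, provided the step matches the access size.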