On Tue, 2020-02-04 at 10:44 +0100, Henri Cloetens wrote:
> Hello Richard, Jeff,
>
> I checked both.
> - The aarch64 backend uses the load double only for stacking operations.
>   This, I have. This functionality is provided by gcc via the load_multiple construct.
>   If you define it, gcc will use it for stack and unstack operations.
> - The ARM has a peephole2 optimizer. This has the problem that it is run after the
>   register allocation, and if the register allocation needs to change for the optimization
>   to be done, the pattern fails. I tried that, I got that working, but ... I don't like it.
> - I found another way, which would work in theory:
>   a. Add it to the "movsi"
>      1. Make a "define_expand" of the movsi, which does the following:
>         a. For the 'normal' case, it calls a define_insn "movsi_internal"
>         b. It maintains a per-function history of past calls to itself.
>         c. For every call to movsi, it looks in the history to see if it finds a 'partner'
>            with which it can create a "load double"
>         d. If it finds one, it starts going back in the insn-list, and checks
>            whether the replacement is appropriate. That mainly means no in-between
>            jumps and labels, no in-between modification of the address register,
>            not too far back.
>         e. If the checking is successful, it replaces the previous movsi with the load double.
>
> For now, I will park this, and do as in aarch64. I might try it later.

Trying to do this before register allocation isn't going to work the way
you want. But, well, good luck.

jeff
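
For readers following the thread, the backward scan in steps c and d of the quoted proposal can be sketched on a toy instruction list. This is only an illustration, not GCC code: the `Insn` record, `find_partner`, `MAX_DISTANCE` and `WORD` are hypothetical names invented here, and the model covers only the checks named in the mail (no jumps or labels in between, no write to the address register, not too far back). A real implementation would most likely need further checks, for example on intervening stores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    """Toy stand-in for an instruction; not a GCC rtx."""
    kind: str                    # "load", "jump", "label", or "other"
    dest: Optional[str] = None   # register written by the insn, if any
    base: Optional[str] = None   # address register (loads only)
    offset: int = 0              # byte offset from base (loads only)


MAX_DISTANCE = 8  # hypothetical limit for "not too far back"
WORD = 4          # assumed word size in bytes for an SImode load


def find_partner(insns: List[Insn], idx: int) -> Optional[int]:
    """Return the index of an earlier load that could pair with insns[idx]
    into a double-word load, or None if the backward scan is blocked."""
    cur = insns[idx]
    if cur.kind != "load":
        return None
    stop = max(idx - 1 - MAX_DISTANCE, -1)
    for back in range(idx - 1, stop, -1):
        prev = insns[back]
        # A jump or label means the two loads may not execute together.
        if prev.kind in ("jump", "label"):
            return None
        # The address register was modified between the two loads.
        if prev.dest is not None and prev.dest == cur.base:
            return None
        # Same base register and adjacent words: a candidate partner.
        if (prev.kind == "load" and prev.base == cur.base
                and abs(prev.offset - cur.offset) == WORD
                and prev.dest != cur.dest):
            return back
    return None
```

With two loads from `r4` at offsets 0 and 4 separated by an unrelated instruction, `find_partner` reports the first load as the partner; putting a label or a write to `r4` between them makes it return `None`.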