On Mon, 2020-02-03 at 15:50 +0100, Henri Cloetens wrote:
> Hello all,
>
> I have a question on the peephole2 optimizer.
>
> - My target has a "load double" instruction:
> - It does an indexed load of a 64-bit operand to two 32-bit registers.
> - The requirement is that the registers are adjacent (Ri and Ri+1),
>   and that the offset for the second load is 4 bytes more than for
>   the first load.
>
> - I cannot find a way to describe this in gcc. I tried
>   "load_multiple", and this is OK, but gcc only calls that for stack
>   pushing. I tried the vector facility, but this does not work either.
>
> - I tried to write a peephole2 optimizer, and this works out OK, it
>   manages to recognize the sequence, ... but the peephole2 optimizer is
>   run AFTER register allocation, and the optimization needs to be done
>   BEFORE, as there are constraints on the 2 registers, Ri and Ri+1.
>
> Any suggestions? Is there any way to run peephole2 BEFORE register
> allocation?

I suggest looking at ldp/stp support in the aarch64 backend.

jeff
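
For illustration only, the pairing constraint described in the question (same base, second offset 4 bytes higher, destination registers Ri and Ri+1) can be written as a small predicate of the kind a peephole2 condition might call. This is a sketch in plain C, not GCC's internal API: the struct `load32` and the function `can_pair_loads` are hypothetical names, and the check that the first load does not overwrite the base register is an added assumption about what a safe fusion would require.

```c
#include <stdbool.h>

/* Hypothetical model of one 32-bit load: dest_reg <- mem[base_reg + offset].
   Register fields hold hard register numbers, i.e. the values that are
   only known after register allocation. */
struct load32 {
    unsigned dest_reg;
    unsigned base_reg;
    long offset;
};

/* Return true when two consecutive 32-bit loads satisfy the constraints
   stated in the question for a single "load double". */
static bool can_pair_loads(const struct load32 *first,
                           const struct load32 *second)
{
    /* Both loads must index off the same base register. */
    if (first->base_reg != second->base_reg)
        return false;

    /* The second offset must be exactly 4 bytes past the first. */
    if (second->offset != first->offset + 4)
        return false;

    /* The destinations must be adjacent: Ri and Ri+1. */
    if (second->dest_reg != first->dest_reg + 1)
        return false;

    /* Assumption: if the first load overwrote the base register, the
       second load would use a different address, so reject that case. */
    if (first->dest_reg == first->base_reg)
        return false;

    return true;
}
```

A predicate like this depends on hard register numbers, which is why it only makes sense after register allocation; getting the allocator to choose Ri and Ri+1 in the first place is a separate problem that this sketch does not address.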