On 4/11/19 2:43 PM, William Tambe wrote:
> Wouldn't the blockage insn prevent the compiler from re-arranging
> other following instructions?
>
> In fact, the two emit_insn() need to be seen as one instruction with
> the compiler free to re-arrange other instructions around it.
>
> This is needed to implement "mulsidi3" where the hardware uses a second
> instruction to return the high part of a multiplication result, and
> that second instruction needs to be issued immediately after the first
> instruction, which returns the low part of the multiplication. Currently
> this is what is used:
>
> (define_expand "mulsidi3"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (mult:DI
>           (sign_extend:DI (match_operand:SI 1 "register_operand" "0"))
>           (sign_extend:DI (match_operand:SI 2 "register_operand" "r"))))]
>   ""
> {
>   rtx lo = gen_reg_rtx (SImode);
>   rtx hi = gen_reg_rtx (SImode);
>   emit_insn (gen_mulsi3 (lo, operands[1], operands[2]));
>   emit_insn (gen_mulsi3_highpart (hi, operands[1], operands[2]));
>   emit_move_insn (gen_lowpart (SImode, operands[0]), lo);
>   emit_move_insn (gen_highpart (SImode, operands[0]), hi);
>   DONE;
> })
>
> What is needed is for the two emit_insn() to be seen as one
> instruction with the compiler free to re-arrange other instructions
> around it.
>
> Is there a better way than the blockage insn to achieve the above?

In general, if you find yourself needing to force two insns to be
consecutive like you're doing, then you're probably doing something
wrong. There are exceptions like instruction fusion for scheduling
purposes, but that's an optimization, not a correctness issue.

What I see above looks like a pretty standard widening multiply. Look
at how other ports handle this stuff. If you actually need two machine
instructions here, then I'd have a define_insn which emits both in its
output template. Just describe it fully in the RTL.
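A minimal sketch of the single define_insn idea, assuming a hypothetical ISA whose low-part multiply is spelled `mul` and whose high-part multiply is spelled `mulh`, both taking three register operands, and assuming the DImode result is allocated to a consecutive hard register pair. Because both instructions come from one RTL pattern, nothing can be scheduled between them, while unrelated insns remain free to move around the pattern as a whole. The earlyclobber (`=&r`) is needed because the first instruction writes half of operand 0 before the second one reads operands 1 and 2.

```lisp
;; Hypothetical widening multiply: one RTL pattern, two machine
;; instructions.  "mul" and "mulh" are placeholder mnemonics.
(define_insn "mulsidi3"
  [(set (match_operand:DI 0 "register_operand" "=&r")
        (mult:DI
          (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
          (sign_extend:DI (match_operand:SI 2 "register_operand" "r"))))]
  ""
{
  rtx xops[3];
  /* Low word of the DImode register pair; which register of the pair
     holds it depends on WORDS_BIG_ENDIAN.  */
  xops[0] = gen_rtx_REG (SImode,
                         REGNO (operands[0]) + (WORDS_BIG_ENDIAN ? 1 : 0));
  xops[1] = operands[1];
  xops[2] = operands[2];
  output_asm_insn ("mul\t%0, %1, %2", xops);
  /* High word goes in the other register of the pair.  */
  xops[0] = gen_rtx_REG (SImode,
                         REGNO (operands[0]) + (WORDS_BIG_ENDIAN ? 0 : 1));
  output_asm_insn ("mulh\t%0, %1, %2", xops);
  return "";
})
```

If the port defines a `length` attribute, a pattern like this should also set it to cover both instructions so that branch shortening stays accurate.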
While it's *generally* best to have define_insns emit a single
instruction, there are exceptions in almost every port.

What's definitely not clear to me with your expander above is why you
have to create two scratch registers and then move those into the final
destination. That seems odd. If possible, generate your outputs directly
into the upper and lower halves of operands[0]. This is pretty easy if
you're on a 32-bit target, since your DImode output will be an aligned
hard register pair once register allocation is complete.

Jeff
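A sketch of generating the outputs directly into the halves of operands[0], reusing the `gen_mulsi3` and `gen_mulsi3_highpart` generators from the quoted expander and assuming both accept SImode subregs as destinations. The copies into fresh pseudos guard against an input that overlaps operands[0], since writing the low word would otherwise clobber it before the high-part multiply reads it. The quoted `"0"` constraint is dropped because constraints in a define_expand are ignored. Note that this expander by itself does not keep the two multiplies adjacent; if the hardware truly requires that, the define_insn approach suggested earlier in this reply is the one that guarantees it.

```lisp
;; Sketch: write the two halves of operands[0] directly instead of
;; going through scratch registers and extra moves.
(define_expand "mulsidi3"
  [(set (match_operand:DI 0 "register_operand" "")
        (mult:DI
          (sign_extend:DI (match_operand:SI 1 "register_operand" ""))
          (sign_extend:DI (match_operand:SI 2 "register_operand" ""))))]
  ""
{
  rtx op1 = operands[1];
  rtx op2 = operands[2];

  /* If an input is (part of) the destination, copy it first so that
     writing the low word cannot clobber it before the high-part
     multiply reads it.  */
  if (reg_overlap_mentioned_p (operands[0], op1))
    op1 = copy_to_mode_reg (SImode, op1);
  if (reg_overlap_mentioned_p (operands[0], op2))
    op2 = copy_to_mode_reg (SImode, op2);

  emit_insn (gen_mulsi3 (gen_lowpart (SImode, operands[0]), op1, op2));
  emit_insn (gen_mulsi3_highpart (gen_highpart (SImode, operands[0]),
                                  op1, op2));
  DONE;
})
```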