From: tip-bot2@xxxxxxxxxxxxx
> Sent: 03 April 2021 12:11
...
> Notice that since the longest alternative sequence is now:
>
>    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
>    5:   f3 90                   pause
>    7:   0f ae e8                lfence
>    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
>    c:   48 89 04 24             mov    %rax,(%rsp)
>   10:   c3                      retq
>
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
> we can shrink the retpoline by 1 byte we can pack it more densely).

Every time I see this I can't help feeling that doing something
(aka anything) to get the 'mov' and 'retq' into the same 16 byte
code fetch/decode block would be advantageous.

Even something like:

        call    1f
        pause
        jmp     2f
1:      mov     %rax,(%rsp)
        retq
2:      pause
        lfence
        jmp     2b

might meet all the requirements for the retpoline while allowing
the 'mov' and 'retq' to be decoded in the same clock.

        David
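The byte arithmetic behind both layouts can be checked with a short sketch. The instruction lengths for the first sequence come from the quoted disassembly. The lengths for the reordered sequence are an assumption: it is taken to use the same encodings (5-byte call, 2-byte short jmp). The 16-byte fetch window and a slot starting on a 32-byte boundary are likewise assumptions, and the names `layout` and `same_block` are only illustrative.

```python
# Instruction lengths in bytes. ORIGINAL is read off the quoted disassembly.
# PROPOSED assumes the same encodings (5-byte call, 2-byte short jmp).
ORIGINAL = [("call", 5), ("pause", 2), ("lfence", 3),
            ("jmp", 2), ("mov", 4), ("retq", 1)]
PROPOSED = [("call", 5), ("pause", 2), ("jmp", 2), ("mov", 4), ("retq", 1),
            ("pause", 2), ("lfence", 3), ("jmp", 2)]

FETCH_BLOCK = 16  # assumed fetch/decode window size in bytes
SLOT = 32         # slot size from the quoted text


def layout(insns):
    """Return ([(name, first_byte, last_byte), ...], total_size)."""
    placed = []
    offset = 0
    for name, size in insns:
        placed.append((name, offset, offset + size - 1))
        offset += size
    return placed, offset


def blocks_of(insns, name):
    """Set of fetch-block indexes touched by the first instruction called name."""
    placed, _ = layout(insns)
    for insn, first, last in placed:
        if insn == name:
            return {first // FETCH_BLOCK, last // FETCH_BLOCK}
    raise KeyError(name)


def same_block(insns, a, b):
    """True if a and b each sit entirely inside one and the same fetch block."""
    blocks_a = blocks_of(insns, a)
    blocks_b = blocks_of(insns, b)
    return len(blocks_a) == 1 and blocks_a == blocks_b


if __name__ == "__main__":
    for label, insns in (("original", ORIGINAL), ("proposed", PROPOSED)):
        _, size = layout(insns)
        print(label, "size:", size, "padding:", SLOT - size,
              "mov/retq share a block:", same_block(insns, "mov", "retq"))
```

Under those assumptions the original sequence is 17 bytes with 15 bytes of padding, and the 'mov' ends at offset 0xf while the 'retq' sits at 0x10, so they straddle a 16-byte boundary. The reordered sequence comes to 21 bytes, which still fits the 32 byte slot, and places both instructions inside the first 16 bytes.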