RE: [tip: x86/core] x86/retpoline: Simplify retpolines

David Laight <David.Laight@xxxxxxxxxx> · Tue, 6 Apr 2021 08:56:50 +0000

From: tip-bot2@xxxxxxxxxxxxx
> Sent: 03 April 2021 12:11
...
> Notice that since the longest alternative sequence is now:
> 
>    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
>    5:   f3 90                   pause
>    7:   0f ae e8                lfence
>    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
>    c:   48 89 04 24             mov    %rax,(%rsp)
>   10:   c3                      retq
> 
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
> we can shrink the retpoline by 1 byte we can pack it more densely).

Every time I see this I can't help feeling that doing something
(aka anything) to get the 'mov' and 'retq' into the same 16 byte
code fetch/decode block but be advantageous.

Even something like:
	call	1f
	pause
	jmp 	2f
1:	mov	%rax,(%rsp)
	retq
2:	pause
	lfence
	jmp	2b
Might meet all the requirements for the retpoline while
allowing the 'mov' and 'retq' be decoded in the same clock.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)