On Fri, Aug 11, 2023 at 08:52:55AM -0700, Sean Christopherson wrote:
> Use 'lea' instead of 'add' when adjusting %rsp in srso_safe_ret() so as to
> avoid clobbering flags. Drop one of the INT3 instructions to account for
> the LEA consuming one more byte than the ADD.
>
> KVM's emulator makes indirect calls into a jump table of sorts, where
> the destination of each call is a small blob of code that performs fast
> emulation by executing the target instruction with fixed operands.
>
> E.g. to emulate ADC, fastop() invokes adcb_al_dl():
>
>   adcb_al_dl:
>      0xffffffff8105f5f0 <+0>:     adc    %dl,%al
>      0xffffffff8105f5f2 <+2>:     jmp    0xffffffff81a39270 <__x86_return_thunk>
>
> A major motivation for doing fast emulation is to leverage the CPU to
> handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
> both an input and output to the target of the call. fastop() collects
> the RFLAGS result by pushing RFLAGS onto the stack and popping them back
> into a variable (held in RDI in this case)
>
>   asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
>
>      0xffffffff81062be7 <+71>:    mov    0xc0(%r8),%rdx
>      0xffffffff81062bee <+78>:    mov    0x100(%r8),%rcx
>      0xffffffff81062bf5 <+85>:    push   %rdi
>      0xffffffff81062bf6 <+86>:    popf
>      0xffffffff81062bf7 <+87>:    call   *%rsi
>      0xffffffff81062bf9 <+89>:    nop
>      0xffffffff81062bfa <+90>:    nop
>      0xffffffff81062bfb <+91>:    nop
>      0xffffffff81062bfc <+92>:    pushf
>      0xffffffff81062bfd <+93>:    pop    %rdi
>
> and then propagating the arithmetic flags into the vCPU's emulator state:
>
>   ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
>
>      0xffffffff81062be0 <+64>:    and    $0xfffffffffffff72a,%r9
>      0xffffffff81062bfe <+94>:    and    $0x8d5,%edi
>      0xffffffff81062c0d <+109>:   or     %rdi,%r9
>      0xffffffff81062c1a <+122>:   mov    %r9,0x10(%r8)
>
> The failures can be most easily reproduced by running the "emulator" test
> in KVM-Unit-Tests.
>
> If you're feeling a bit of deja vu, see commit b63f20a778c8
> ("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").
>
> Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
> Reported-by: Srikanth Aithal <sraithal@xxxxxxx>
> Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@xxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>

This resolves the issue I reported at [1].

Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>

[1]: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/

> ---
>
> Those that fail to learn from history are doomed to repeat it. :-D
>
>  arch/x86/lib/retpoline.S | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
> index 2cff585f22f2..132cedbf9e57 100644
> --- a/arch/x86/lib/retpoline.S
> +++ b/arch/x86/lib/retpoline.S
> @@ -164,7 +164,7 @@ __EXPORT_THUNK(srso_untrain_ret_alias)
>  /* Needs a definition for the __x86_return_thunk alternative below. */
>  SYM_START(srso_safe_ret_alias, SYM_L_GLOBAL, SYM_A_NONE)
>  #ifdef CONFIG_CPU_SRSO
> -	add $8, %_ASM_SP
> +	lea 8(%_ASM_SP), %_ASM_SP
>  	UNWIND_HINT_FUNC
>  #endif
>  	ANNOTATE_UNRET_SAFE
> @@ -239,7 +239,7 @@ __EXPORT_THUNK(zen_untrain_ret)
>   * SRSO untraining sequence for Zen1/2, similar to zen_untrain_ret()
>   * above. On kernel entry, srso_untrain_ret() is executed which is a
>   *
> - *	movabs $0xccccccc308c48348,%rax
> + *	movabs $0xccccc30824648d48,%rax
>   *
>   * and when the return thunk executes the inner label srso_safe_ret()
>   * later, it is a stack manipulation and a RET which is mispredicted and
> @@ -252,11 +252,10 @@ SYM_START(srso_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE)
>  	.byte 0x48, 0xb8
>
>  SYM_INNER_LABEL(srso_safe_ret, SYM_L_GLOBAL)
> -	add $8, %_ASM_SP
> +	lea 8(%_ASM_SP), %_ASM_SP
>  	ret
>  	int3
>  	int3
> -	int3
>  	lfence
>  	call srso_safe_ret
>  	int3
>
> base-commit: 25aa0bebba72b318e71fe205bfd1236550cc9534
> --
> 2.41.0.694.ge786442a9b-goog
>
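
In case it is useful to anyone else poking at this, below is a tiny
user-space sketch (illustrative only, nothing taken from the kernel tree;
file name and variable names are made up) of why the ADD -> LEA swap
matters: CF stands in for the emulated instruction's flag output, the
register adjustment stands in for the %rsp fixup in srso_safe_ret(), and
CF is read back afterwards, much like fastop() reads RFLAGS back with
pushf/pop.

  /* flags-demo.c: build with e.g. "gcc -O2 flags-demo.c" on x86-64. */
  #include <stdio.h>

  int main(void)
  {
          unsigned char cf_after_add, cf_after_lea;

          asm volatile("xor %%eax, %%eax\n\t"    /* make the ADD result predictable */
                       "stc\n\t"                 /* "emulated instruction" sets CF */
                       "add $8, %%rax\n\t"       /* ADD form the old srso_safe_ret() used: rewrites RFLAGS */
                       "setc %0"                 /* capture CF afterwards */
                       : "=q" (cf_after_add) :: "rax", "cc");

          asm volatile("xor %%eax, %%eax\n\t"
                       "stc\n\t"                 /* "emulated instruction" sets CF */
                       "lea 8(%%rax), %%rax\n\t" /* LEA form the fix uses: does not write RFLAGS */
                       "setc %0"
                       : "=q" (cf_after_lea) :: "rax", "cc");

          printf("CF after add: %u, CF after lea: %u\n", cf_after_add, cf_after_lea);
          return 0;
  }

The ADD variant reports CF as 0 and the LEA variant as 1, which is exactly
the difference the emulator sees through its pushf/pop round-trip.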