Don't use RDPID in the paranoid entry flow if KVM is enabled as doing so
can consume a KVM guest's MSR_TSC_AUX value if an NMI arrives in KVM's
run loop.

As a performance optimization, KVM loads the guest's TSC_AUX when a CPU
first enters its run loop, and on AMD's SVM doesn't restore the host's
value until the CPU exits the run loop.  VMX is even more aggressive and
defers restoring the host's value until the CPU returns to userspace.
This optimization obviously relies on the kernel not consuming TSC_AUX,
which falls apart if an NMI arrives in the run loop.

Removing KVM's optimization would be painful, as both SVM and VMX would
need to context switch the MSR on every VM-Enter (2x WRMSR + 1x RDMSR),
whereas using LSL instead of RDPID is a minor blip.

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Chang Seok Bae <chang.seok.bae@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: kvm@xxxxxxxxxxxxxxx
Reported-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
Debugged-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
Suggested-by: Andy Lutomirski <luto@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
Andy, I know you said "unconditionally", but it felt weird adding a
comment way down in GET_PERCPU_BASE without plumbing a param in to help
provide context.  But, paranoid_entry is the only user, so adding a
param that is unconditional also felt weird.  That being said, I
definitely don't have a strong opinion one way or the other.
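For reviewers, the two GET_PERCPU_BASE paths boil down to something like
the sketch below.  This is illustrative only, not the literal expanded
macro output; the LSL path mirrors LOAD_CPU_AND_NODE_SEG_LIMIT's use of
the __CPUNODE_SEG selector, and the RDPID path is what the fix disallows
when KVM is enabled:

```asm
/* Sketch only -- not the literal macro expansion. */

/* RDPID path: returns the value of IA32_TSC_AUX, which may still hold
 * the *guest's* value if this NMI landed inside KVM's run loop. */
	rdpid	%rax

/* LSL path (no_rdpid=1): reads the CPU/node number from the segment
 * limit of the per-CPU GDT entry; never touches TSC_AUX. */
	movq	$__CPUNODE_SEG, %rax
	lsl	%rax, %rax

/* Either way, mask off the CPU number and index the per-CPU offsets. */
	andq	$VDSO_CPUNODE_MASK, %rax
	movq	__per_cpu_offset(, %rax, 8), %rax
```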
 arch/x86/entry/calling.h  | 10 +++++++---
 arch/x86/entry/entry_64.S |  7 ++++++-
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d8886f11c..a925c0cf89c1a 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -342,9 +342,9 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm

-.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req no_rdpid=0
 	rdgsbase \save_reg
-	GET_PERCPU_BASE \scratch_reg
+	GET_PERCPU_BASE \scratch_reg \no_rdpid
 	wrgsbase \scratch_reg
 .endm

@@ -375,11 +375,15 @@ For 32-bit we have the following conventions - kernel is built with
  * We normally use %gs for accessing per-CPU data, but we are setting up
  * %gs here and obviously can not use %gs itself to access per-CPU data.
  */
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req no_rdpid=0
+	.if \no_rdpid
+	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
+	.else
 	ALTERNATIVE \
 		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
 		"RDPID	\reg", \
 		X86_FEATURE_RDPID
+	.endif
 	andq	$VDSO_CPUNODE_MASK, \reg
 	movq	__per_cpu_offset(, \reg, 8), \reg
 .endm
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70dea93378162..fd915c46297c5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -842,8 +842,13 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 *
 	 * The MSR write ensures that no subsequent load is based on a
 	 * mispredicted GSBASE. No extra FENCE required.
+	 *
+	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
+	 * if an NMI arrives in KVM's run loop.  KVM loads the guest's TSC_AUX
+	 * on VM-Enter and may not restore the host's value until the CPU
+	 * returns to userspace, i.e. KVM depends on the kernel not using
+	 * TSC_AUX.
 	 */
-	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx no_rdpid=IS_ENABLED(CONFIG_KVM)
 	ret

 .Lparanoid_entry_checkgs:
-- 
2.28.0