Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> Zero vmcs.HOST_IA32_SYSENTER_ESP when initializing *constant* host state
> if and only if SYSENTER cannot be used, i.e. the kernel is a 64-bit
> kernel and is not emulating 32-bit syscalls.  As the name suggests,
> vmx_set_constant_host_state() is intended for state that is *constant*.
> When SYSENTER is used, SYSENTER_ESP isn't constant because stacks are
> per-CPU, and the VMCS must be updated whenever the vCPU is migrated to a
> new CPU.  The logic in vmx_vcpu_load_vmcs() doesn't differentiate between
> "never loaded" and "loaded on a different CPU", i.e. setting SYSENTER_ESP
> on VMCS load also handles setting correct host state when the VMCS is
> first loaded.
>
> Because a VMCS must be loaded before it is initialized during vCPU RESET,
> zeroing the field in vmx_set_constant_host_state() obliterates the value
> that was written when the VMCS was loaded.  If the vCPU is run before it
> is migrated, the subsequent VM-Exit will zero out MSR_IA32_SYSENTER_ESP,
> leading to a #DF on the next 32-bit syscall.
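
To make the ordering concrete, here is a toy C model of the sequence
described above. It is only a sketch: struct toy_vmcs, toy_load_vmcs(),
toy_set_constant_host_state(), toy_reset_on_cpu0() and the stack values
are all hypothetical names invented for illustration, not KVM code. The
"fixed" flag switches between the pre-patch and post-patch behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the one VMCS field of interest (not a real KVM type). */
struct toy_vmcs {
	unsigned long host_sysenter_esp;
	int loaded_cpu;			/* -1 means "never loaded" */
};

/* Made-up per-CPU entry stack addresses. */
static const unsigned long toy_cpu_stack[] = { 0x1000, 0x2000 };

/*
 * Models the load path: the per-CPU stack is written only when the VMCS
 * is loaded on a CPU other than the one it was last loaded on, which
 * also covers the "never loaded" case.
 */
static void toy_load_vmcs(struct toy_vmcs *v, int cpu, bool sysenter_used)
{
	assert(cpu >= 0 && cpu < 2);

	if (v->loaded_cpu != cpu) {
		if (sysenter_used)
			v->host_sysenter_esp = toy_cpu_stack[cpu];
		v->loaded_cpu = cpu;
	}
}

/*
 * Models constant host state setup.  Pre-patch (fixed == false) the field
 * is zeroed unconditionally; post-patch it is zeroed only when SYSENTER
 * is not used.
 */
static void toy_set_constant_host_state(struct toy_vmcs *v,
					bool sysenter_used, bool fixed)
{
	if (!fixed || !sysenter_used)
		v->host_sysenter_esp = 0;
}

/* vCPU RESET on CPU 0: load, init constant state, run without migrating. */
static unsigned long toy_reset_on_cpu0(bool sysenter_used, bool fixed)
{
	struct toy_vmcs v = { .host_sysenter_esp = 0, .loaded_cpu = -1 };

	toy_load_vmcs(&v, 0, sysenter_used);
	toy_set_constant_host_state(&v, sysenter_used, fixed);
	toy_load_vmcs(&v, 0, sysenter_used);	/* same CPU: no rewrite */

	return v.host_sysenter_esp;
}
```

In this model, toy_reset_on_cpu0(true, false) returns 0, i.e. the value
written at load time is lost, while toy_reset_on_cpu0(true, true)
returns the CPU 0 stack value (0x1000).
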
>
>   double fault: 0000 [#1] SMP
>   CPU: 0 PID: 990 Comm: stable Not tainted 5.16.0+ #97
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>   EIP: entry_SYSENTER_32+0x0/0xe7
>   Code: <9c> 50 eb 17 0f 20 d8 a9 00 10 00 00 74 0d 25 ff ef ff ff 0f 22 d8
>   EAX: 000000a2 EBX: a8d1300c ECX: a8d13014 EDX: 00000000
>   ESI: a8f87000 EDI: a8d13014 EBP: a8d12fc0 ESP: 00000000
>   DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210093
>   CR0: 80050033 CR2: fffffffc CR3: 02c3b000 CR4: 00152e90
>
> Fixes: 6ab8a4053f71 ("KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)")
> Cc: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/vmx.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a02a28ce7cc3..ce2aae12fcc5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4094,10 +4094,13 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
>  	vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
>  
>  	/*
> -	 * If 32-bit syscall is enabled, vmx_vcpu_load_vcms rewrites
> -	 * HOST_IA32_SYSENTER_ESP.
> +	 * SYSENTER is used only for (emulating) 32-bit kernels, zero out
> +	 * SYSENTER.ESP if it is NOT used.  When SYSENTER is used, the per-CPU
> +	 * stack is set when the VMCS is loaded (and may already be set!).

For an unprepared reader, I'd suggest adding something like "This pairs
with how HOST_IA32_SYSENTER_ESP is written in vmx_vcpu_load_vmcs()".

>  	 */
> -	vmcs_writel(HOST_IA32_SYSENTER_ESP, 0);
> +	if (!IS_ENABLED(CONFIG_IA32_EMULATION) && !IS_ENABLED(CONFIG_X86_32))

Isn't this the same as "!IS_ENABLED(CONFIG_COMPAT_32)"? (The same goes
for the check in vmx_vcpu_load_vmcs().)

> +		vmcs_writel(HOST_IA32_SYSENTER_ESP, 0);
> +
>  	rdmsrl(MSR_IA32_SYSENTER_EIP, tmpl);
>  	vmcs_writel(HOST_IA32_SYSENTER_EIP, tmpl);  /* 22.2.3 */
>

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

-- 
Vitaly