On Thu, Aug 24, 2023, Like Xu wrote:
> On 13/12/2022 2:09 pm, Sean Christopherson wrote:
> > Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that
> > the NMI is handled prior to leaving the safety of noinstr. Handling the
> > NMI after leaving noinstr exposes the kernel to potential ordering
> > problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc.
> > will unblock NMIs when IRETing back to the faulting instruction.
>
> (3) In addition, trace_kvm_exit() should ideally appear before the host NMI
> trace logs, which makes it easier to understand.

Ideally, yes, but tracepoints are not remotely noinstr friendly.

> A proposed fix is to delay vmx_do_nmi_irqoff() a little, rather than
> reverting the move:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e6849f780dba..1f29b7f22da7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7230,13 +7230,6 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	else
>  		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> 
> -	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> -	    is_nmi(vmx_get_intr_info(vcpu))) {
> -		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -		vmx_do_nmi_irqoff();
> -		kvm_after_interrupt(vcpu);
> -	}
> -
>  	guest_state_exit_irqoff();
>  }
> 
> @@ -7389,6 +7382,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> 
>  	trace_kvm_exit(vcpu, KVM_ISA_VMX);
> 
> +	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> +	    is_nmi(vmx_get_intr_info(vcpu))) {
> +		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +		vmx_do_nmi_irqoff();
> +		kvm_after_interrupt(vcpu);
> +	}

No, the whole point of doing NMI handling in vmx_vcpu_enter_exit() is so that
NMIs are serviced before instrumentation is enabled.

I think the below is sufficient (untested at this point). Not quite minimal,
e.g. I'm pretty sure there's (currently) no need to snapshot
IDT_VECTORING_INFO_FIELD so early, but I can't think of any reason to wait.
(A rough sketch of the lazy-load pattern involved is appended after the
patch.)

--
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Thu, 24 Aug 2023 06:49:36 -0700
Subject: [PATCH] KVM: VMX: Refresh available regs and IDT vectoring info
 before NMI handling

Reset the mask of available "registers" and refresh the IDT vectoring info
snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles an NMI
VM-Exit.

One of the "registers" that KVM VMX lazily loads is the
vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception or
NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available
registers bitmask after handling NMIs results in KVM querying info from the
last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both missed
NMIs and spurious NMIs from the guest's perspective.

Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit
path too, e.g. to guard against similar consumption of stale data. The
field is read on every "normal" VM-Exit, and there's no point in delaying
the inevitable.
Reported-by: Like Xu <like.xu.linux@xxxxxxxxx>
Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/kvm/vmx/vmx.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..d2b78ab7a9f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7222,13 +7222,20 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 				   flags);
 
 	vcpu->arch.cr2 = native_read_cr2();
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+
+	vmx->idt_vectoring_info = 0;
 
 	vmx_enable_fb_clear(vmx);
 
-	if (unlikely(vmx->fail))
+	if (unlikely(vmx->fail)) {
 		vmx->exit_reason.full = 0xdead;
-	else
-		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+		goto out;
+	}
+
+	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+	if (likely(!vmx->exit_reason.failed_vmentry))
+		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
@@ -7237,6 +7244,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 		kvm_after_interrupt(vcpu);
 	}
 
+out:
 	guest_state_exit_irqoff();
 }
 
@@ -7358,8 +7366,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	loadsegment(es, __USER_DS);
 #endif
 
-	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
-
 	pt_guest_exit(vmx);
 
 	kvm_load_host_xsave_state(vcpu);
@@ -7376,17 +7382,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		vmx->nested.nested_run_pending = 0;
 	}
 
-	vmx->idt_vectoring_info = 0;
-
 	if (unlikely(vmx->fail))
 		return EXIT_FASTPATH_NONE;
 
 	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
-	if (likely(!vmx->exit_reason.failed_vmentry))
-		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
-
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
 	if (unlikely(vmx->exit_reason.failed_vmentry))

base-commit: fff2e47e6c3b8050ca26656693caa857e3a8b740
--
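
For completeness, the lazy-load pattern that makes the early regs_avail reset
necessary looks roughly like the helper below. This is a from-memory sketch of
vmx_get_intr_info() (arch/x86/kvm/vmx/vmx.h), not a verbatim copy of the
upstream code, so treat the exact names (exit_intr_info,
VCPU_EXREG_EXIT_INFO_2) as approximations:

static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/*
	 * Read VM_EXIT_INTR_INFO from the VMCS at most once per VM-Exit;
	 * subsequent calls return the cached value until the "available"
	 * bit for the register is cleared again.
	 */
	if (!kvm_register_is_available(vcpu, VCPU_EXREG_EXIT_INFO_2)) {
		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
		vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
	}

	return vmx->exit_intr_info;
}

I.e. unless regs_avail is cleared before the is_nmi() check in
vmx_vcpu_enter_exit(), the helper hands back whatever intr_info was cached by
the previous VM-Exit, which is how the missed/spurious NMIs come about.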