Add CC: Andy Lutomirski Add CC: Steven Rostedt I think this patch made it wrong for NMI. On Wed, Sep 16, 2020 at 3:27 AM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > Rework NMI VM-Exit handling to invoke the kernel handler by function > call instead of INTn. INTn microcode is relatively expensive, and > aligning the IRQ and NMI handling will make it easier to update KVM > should some newfangled method for invoking the handlers come along. > > Suggested-by: Andi Kleen <ak@xxxxxxxxxxxxxxx> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > --- > arch/x86/kvm/vmx/vmx.c | 30 +++++++++++++++--------------- > 1 file changed, 15 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 391f079d9136..b0eca151931d 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -6411,40 +6411,40 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) > > void vmx_do_interrupt_nmi_irqoff(unsigned long entry); > > +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) > +{ > + unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK; > + gate_desc *desc = (gate_desc *)host_idt_base + vector; > + > + kvm_before_interrupt(vcpu); > + vmx_do_interrupt_nmi_irqoff(gate_offset(desc)); > + kvm_after_interrupt(vcpu); > +} > + > static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) > { > u32 intr_info = vmx_get_intr_info(&vmx->vcpu); > > /* if exit due to PF check for async PF */ > - if (is_page_fault(intr_info)) { > + if (is_page_fault(intr_info)) > vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); > /* Handle machine checks before interrupts are enabled */ > - } else if (is_machine_check(intr_info)) { > + else if (is_machine_check(intr_info)) > kvm_machine_check(); > /* We need to handle NMIs before interrupts are enabled */ > - } else if (is_nmi(intr_info)) { > - kvm_before_interrupt(&vmx->vcpu); > - asm("int $2"); > - kvm_after_interrupt(&vmx->vcpu); > - } > + else if (is_nmi(intr_info)) > + handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info); > } When handle_interrupt_nmi_irqoff() is called, we may lose the CPU-hidden-NMI-masked state due to IRET of #DB, #BP or other traps between VMEXIT and handle_interrupt_nmi_irqoff(). But the NMI handler in the Linux kernel *expects* the CPU-hidden-NMI-masked state is still set in the CPU for no nested NMI intruding into the beginning of the handler. The original code "int $2" can provide the needed CPU-hidden-NMI-masked when entering #NMI, but I doubt it about this change. I maybe missed something, especially I haven't read all of the earlier discussions about the change. More importantly, I haven't found the original suggestion from Andi Kleen: (Quote from the cover letter): The NMI consolidation was loosely suggested by Andi Kleen. Andi's actual suggestion was to export and directly call the NMI handler, but that's a more involved change (unless I'm misunderstanding the wants of the NMI handler), whereas piggybacking the IRQ code is simple and seems like a worthwhile intermediate step. (End of quote) I think we need to change it back or change it to call the NMI handler immediately after VMEXIT before leaving "nostr" section if needed. Thanks, Lai