On Wed, Feb 08, 2023, Santosh Shukla wrote:
> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
> > On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> >> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >>  
> >>  	vcpu->arch.nmi_injected = events->nmi.injected;
> >>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> >> -		vcpu->arch.nmi_pending = events->nmi.pending;
> >> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
> >> +
> >>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> >>  
> >> +	process_nmi(vcpu);
> > 
> > Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
> > ABI that's ugly). E.g. if we collapse this down, it becomes:
> > 
> > 	process_nmi(vcpu);
> > 
> > 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> > 		<blah blah blah>
> > 	}
> > 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> > 
> > 	process_nmi(vcpu);
> > 
> > And the second mess is that V_NMI needs to be cleared.
> > 
> 
> Can you please elaborate on the "V_NMI cleared" scenario? Are you referring
> to V_NMI_MASK or svm->nmi_masked?

V_NMI_MASK. KVM needs to purge any pending virtual NMIs when userspace sets
vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.

> > The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
> > nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set. I think we can just
> > replace that with a set of nmi_queued, i.e.
> > 
> > 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> > 		vcpu->arch.nmi_pending = 0;
> > 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> > 		process_nmi(vcpu);
> > 
> You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
> I'll try with above proposal.

Yep, if that works. Actually, that might be a requirement. There's a

	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

lurking below this.
Invoking process_nmi() before NMI blocking is updated could result in KVM
incorrectly dropping/keeping NMIs. I don't think it would be a problem in
practice since KVM saves only one NMI, but userspace could stuff NMIs.

Huh. The existing code is buggy. events->nmi.pending is a u8, and
arch.nmi_pending is an unsigned int. KVM doesn't cap the incoming value, so
userspace could set up to 255 pending NMIs. The extra weird part is that the
extra NMIs will get dropped the next time KVM stumbles through process_nmi().

Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario
KVM may drop an NMI.

	events->nmi.pending = vcpu->arch.nmi_pending != 0;

The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
simultaneous NMIs"). The only thing I can figure is that KVM_GET_VCPU_EVENTS
was somewhat blindly updated without much thought about what should actually
happen.

So, can you slide the below in early in the series? Then in this series,
convert to the above suggested flow (zero nmi_pending, stuff nmi_queued) in
another patch?

From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Wed, 8 Feb 2023 07:44:16 -0800
Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are pending

Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
pending NMIs with respect to NMI blocking.

KVM allows multiple NMIs to be pending in order to faithfully emulate bare
metal handling of simultaneous NMIs (on bare metal, truly simultaneous NMIs
are impossible, i.e. one will always arrive first and be consumed). Support
for simultaneous NMIs botched the save/restore though. KVM only saves one
pending NMI, but allows userspace to restore 255 pending NMIs as
kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored in
an unsigned int.

Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/kvm/x86.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..e9339acbf82a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = vcpu->arch.nmi_pending;
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 						events->interrupt.shadow);
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
-	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
+	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
 		vcpu->arch.nmi_pending = events->nmi.pending;
+		if (vcpu->arch.nmi_pending)
+			kvm_make_request(KVM_REQ_NMI, vcpu);
+	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&

base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e
-- 
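A rough user-space model of the NMI bookkeeping may help make both the save/restore bug and the ordering concern concrete. This is a sketch under stated assumptions, not KVM code: struct vcpu_model and every model_* helper are hypothetical, plain bools stand in for the vendor NMI mask and for KVM_REQ_NMI, nothing is atomic, and the clamp in model_process_nmi() assumes the 2-or-1 limit that process_nmi() applied around the time of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the vCPU state touched by the patch.
 * None of these names exist in KVM; they only mirror the fields discussed.
 */
struct vcpu_model {
	unsigned int nmi_pending; /* stands in for vcpu->arch.nmi_pending */
	unsigned int nmi_queued;  /* stands in for vcpu->arch.nmi_queued (not atomic) */
	bool nmi_injected;        /* stands in for vcpu->arch.nmi_injected */
	bool nmi_masked;          /* stands in for kvm_x86_{get,set}_nmi_mask() */
	bool req_nmi;             /* stands in for a pending KVM_REQ_NMI */
};

/*
 * Assumed to mirror process_nmi() at the time of this thread: fold queued
 * NMIs into nmi_pending, then clamp to 2, or to 1 if NMIs are blocked or one
 * is already being injected.
 */
void model_process_nmi(struct vcpu_model *v)
{
	unsigned int limit = (v->nmi_masked || v->nmi_injected) ? 1 : 2;

	v->nmi_pending += v->nmi_queued;
	v->nmi_queued = 0;
	if (v->nmi_pending > limit)
		v->nmi_pending = limit;
}

/* Pre-patch KVM_GET_VCPU_EVENTS: any number of pending NMIs is saved as 1. */
uint8_t model_save_old(const struct vcpu_model *v)
{
	return v->nmi_pending != 0;
}

/* Patched KVM_GET_VCPU_EVENTS: the pending count is saved as-is. */
uint8_t model_save_new(const struct vcpu_model *v)
{
	return (uint8_t)v->nmi_pending;
}

/*
 * Patched KVM_SET_VCPU_EVENTS (NMI part only): store the count, request
 * re-evaluation if anything is pending, and only then update NMI blocking.
 */
void model_restore_new(struct vcpu_model *v, uint8_t pending, bool masked)
{
	v->nmi_pending = pending;
	if (v->nmi_pending)
		v->req_nmi = true;
	v->nmi_masked = masked;
}

/* Deferred request handling, e.g. before the next guest entry. */
void model_handle_requests(struct vcpu_model *v)
{
	if (v->req_nmi) {
		v->req_nmi = false;
		model_process_nmi(v);
	}
}
```

In this model, a source vCPU with two pending NMIs saves 1 under the old code and 2 under the patched code. A destination that restores 2 with NMIs unblocked keeps both once the request is handled, because the mask was already updated; calling model_process_nmi() while the stale mask still says "blocked" would trim the count to 1. A restore of 255 is accepted as-is and is only trimmed (to 2 here) when the request is processed.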