On 2/8/2023 9:36 PM, Sean Christopherson wrote:
> On Wed, Feb 08, 2023, Santosh Shukla wrote:
>> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
>>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>>>
>>>>  	vcpu->arch.nmi_injected = events->nmi.injected;
>>>>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
>>>> -		vcpu->arch.nmi_pending = events->nmi.pending;
>>>> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
>>>> +
>>>>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>>
>>>> +	process_nmi(vcpu);
>>>
>>> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
>>> ABI that's ugly).  E.g. if we collapse this down, it becomes:
>>>
>>>	process_nmi(vcpu);
>>>
>>>	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>>		<blah blah blah>
>>>	}
>>>	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>
>>>	process_nmi(vcpu);
>>>
>>> And the second mess is that V_NMI needs to be cleared.
>>>
>>
>> Can you please elaborate on the "V_NMI cleared" scenario? Are you referring
>> to V_NMI_MASK or svm->nmi_masked?
>
> V_NMI_MASK.  KVM needs to purge any pending virtual NMIs when userspace sets
> vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.
>

As per the APM, V_NMI_MASK is managed by the processor:

"V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set
V_NMI_MASK once it takes the virtual NMI. V_NMI_MASK is cleared when the guest
successfully completes an IRET instruction or #VMEXIT occurs while delivering
the virtual NMI."

In my initial implementation I changed V_NMI_MASK for the SMM scenario [1].
That is also not required, as HW will save V_NMI/V_NMI_MASK on SMM entry and
restore them on RSM.
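To make the APM wording above concrete, here is a small userspace model of the V_NMI_MASK lifecycle: the processor sets the mask when it takes the virtual NMI, and clears it on a completed IRET or on a #VMEXIT during delivery. All names here are illustrative stand-ins, not KVM or APM symbols:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the V_NMI_MASK lifecycle described in the APM.
 * These names are illustrative only, not KVM or hardware definitions. */
struct vnmi_model {
	bool v_nmi_pending;	/* V_NMI: a virtual NMI is pending */
	bool v_nmi_mask;	/* V_NMI_MASK: virtual NMIs are masked */
};

/* Processor takes the pending virtual NMI: V_NMI_MASK becomes set. */
static bool vnmi_take(struct vnmi_model *m)
{
	if (!m->v_nmi_pending || m->v_nmi_mask)
		return false;	/* nothing to take, or NMIs are masked */
	m->v_nmi_pending = false;
	m->v_nmi_mask = true;
	return true;
}

/* Guest completes an IRET, or a #VMEXIT occurs while delivering the
 * virtual NMI: the processor clears V_NMI_MASK. */
static void vnmi_iret_or_vmexit(struct vnmi_model *m)
{
	m->v_nmi_mask = false;
}
```

The point of the model is that software never clears the mask directly; only the IRET/#VMEXIT path does, which is why SMM needs no special handling once HW saves/restores the bits itself.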
That said, svm_{get,set}_nmi_mask will look something like this:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9e9bfbffd72..08911a33cf1e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3635,13 +3635,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return to_svm(vcpu)->nmi_masked;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return svm->vmcb->control.int_ctl & V_NMI_MASK;
+	else
+		return svm->nmi_masked;
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm))
+		return;
+
 	if (masked) {
 		svm->nmi_masked = true;
 		svm_set_iret_intercept(svm);

Is there any input on the above approach?

[1] https://lore.kernel.org/all/20220810061226.1286-4-santosh.shukla@xxxxxxx/

>>> The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
>>> nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
>>> replace that with a set of nmi_queued, i.e.
>>>
>>>	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>>		vcpu->arch.nmi_pending = 0;
>>>		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
>>>		process_nmi();
>>>	}
>>>
>> You mean replace the above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
>> I'll try the above proposal.
>
> Yep, if that works.  Actually, that might be a requirement.  There's a
>
>	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>
> lurking below this.  Invoking process_nmi() before NMI blocking is updated could
> result in KVM incorrectly dropping/keeping NMIs.  I don't think it would be a
> problem in practice since KVM saves only one NMI, but userspace could stuff NMIs.
>
> Huh.  The existing code is buggy.  events->nmi.pending is a u8, and
> arch.nmi_pending is an unsigned int.  KVM doesn't cap the incoming value, so
> userspace could set up to 255 pending NMIs.
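The gating in the diff above can be modeled in standalone C: with vNMI enabled the mask lives in hardware state (int_ctl), so reads come from there and software writes are dropped; otherwise the legacy software flag is used. The bit value and struct layout here are assumptions for illustration, not the real VMCB definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone model of the proposed svm_{get,set}_nmi_mask() gating.
 * V_NMI_MASK_BIT and struct svm_model are illustrative, not the real
 * VMCB layout. */
#define V_NMI_MASK_BIT	(1u << 3)

struct svm_model {
	bool vnmi_enabled;	/* stand-in for is_vnmi_enabled() */
	bool nmi_masked;	/* software-tracked mask (legacy path) */
	uint32_t int_ctl;	/* stand-in for vmcb->control.int_ctl */
};

static bool model_get_nmi_mask(struct svm_model *svm)
{
	if (svm->vnmi_enabled)
		return svm->int_ctl & V_NMI_MASK_BIT;
	return svm->nmi_masked;
}

static void model_set_nmi_mask(struct svm_model *svm, bool masked)
{
	/* With vNMI the processor owns the mask; software writes are dropped. */
	if (svm->vnmi_enabled)
		return;
	svm->nmi_masked = masked;
}
```

The design choice this captures: once the processor manages V_NMI_MASK, a software write would race with or contradict hardware, so the setter simply becomes a no-op on the vNMI path.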
> The extra weird part is that the extra
> NMIs will get dropped the next time KVM stumbles through process_nmi().
>
> Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM
> may drop an NMI.
>
>	events->nmi.pending = vcpu->arch.nmi_pending != 0;
>
> The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
> simultaneous NMIs").  The only thing I can figure is that KVM_GET_VCPU_EVENTS was
> somewhat blindly updated without much thought about what should actually happen.
>
> So, can you slide the below in early in the series?  Then in this series, convert
> to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch?
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Date: Wed, 8 Feb 2023 07:44:16 -0800
> Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are
>  pending
>
> Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
> or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
> pending NMIs with respect to NMI blocking.
>
> KVM allows multiple NMIs to be pending in order to faithfully emulate bare
> metal handling of simultaneous NMIs (on bare metal, truly simultaneous
> NMIs are impossible, i.e. one will always arrive first and be consumed).
> Support for simultaneous NMIs botched the save/restore though.  KVM only
> saves one pending NMI, but allows userspace to restore 255 pending NMIs
> as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
> in an unsigned int.
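The save/restore asymmetry described above can be shown with a small model: the internal count is an unsigned int, the ABI field is a u8, the old GET path collapses the count to 0/1, and the SET path stores the u8 uncapped. The function names are mine; only the types and the `!= 0` collapse come from the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the asymmetric KVM_{GET,SET}_VCPU_EVENTS NMI handling
 * discussed above.  Names are illustrative stand-ins. */
static unsigned int nmi_pending;	/* stand-in for vcpu->arch.nmi_pending */

/* Old KVM_GET_VCPU_EVENTS behavior: collapse the count to 0 or 1,
 * so a second pending NMI is lost across a migration. */
static uint8_t model_get_events_nmi_pending(void)
{
	return nmi_pending != 0;
}

/* KVM_SET_VCPU_EVENTS: the incoming u8 is stored uncapped, so userspace
 * can stuff up to 255 pending NMIs into the unsigned int. */
static void model_set_events_nmi_pending(uint8_t pending)
{
	nmi_pending = pending;
}
```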
>
> Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..e9339acbf82a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
> 
> 	events->nmi.injected = vcpu->arch.nmi_injected;
> -	events->nmi.pending = vcpu->arch.nmi_pending != 0;
> +	events->nmi.pending = vcpu->arch.nmi_pending;
> 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
> 
> 	/* events->sipi_vector is never valid when reporting to user space */
> @@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> 					     events->interrupt.shadow);
> 
> 	vcpu->arch.nmi_injected = events->nmi.injected;
> -	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> +	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> 		vcpu->arch.nmi_pending = events->nmi.pending;
> +		if (vcpu->arch.nmi_pending)
> +			kvm_make_request(KVM_REQ_NMI, vcpu);
> +	}
> 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&

Ok. On top of the above, I am including your suggested change as below:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0855599df65..437a6cea3bc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5201,9 +5201,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	vcpu->arch.nmi_injected = events->nmi.injected;
 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
-		vcpu->arch.nmi_pending = events->nmi.pending;
-		if (vcpu->arch.nmi_pending)
-			kvm_make_request(KVM_REQ_NMI, vcpu);
+		vcpu->arch.nmi_pending = 0;
+		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
+		kvm_make_request(KVM_REQ_NMI, vcpu);
 	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

Does that make sense?

> base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e
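The end-to-end flow of that change can be sketched as a simplified single-threaded model: SET zeroes nmi_pending, stashes the u8 in nmi_queued, and raises KVM_REQ_NMI; process_nmi() later drains the queue and caps pending at two (one when NMIs are blocked, per KVM's "one running plus one pending" rule). The real fields are atomics and the cap logic has more inputs; this only illustrates the ordering and the cap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the suggested queued->pending flow.  Names mirror
 * the KVM fields but this is an illustration, not kernel code. */
struct vcpu_model {
	unsigned int nmi_pending;
	unsigned int nmi_queued;	/* atomic_t in KVM */
	bool nmi_masked;
	bool req_nmi;			/* stands in for KVM_REQ_NMI */
};

/* KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING set. */
static void model_set_vcpu_events(struct vcpu_model *v, uint8_t pending, bool masked)
{
	v->nmi_pending = 0;
	v->nmi_queued = pending;	/* atomic_set(&...nmi_queued, ...) */
	v->req_nmi = true;		/* kvm_make_request(KVM_REQ_NMI, vcpu) */
	v->nmi_masked = masked;		/* kvm_x86_set_nmi_mask() runs after */
}

/* process_nmi(), run when the request is serviced, i.e. after NMI
 * blocking has been updated: drain the queue, then cap pending. */
static void model_process_nmi(struct vcpu_model *v)
{
	unsigned int limit = v->nmi_masked ? 1 : 2;

	v->nmi_pending += v->nmi_queued;	/* atomic_xchg(..., 0) */
	v->nmi_queued = 0;
	if (v->nmi_pending > limit)
		v->nmi_pending = limit;
	v->req_nmi = false;
}
```

This shows why deferring process_nmi() via KVM_REQ_NMI matters: the cap is computed only after the new mask state is in place, so 255 stuffed NMIs collapse to at most two, or one if NMIs are masked.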