Re: Question about AMD SVM's virtual NMI support in Linux kernel mainline

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 24 Aug 2023 14:22:48 +0000

+kvm and lkml, didn't realize they weren't Cc'd in the original mail.

On Thu, Aug 24, 2023, Santosh Shukla wrote:
> Hi Sean,
> 
> On 8/23/2023 8:33 PM, Sean Christopherson wrote:
> > On Fri, Aug 18, 2023, 陈培鸿(乘鸿) wrote:
> >> According to the results, I found that in the case of concurrent NMIs, some
> >> NMIs are still injected through eventinj instead of vNMI.
> > 
> > Key word is "some", having two NMIs arrive "simultaneously" is uncommon.  In quotes
> > because if a vCPU is scheduled out or otherwise delayed, two NMIs can be recognized
> > by KVM at the same time even if there was a sizeable delay between when they were
> > sent.
> > 
> >> Based on the above explanations, I summarize my questions as follows:
> >> 1. According to the specification of AMD SVM vNMI, with vNMI enabled, will
> >> some NMIs be injected through eventinj under the condition of concurrent
> >> NMIs?
> > 
> > Yes.
> > 
> >> 2. If yes, what is the role of vNMI? Is it just an addition to eventinj? What
> >> benefits is it designed to expect? Is there any benchmark data support?
> > 
> > Manually injecting NMIs isn't problematic from a performance perspective.  KVM
> > takes control of the vCPU, i.e. forces a VM-Exit, to pend a virtual NMI, so there's
> > no extra VM-Exit.
> > 
> > The value added by vNMI support is that KVM doesn't need to manually track/detect
> > when NMIs become unblocked in the guest.  SVM doesn't provide a hardware-supported
> > NMI-window exiting, so KVM has to intercept _and_ single-step IRET, which adds two
> > VM-Exits for _every_ NMI when vNMI isn't available (and it's a complete mess for
> > things like SEV-ES).
> > 
> >> 3. If not, does it mean that the current SVM's vNMI support scheme in the
> >> Linux mainline code is flawed? How should it be fixed?
> > 
> > The approach as a whole isn't flawed, it's the best KVM can do given SVM's vNMI
> > architecture and KVM's ABI with respect to "concurrent" NMIs.
> > 
> > Hrm, though there does appear to be a bug in the injecting path.  KVM doesn't
> > manually set V_NMI_BLOCKING_MASK, and will unnecessarily enable IRET interception
> > when manually injecting a vNMI.  Intercepting IRET should be unnecessary because
> > hardware will automatically accept the pending vNMI when NMIs become unblocked.
> > And I don't see anything in the APM that suggests hardware will set V_NMI_BLOCKING_MASK
> > when software directly injects an NMI.
> > 
> > So I think we need:
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index d381ad424554..c956a9f500a2 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3476,6 +3476,11 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> >         if (svm->nmi_l1_to_l2)
> >                 return;
> >  
> > +       if (is_vnmi_enabled(svm)) {
> > +               svm->vmcb->control.int_ctl |= V_NMI_BLOCKING_MASK;
> > +               return;
> > +       }
> > +
> >         svm->nmi_masked = true;
> >         svm_set_iret_intercept(svm);
> >         ++vcpu->stat.nmi_injections;
> > --
> > 
> > or if hardware does set V_NMI_BLOCKING_MASK in this case, just:
> > 
> 
> Yes, HW does set BLOCKING_MASK when HW takes the pending vNMI event.

I'm not asking about the pending vNMI case, which is clearly spelled out in the
APM.  I'm asking about directly injecting an NMI via:

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index d381ad424554..201a1a33ecd2 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3473,7 +3473,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> >  
> >         svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
> >  
> > -       if (svm->nmi_l1_to_l2)
> > +       if (svm->nmi_l1_to_l2 || is_vnmi_enabled(svm))
> >                 return;
> >  
> >         svm->nmi_masked = true;
> > --
> >
> 
> Above proposal make sense to me, I was reviewing source code flow
> for a scenarios when two consecutive need to me delivered to Guest.
> Example, process_nmi will pend the first NMI and then second NMI will
> be injected through EVTINJ, as because (kvm_x86_inject_nmi)
> will get called and that will set the _iret_intercept. 
> 
> With your proposal inject_nmi will be set the env_inj NMI w/o the IRET,
> I think we could check for "is_vnmi_enabled" before the programming
> the "evt_inj"?

No, because the whole point of this path is to directly inject an NMI when NMIs
are NOT blocked in the guest AND there is already a pending vNMI.