On Thu, 2022-11-17 at 18:21 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > When vNMI is enabled, the only case in which KVM will use an NMI
> > window is when a vNMI injection is pending.
> > 
> > In this case, on the next IRET/RSM/STGI the injection has to be complete,
> > and a new NMI can be injected.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index cfec4c98bb589b..eaa30f8ace518d 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> >  	++vcpu->stat.nmi_window_exits;
> > -	vcpu->arch.hflags |= HF_IRET_MASK;
> > +
> > +	if (!is_vnmi_enabled(svm))
> > +		vcpu->arch.hflags |= HF_IRET_MASK;
> 
> Ugh, HF_IRET_MASK is such a terrible name/flag.  Given that it lives with GIF
> and NMI, one would naturally think that it means "IRET is intercepted", but it
> really means "KVM just intercepted an IRET and is waiting for NMIs to become
> unblocked".
> 
> And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
> They are 100% SVM concepts.  IMO, this code would be much easier to follow if
> they were made bools in vcpu_svm with more descriptive names.
> 
> > +
> >  	if (!sev_es_guest(vcpu->kvm)) {
> >  		svm_clr_intercept(svm, INTERCEPT_IRET);
> >  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
> 
> The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
> to be captured for the vNMI case.  SEV-ES actually has unrelated reasons for not
> reading RIP vs. not intercepting IRET; they just got bundled together here for
> convenience.

Yes, this can be cleaned up; again, I didn't want to change too much of the code.

> This is also an opportunity to clean up the SEV-ES interaction with IRET
> interception, which is splattered all over the place and isn't documented
> anywhere.
> 
> E.g. (with an HF_IRET_MASK => awaiting_iret_completion change):
> 
> /*
>  * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
>  * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
>  * the guest is ready for a new NMI.  Architecturally, KVM is 100% correct to
>  * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
>  * that KVM must wait for an explicit "NMI Complete" from the guest.
>  */
> static void svm_disable_iret_interception(struct vcpu_svm *svm)
> {
> 	if (!sev_es_guest(svm->vcpu.kvm))
> 		svm_clr_intercept(svm, INTERCEPT_IRET);
> }
> 
> static void svm_enable_iret_interception(struct vcpu_svm *svm)
> {
> 	if (!sev_es_guest(svm->vcpu.kvm))
> 		svm_set_intercept(svm, INTERCEPT_IRET);
> }

This makes sense, but it doesn't have to be done in this patch series IMHO.

> static int iret_interception(struct kvm_vcpu *vcpu)
> {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 
> 	++vcpu->stat.nmi_window_exits;
> 
> 	/*
> 	 * No need to wait for the IRET to complete if vNMIs are enabled as
> 	 * hardware will automatically process the pending NMI when NMIs are
> 	 * unblocked from the guest's perspective.
> 	 */
> 	if (!is_vnmi_enabled(svm)) {
> 		svm->awaiting_iret_completion = true;
> 
> 		/*
> 		 * The guest's RIP is inaccessible for SEV-ES guests, just
> 		 * assume forward progress was made on the next VM-Exit.
> 		 */
> 		if (!sev_es_guest(vcpu->kvm))
> 			svm->nmi_iret_rip = kvm_rip_read(vcpu);
> 	}
> 
> 	svm_disable_iret_interception(svm);
> 
> 	kvm_make_request(KVM_REQ_EVENT, vcpu);
> 	return 1;
> }
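
FWIW, such helpers would also let svm_inject_nmi() drop its own open-coded
SEV-ES check. Roughly something like this (an untested sketch on top of your
proposal, written against the pre-vNMI code; the exact shape of
svm_inject_nmi() differs a bit once this series lands):

static void svm_inject_nmi(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	if (svm->nmi_l1_to_l2)
		return;

	vcpu->arch.hflags |= HF_NMI_MASK;

	/* Replaces the open-coded !sev_es_guest() + svm_set_intercept() pair. */
	svm_enable_iret_interception(svm);

	++vcpu->stat.nmi_injections;
}
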
> > @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > -	if (is_vnmi_enabled(svm))
> > -		return;
> > -
> >  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> >  		return; /* IRET will cause a vm exit */
> 
> As much as I like incremental patches, in this case I'm having a hell of a time
> reviewing the code, as the vNMI logic ends up being split across four patches.
> E.g. in this particular case, the above requires knowing that svm_inject_nmi()
> never sets HF_NMI_MASK when vNMI is enabled.
> 
> In the next version, any objection to squashing patches 7-10 into a single "Add
> non-nested vNMI support" patch?

No objection at all; again, since this is not my patch series, I didn't want to
make too many invasive changes to it.

> As for this code, IMO some pre-work to change the flow would help with the vNMI
> case.  The GIF=0 logic overrides legacy NMI blocking, and so can be handled
> first.  And I vote to explicitly set INTERCEPT_IRET in the above case instead
> of relying on INTERCEPT_IRET to already be set by svm_inject_nmi().
> 
> That would yield this as a final result:
> 
> static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 
> 	/*
> 	 * GIF=0 blocks NMIs irrespective of legacy NMI blocking.  No need to
> 	 * intercept or single-step IRET if GIF=0, just intercept STGI.
> 	 */
> 	if (!gif_set(svm)) {
> 		if (vgif)
> 			svm_set_intercept(svm, INTERCEPT_STGI);
> 		return;
> 	}
> 
> 	/*
> 	 * NMI is blocked, either because an NMI is in service or because KVM
> 	 * just injected an NMI.  If KVM is waiting for an intercepted IRET to
> 	 * complete, single-step the IRET to wait for NMIs to become unblocked.
> 	 * Otherwise, intercept the guest's next IRET.
> 	 */
> 	if (svm->awaiting_iret_completion) {
> 		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> 		svm->nmi_singlestep = true;
> 		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> 	} else {
> 		svm_set_intercept(svm, INTERCEPT_IRET);
> 	}
> }
> 
> > @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  	 * Something prevents NMI from been injected. Single step over possible
> >  	 * problem (IRET or exception injection or interrupt shadow)
> >  	 */
> > -	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > -	svm->nmi_singlestep = true;
> > -	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +
> > +	if (is_vnmi_enabled(svm)) {
> > +		svm_set_intercept(svm, INTERCEPT_IRET);
> 
> This will break SEV-ES.  Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
> an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
> NMI unblocking for SEV-ES guests.  As above, I think we should provide helpers to
> toggle IRET interception to reduce the probability of breaking SEV-ES.

Yes, one more reason for the helpers; I didn't notice that I missed that 'if'.

> > +	} else {
> > +		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > +		svm->nmi_singlestep = true;
> > +		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +	}
> >  }
> >  
> >  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> > -- 
> > 2.34.3
> > 

Best regards,
	Maxim Levitsky