Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Fri, 17 Apr 2020 21:21:08 -0700

On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > > what the potential confusion is regarding the event. Are you
> > > suggesting that one might think that we have a #DB to deliver to L1
> > > while we're in guest mode? IIRC, that can happen under SVM, but I
> > > don't believe it can happen under VMX.
> >
> > The potential confusion is that vcpu->arch.exception.pending was already
> > checked, twice.  It makes one wonder why it needs to be checked a third
> > time.  And actually, I think that's probably a good indicator that singling
> > out single-step #DB isn't the correct fix, it just happens to be the only
> > case that's been encountered thus far, e.g. a #PF when fetching the instr
> > for emulation should also get priority over the preemption timer.  On real
> > hardware, expiration of the preemption timer while vectoring a #PF wouldn't
> > get recognized until the next instruction boundary, i.e. at the
> > start of the first instruction of the #PF handler.  Dropping the #PF isn't
> > a problem in most cases, because unlike the single-step #DB, it will be
> > re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.
> 
> Yes, it's wrong in the abstract, but with respect to faults and the
> VMX-preemption timer expiration, is there any way for either L1 or L2
> to *know* that the virtual CPU has done something wrong?

I don't think so?  But how is that relevant, i.e. if we can fix KVM instead
of fudging the result, why wouldn't we fix KVM?

> Isn't it generally true that if you have an exception queued when you
> transition from L2 to L1, then you've done something wrong? I wonder
> if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> serves to sweep a whole collection of problems under the rug.

More than likely, yes.

> > In general, interception of an event doesn't change the priority of events,
> > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > intercept INTR but not NMI.
> 
> Yes, but that's a different problem altogether.

But isn't the fix the same?  Stop processing events if a higher priority
event is pending, regardless of whether the event exits to L1.
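
To make the ordering concrete, here is a minimal standalone sketch of that
rule. It is a toy model, not KVM code: struct toy_vcpu, enum event,
enum result and check_events() are all hypothetical names invented for
illustration, and RETRY_LATER merely stands in for the -EBUSY case.  The
point is only that the walk stops at the first pending event, whether or
not L1 intercepts it.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names exist in KVM. */
enum event { EV_EXCEPTION, EV_NMI, EV_INTR, EV_COUNT }; /* highest priority first */

enum result {
	DELIVER_TO_L2,   /* highest-priority pending event is not intercepted by L1 */
	EXIT_TO_L1,      /* highest-priority pending event causes a nested VM-exit */
	RETRY_LATER,     /* stand-in for -EBUSY: the exit is blocked for now */
	NOTHING_PENDING,
};

struct toy_vcpu {
	bool pending[EV_COUNT];
	bool l1_intercepts[EV_COUNT];
	bool block_nested_events;
};

/*
 * Walk events in priority order and stop at the first pending one.
 * A lower-priority event is never considered while a higher-priority
 * event is pending, even if only the lower-priority one exits to L1.
 */
static enum result check_events(const struct toy_vcpu *v, enum event *which)
{
	for (int e = 0; e < EV_COUNT; e++) {
		if (!v->pending[e])
			continue;

		*which = (enum event)e;
		if (!v->l1_intercepts[e])
			return DELIVER_TO_L2;
		if (v->block_nested_events)
			return RETRY_LATER;
		return EXIT_TO_L1;
	}
	return NOTHING_PENDING;
}
```

With an NMI and an INTR both pending and L1 intercepting only INTR, this
model picks the NMI for L2 and leaves the INTR for a later pass, which is
the behavior argued for above.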

> > TL;DR: I think the fix should instead be:
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index c868c64770e0..042d7a9037be 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >         /*
> >          * Process any exceptions that are not debug traps before MTF.
> >          */
> > -       if (vcpu->arch.exception.pending &&
> > -           !vmx_pending_dbg_trap(vcpu) &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (vcpu->arch.exception.pending &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending) {
> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> > +       if (vcpu->arch.nmi_pending) {
> > +               if (!nested_exit_on_nmi(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> > @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> > +       if (kvm_cpu_has_interrupt(vcpu)) {
> > +               if (!nested_exit_on_intr(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> >