Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer

On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > On Mon, Apr 13, 2020 at 8:17 PM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> > > > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > > > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > > > synthesized single-step trap would be unceremoniously dropped when
> > > > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > > >
> > > > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > > > from L2 to L1 when there is a pending debug trap, such as a
> > > > single-step trap.
> > > >
> > > > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > > > Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> > > > Reviewed-by: Oliver Upton <oupton@xxxxxxxxxx>
> > > > Reviewed-by: Peter Shier <pshier@xxxxxxxxxx>
> > > > ---
> > > >  arch/x86/kvm/vmx/nested.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index cbc9ea2de28f..6ab974debd44 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> > > >           vmx->nested.preemption_timer_expired) {
> > > >               if (block_nested_events)
> > > >                       return -EBUSY;
> > > > -             nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> > > > +             if (!vmx_pending_dbg_trap(vcpu))
> > >
> > > IMO this one warrants a comment.  It's not immediately obvious that this
> > > only applies to #DBs that are being injected into L2, and that returning
> > > -EBUSY will do the wrong thing.
> >
> > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > what the potential confusion is regarding the event. Are you
> > suggesting that one might think that we have a #DB to deliver to L1
> > while we're in guest mode? IIRC, that can happen under SVM, but I
> > don't believe it can happen under VMX.
>
> The potential confusion is that vcpu->arch.exception.pending was already
> checked, twice.  It makes one wonder why it needs to be checked a third
> time.  And actually, I think that's probably a good indicator that singling
> out single-step #DB isn't the correct fix, it just happens to be the only
> case that's been encountered thus far, e.g. a #PF when fetching the instr
> for emulation should also get priority over the preemption timer.  On real
> hardware, expiration of the preemption timer while vectoring a #PF wouldn't
> get recognized until the next instruction boundary, i.e. at the
> start of the first instruction of the #PF handler.  Dropping the #PF isn't
> a problem in most cases, because unlike the single-step #DB, it will be
> re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.

Yes, it's wrong in the abstract, but with respect to faults and the
VMX-preemption timer expiration, is there any way for either L1 or L2
to *know* that the virtual CPU has done something wrong?

Isn't it generally true that if you have an exception queued when you
transition from L2 to L1, then you've done something wrong? I wonder
if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
serves to sweep a whole collection of problems under the rug.
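
(For context, paraphrasing from memory rather than quoting the exact
source, so treat this as a sketch: the tail of prepare_vmcs12() does
roughly the following:

    static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
                               u32 exit_reason, u32 exit_intr_info,
                               unsigned long exit_qualification)
    {
            /* ... update vmcs12 exit info fields, sync L2 state ... */

            /*
             * Anything still queued in vcpu->arch.exception at this point
             * is discarded on the L2->L1 transition; only events that were
             * explicitly reflected into vmcs12 survive.
             */
            kvm_clear_exception_queue(vcpu);
            kvm_clear_interrupt_queue(vcpu);
    }

so a queued-but-unreported exception is silently dropped right there.)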

> In general, interception of an event doesn't change the priority of events,
> e.g. INTR shouldn't get priority over NMI just because L1 wants to
> intercept INTR but not NMI.

Yes, but that's a different problem altogether.

> TL;DR: I think the fix should instead be:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index c868c64770e0..042d7a9037be 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>         /*
>          * Process any exceptions that are not debug traps before MTF.
>          */
> -       if (vcpu->arch.exception.pending &&
> -           !vmx_pending_dbg_trap(vcpu) &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (vcpu->arch.exception.pending &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending) {
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +       if (vcpu->arch.nmi_pending) {
> +               if (!nested_exit_on_nmi(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> +       if (kvm_cpu_has_interrupt(vcpu)) {
> +               if (!nested_exit_on_intr(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>


