On Mon, Apr 20, 2020 at 9:41 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Mon, Apr 20, 2020 at 10:18:42AM -0700, Jim Mattson wrote:
> > On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > > > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > >
> > > > Yes, it's wrong in the abstract, but with respect to faults and the
> > > > VMX-preemption timer expiration, is there any way for either L1 or L2
> > > > to *know* that the virtual CPU has done something wrong?
> > >
> > > I don't think so?  But how is that relevant, i.e. if we can fix KVM
> > > instead of fudging the result, why wouldn't we fix KVM?
> >
> > I'm not sure that I can fix KVM. The missing #DB traps were relatively
> > straightforward, but as for the rest of this mess...
> >
> > Since you seem to have a handle on what needs to be done, I will defer
> > to you.
>
> I wouldn't go so far as to say I have a handle on it, more like I have an
> idea of how to fix one part of the overall problem with a generic "rule"
> change that also happens to (hopefully) resolve the #DB+MTF issue.
>
> Anyways, I'll send a patch.  Worst case scenario it fails miserably and
> we go with this patch :-)
>
> > > > Isn't it generally true that if you have an exception queued when
> > > > you transition from L2 to L1, then you've done something wrong? I
> > > > wonder if the call to kvm_clear_exception_queue() in
> > > > prepare_vmcs12() just serves to sweep a whole collection of
> > > > problems under the rug.
> > >
> > > More than likely, yes.
> > >
> > > > > In general, interception of an event doesn't change the priority
> > > > > of events, e.g. INTR shouldn't get priority over NMI just because
> > > > > L1 wants to intercept INTR but not NMI.
> > > >
> > > > Yes, but that's a different problem altogether.
> > >
> > > But isn't the fix the same?  Stop processing events if a higher
> > > priority event is pending, regardless of whether the event exits
> > > to L1.
> >
> > That depends on how you see the scope of the problem. One could argue
> > that the fix for everything that is wrong with KVM is actually the
> > same: properly emulate the physical CPU.
>
> Heh, there is that.
>
> What I'm arguing is that we shouldn't throw in a workaround knowing that
> it's papering over the underlying issue.  Preserving event priority
> irrespective of VM-Exit behavior is different, in that while it may not
> resolve all issues that are being masked by kvm_clear_exception_queue(),
> the change itself is correct when viewed in a vacuum.

The more I look at that call to kvm_clear_exception_queue(), the more
convinced I am that it's wrong. The comment above it says:

	/*
	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
	 * preserved above and would only end up incorrectly in L1.
	 */

The first sentence is just wrong: vmx_complete_interrupts() may not be
where the NMI/exception/interrupt came from. And the second sentence is
not entirely true. Only *injected* events are "preserved above" (by the
call to vmcs12_save_pending_event()), but kvm_clear_exception_queue()
zaps both injected events and pending events. Moreover,
vmcs12_save_pending_event() "preserves" the event by stashing it in the
IDT-vectoring info field of vmcs12, even when the current VM-exit (from
L2 to L1) did not (and in some cases cannot) occur during event delivery
(e.g. VMX-preemption timer expired).