On Mon, Apr 20, 2020 at 9:41 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Mon, Apr 20, 2020 at 10:18:42AM -0700, Jim Mattson wrote:
> > On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > > > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > >
> > > > Yes, it's wrong in the abstract, but with respect to faults and the
> > > > VMX-preemption timer expiration, is there any way for either L1 or L2
> > > > to *know* that the virtual CPU has done something wrong?
> > >
> > > I don't think so?  But how is that relevant, i.e. if we can fix KVM
> > > instead of fudging the result, why wouldn't we fix KVM?
> >
> > I'm not sure that I can fix KVM. The missing #DB traps were relatively
> > straightforward, but as for the rest of this mess...
> >
> > Since you seem to have a handle on what needs to be done, I will defer
> > to you.
>
> I wouldn't go so far as to say I have a handle on it, more like I have an
> idea of how to fix one part of the overall problem with a generic "rule"
> change that also happens to (hopefully) resolve the #DB+MTF issue.
>
> Anyways, I'll send a patch.  Worst case scenario it fails miserably and
> we go with this patch :-)
>
> > > > Isn't it generally true that if you have an exception queued when
> > > > you transition from L2 to L1, then you've done something wrong? I
> > > > wonder if the call to kvm_clear_exception_queue() in
> > > > prepare_vmcs12() just serves to sweep a whole collection of
> > > > problems under the rug.
> > >
> > > More than likely, yes.
> > >
> > > > > In general, interception of an event doesn't change the priority
> > > > > of events, e.g. INTR shouldn't get priority over NMI just because
> > > > > L1 wants to intercept INTR but not NMI.
> > > >
> > > > Yes, but that's a different problem altogether.
> > >
> > > But isn't the fix the same?  Stop processing events if a higher
> > > priority event is pending, regardless of whether the event exits
> > > to L1.
> >
> > That depends on how you see the scope of the problem. One could argue
> > that the fix for everything that is wrong with KVM is actually the
> > same: properly emulate the physical CPU.
>
> Heh, there is that.
>
> What I'm arguing is that we shouldn't throw in a workaround knowing that
> it's papering over the underlying issue.  Preserving event priority
> irrespective of VM-Exit behavior is different, in that while it may not
> resolve all issues that are being masked by kvm_clear_exception_queue(),
> the change itself is correct when viewed in a vacuum.

The more I look at that call to kvm_clear_exception_queue(), the more
convinced I am that it's wrong. The comment above it says:

	/*
	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
	 * preserved above and would only end up incorrectly in L1.
	 */

The first sentence is just wrong: vmx_complete_interrupts() may not be
where the NMI/exception/interrupt came from. And the second sentence is
not entirely true. Only *injected* events are "preserved above" (by the
call to vmcs12_save_pending_event()), but kvm_clear_exception_queue()
zaps both injected events and pending events. Moreover,
vmcs12_save_pending_event() "preserves" the event by stashing it in the
IDT-vectoring info field of vmcs12, even when the current VM-exit (from
L2 to L1) did not (and in some cases cannot) occur during event delivery
(e.g. VMX-preemption timer expired).