On Mon, Apr 20, 2020 at 10:18:42AM -0700, Jim Mattson wrote:
> On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > Yes, it's wrong in the abstract, but with respect to faults and the
> > > VMX-preemption timer expiration, is there any way for either L1 or L2
> > > to *know* that the virtual CPU has done something wrong?
> >
> > I don't think so? But how is that relevant, i.e. if we can fix KVM instead
> > of fudging the result, why wouldn't we fix KVM?
>
> I'm not sure that I can fix KVM. The missing #DB traps were relatively
> straightforward, but as for the rest of this mess...
>
> Since you seem to have a handle on what needs to be done, I will defer to
> you.

I wouldn't go so far as to say I have a handle on it, more like I have an
idea of how to fix one part of the overall problem with a generic "rule"
change that also happens to (hopefully) resolve the #DB+MTF issue.

Anyways, I'll send a patch. Worst case scenario it fails miserably and we
go with this patch :-)

> > > Isn't it generally true that if you have an exception queued when you
> > > transition from L2 to L1, then you've done something wrong? I wonder
> > > if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> > > serves to sweep a whole collection of problems under the rug.
> >
> > More than likely, yes.
> >
> > > > In general, interception of an event doesn't change the priority of events,
> > > > e.g. INTR shouldn't get priority over NMI just because if L1 wants to
> > > > intercept INTR but not NMI.
> > >
> > > Yes, but that's a different problem altogether.
> >
> > But isn't the fix the same? Stop processing events if a higher priority
> > event is pending, regardless of whether the event exits to L1.
>
> That depends on how you see the scope of the problem. One could argue
> that the fix for everything that is wrong with KVM is actually the
> same: properly emulate the physical CPU.

Heh, there is that. What I'm arguing is that we shouldn't throw in a
workaround knowing that it's papering over the underlying issue. Preserving
event priority irrespective of VM-Exit behavior is different, in that while
it may not resolve all issues that are being masked by
kvm_clear_exception_queue(), the change itself is correct when viewed in a
vacuum.
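
To make the "preserve event priority irrespective of VM-Exit behavior" rule
concrete, here is a toy model in plain C. It is only a sketch under stated
assumptions: every type and function name (toy_vcpu, toy_check_events_*, and
so on) is hypothetical, only two event classes are modeled, and none of it is
taken from KVM or from the patch under discussion. It contrasts a scan that
lets L1's intercepts reorder events with one that stops at the
highest-priority pending event no matter where that event is routed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every name here is hypothetical; this is not KVM code. */
enum toy_event {
	TOY_EV_NMI,		/* higher priority */
	TOY_EV_INTR,		/* lower priority */
	TOY_EV_COUNT,
	TOY_EV_NONE = TOY_EV_COUNT
};

enum toy_action {
	TOY_ACT_NONE,
	TOY_ACT_EXIT_TO_L1,
	TOY_ACT_INJECT_INTO_L2
};

struct toy_vcpu {
	bool pending[TOY_EV_COUNT];		/* event is waiting for delivery */
	bool l1_intercepts[TOY_EV_COUNT];	/* L1 asked to intercept it */
};

struct toy_result {
	enum toy_event event;
	enum toy_action action;
};

/*
 * Ordering problem: intercepted events are scanned first, so an
 * intercepted INTR wins over a pending, non-intercepted NMI.
 */
struct toy_result toy_check_events_intercept_first(const struct toy_vcpu *v)
{
	int ev;

	for (ev = 0; ev < TOY_EV_COUNT; ev++)
		if (v->pending[ev] && v->l1_intercepts[ev])
			return (struct toy_result){ (enum toy_event)ev,
						    TOY_ACT_EXIT_TO_L1 };

	for (ev = 0; ev < TOY_EV_COUNT; ev++)
		if (v->pending[ev])
			return (struct toy_result){ (enum toy_event)ev,
						    TOY_ACT_INJECT_INTO_L2 };

	return (struct toy_result){ TOY_EV_NONE, TOY_ACT_NONE };
}

/*
 * Priority first: stop at the highest-priority pending event. Whether L1
 * intercepts it only decides where it goes, never which event is picked.
 */
struct toy_result toy_check_events_priority_first(const struct toy_vcpu *v)
{
	int ev;

	for (ev = 0; ev < TOY_EV_COUNT; ev++) {
		if (!v->pending[ev])
			continue;
		return (struct toy_result){
			(enum toy_event)ev,
			v->l1_intercepts[ev] ? TOY_ACT_EXIT_TO_L1
					     : TOY_ACT_INJECT_INTO_L2
		};
	}

	return (struct toy_result){ TOY_EV_NONE, TOY_ACT_NONE };
}

/* Usage: NMI and INTR both pending, L1 intercepts INTR but not NMI. */
void toy_demo(void)
{
	struct toy_vcpu v = {
		.pending	= { true, true },
		.l1_intercepts	= { false, true },
	};

	/* The intercept-first scan inverts the priority order. */
	assert(toy_check_events_intercept_first(&v).event == TOY_EV_INTR);
	/* The priority-first scan keeps NMI ahead of INTR. */
	assert(toy_check_events_priority_first(&v).event == TOY_EV_NMI);
}
```

In this model the intercept bitmap is consulted only after the winning event
has been chosen, which is the sense in which such a change can be correct "in
a vacuum" even if other problems remain hidden elsewhere.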