Re: [PATCH 00/21] KVM: x86: Event/exception fixes and cleanups

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Sun, 27 Mar 2022 18:06:07 +0300

On Thu, 2022-03-24 at 21:31 +0000, Sean Christopherson wrote:
> On Sun, Mar 13, 2022, Maxim Levitsky wrote:
> > On Fri, 2022-03-11 at 03:27 +0000, Sean Christopherson wrote:
> > > The main goal of this series is to fix KVM's longstanding bug of not
> > > honoring L1's exception intercept wants when handling an exception that
> > > occurs during delivery of a different exception.  E.g. if L0 and L1 are
> > > using shadow paging, and L2 hits a #PF, and then hits another #PF while
> > > vectoring the first #PF due to _L1_ not having a shadow page for the IDT,
> > > KVM needs to check L1's intercepts before morphing the #PF => #PF => #DF
> > > so that the #PF is routed to L1, not injected into L2 as a #DF.
> > > 
> > > nVMX has hacked around the bug for years by overriding the #PF injector
> > > for shadow paging to go straight to VM-Exit, and nSVM has started doing
> > > the same.  The hacks mostly work, but they're incomplete, confusing, and
> > > lead to other hacky code, e.g. bailing from the emulator because #PF
> > > injection forced a VM-Exit and suddenly KVM is back in L1.
> > > 
> > > Everything leading up to that are related fixes and cleanups I encountered
> > > along the way; some through code inspection, some through tests (I truly
> > > thought this series was finished 10 commits and 3 days ago...).
> > > 
> > > Nothing in here is all that urgent; all bugs tagged for stable have been
> > > around for multiple releases (years in most cases).
> > > 
> > > I am just curious. Are you aware that I worked on this a few months ago?
> 
> Ah, so that's why I had a feeling of deja vu when factoring out kvm_queued_exception.
> I completely forgot about it :-/  In my defense, that was nearly a year ago[1][2], though
> I suppose one could argue 11 == "a few" :-)
> 
> [1] https://lore.kernel.org/all/20210225154135.405125-1-mlevitsk@xxxxxxxxxx
> [2] https://lore.kernel.org/all/20210401143817.1030695-3-mlevitsk@xxxxxxxxxx
> 
> > I am sure that you even reviewed some of my code back then.
> 
> Yep, now that I've found the threads I remember discussing the mechanics.
> 
> > If so, could you have at least mentioned this and/or pinged me to continue
> > working on this instead of re-implementing it?
> 
> I'm invoking Hanlon's razor[*]; I certainly didn't intend to stomp over your
> work, I simply forgot.

Thank you very much for the explanation, and I am glad that it was an honest mistake.

Other than that, I am actually very happy that you posted this patch series,
as it improves the chances that this long-standing issue will be fixed.
If your patches are better/simpler/less invasive to KVM and still address the issue,
I fully support using them instead of mine.

I totally agree with your thoughts on splitting pending/injected exceptions.
I also can't say I liked my approach that much, for the same reasons you mentioned.

It is also the main reason I put the whole thing on the backlog lately:
I felt that I was changing too much of KVM
for a relatively theoretical issue.

I will review your patches, compare them to mine, and check if you or I missed something.

PS:

Back then, I also did an extensive review of the few cases where QEMU injects exceptions itself,
which thankfully it does rarely. There are several (theoretical) issues there.
I don't remember the details; I need to refresh my memory.

AFAIK, QEMU sometimes injects #MC when it gets one from the kernel in the form of a signal,
if I recall correctly, and it also reflects #DB back to the guest when guest debug is enabled
(that is the reason for some of the work I did in this area, like the KVM_GUESTDBG_BLOCKIRQ thing).

QEMU does this without considering nested virtualization, pending exceptions, etc.
It just kind of abuses KVM_SET_VCPU_EVENTS for that.

Best regards,
	Maxim Levitsky

> 
> As for the technical aspects, looking back at your series, I strongly considered
> taking the same approach of splitting pending vs. injected (again, without any
> recollection of your work).  I ultimately opted to go with the "immediately morph
> to pending VM-Exit" approach as it allows KVM to do the right thing in almost every
> case without requiring new ABI, and even if KVM screws up, e.g. queues multiple
> pending exceptions.  It also neatly handles one-off things like async #PF in L2.
> 
> However, I hadn't considered your approach, which addresses the ABI conundrum by
> processing pending=>injected immediately after handling the VM-Exit.  I can't think
> of any reason that wouldn't work, but I really don't like splitting the event
> priority logic, nor do I like having two event injection sites (getting rid of the
> extra calls to kvm_check_nested_events() is still on my wish list).  If we could go
> back in time, I would likely vote for properly tracking injected vs. pending, but
> since we're mostly stuck with KVM's ABI, I prefer the "immediately morph to pending
> VM-Exit" hack over the "immediately morph to 'injected' exception" hack.
> 
> [*] https://en.wikipedia.org/wiki/Hanlon%27s_razor
>
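
For readers following the #PF => #PF => #DF example from the cover letter, here
is a minimal user-space sketch of the ordering being discussed: L1's intercept
for the second exception is checked before any morphing to #DF. This is not KVM
code. The names (classify, route_second_exception, struct route) and the 32-bit
intercept bitmap are hypothetical, the benign/contributory/page-fault classes
follow the usual x86 double-fault rules, and a fault while delivering #DF
itself (shutdown/triple fault) is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 exception vectors used in this sketch. */
enum { DE_VECTOR = 0, DF_VECTOR = 8, TS_VECTOR = 10, NP_VECTOR = 11,
       SS_VECTOR = 12, GP_VECTOR = 13, PF_VECTOR = 14 };

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };
enum outcome { DELIVER_TO_L2, VMEXIT_TO_L1 };

/* Hypothetical result type: where the event goes, and with which vector. */
struct route {
	enum outcome where;
	unsigned int vector;
};

static enum excpt_class classify(unsigned int vector)
{
	assert(vector < 32);
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR: case TS_VECTOR: case NP_VECTOR:
	case SS_VECTOR: case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

/* Bit N of l1_intercepts set: L1 wants a VM-Exit on vector N (assumption). */
static int l1_wants(uint32_t l1_intercepts, unsigned int vector)
{
	return (l1_intercepts >> vector) & 1;
}

/* 'second' was raised while vectoring 'first' in L2. */
static struct route route_second_exception(unsigned int first,
					   unsigned int second,
					   uint32_t l1_intercepts)
{
	enum excpt_class c1 = classify(first);
	enum excpt_class c2 = classify(second);
	struct route r = { DELIVER_TO_L2, second };

	/* Check L1's intercept for the second exception before morphing. */
	if (l1_wants(l1_intercepts, second)) {
		r.where = VMEXIT_TO_L1;
		return r;
	}

	/* contributory => contributory, or #PF => (contributory or #PF). */
	if ((c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
	    (c1 == EXCPT_PF && c2 != EXCPT_BENIGN)) {
		r.vector = DF_VECTOR;
		/* The resulting #DF is itself subject to L1's intercepts. */
		if (l1_wants(l1_intercepts, DF_VECTOR))
			r.where = VMEXIT_TO_L1;
	}
	return r;
}
```

With L1 intercepting #PF, route_second_exception(PF_VECTOR, PF_VECTOR,
1u << PF_VECTOR) yields a VM-Exit to L1 carrying #PF. With no intercepts, the
same pair is morphed and #DF is delivered to L2.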