Re: [PATCH 00/21] KVM: x86: Event/exception fixes and cleanups

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 24 Mar 2022 21:31:14 +0000

On Sun, Mar 13, 2022, Maxim Levitsky wrote:
> On Fri, 2022-03-11 at 03:27 +0000, Sean Christopherson wrote:
> > The main goal of this series is to fix KVM's longstanding bug of not
> > honoring L1's exception intercepts when handling an exception that
> > occurs during delivery of a different exception.  E.g. if L0 and L1 are
> > using shadow paging, and L2 hits a #PF, and then hits another #PF while
> > vectoring the first #PF due to _L1_ not having a shadow page for the IDT,
> > KVM needs to check L1's intercepts before morphing the #PF => #PF => #DF
> > so that the #PF is routed to L1, not injected into L2 as a #DF.
> > 
> > nVMX has hacked around the bug for years by overriding the #PF injector
> > for shadow paging to go straight to VM-Exit, and nSVM has started doing
> > the same.  The hacks mostly work, but they're incomplete, confusing, and
> > lead to other hacky code, e.g. bailing from the emulator because #PF
> > injection forced a VM-Exit and suddenly KVM is back in L1.
> > 
> > Everything leading up to that consists of related fixes and cleanups I encountered
> > along the way; some through code inspection, some through tests (I truly
> > thought this series was finished 10 commits and 3 days ago...).
> > 
> > Nothing in here is all that urgent; all bugs tagged for stable have been
> > around for multiple releases (years in most cases).
> > 
> I am just curious. Are you aware that I worked on this a few months ago?

Ah, so that's why I had a feeling of deja vu when factoring out kvm_queued_exception.
I completely forgot about it :-/  In my defense, that was nearly a year ago[1][2], though
I suppose one could argue 11 == "a few" :-)

[1] https://lore.kernel.org/all/20210225154135.405125-1-mlevitsk@xxxxxxxxxx
[2] https://lore.kernel.org/all/20210401143817.1030695-3-mlevitsk@xxxxxxxxxx

> I am sure that you even reviewed some of my code back then.

Yep, now that I've found the threads I remember discussing the mechanics.

> If so, could you at least have mentioned this and/or pinged me to continue
> working on this instead of re-implementing it?

I'm invoking Hanlon's razor[*]; I certainly didn't intend to stomp over your
work, I simply forgot.

As for the technical aspects, looking back at your series, I strongly considered
taking the same approach of splitting pending vs. injected (again, without any
recollection of your work).  I ultimately opted to go with the "immediately morph
to pending VM-Exit" approach as it allows KVM to do the right thing in almost every
case without requiring new ABI, and it does so even if KVM screws up, e.g. queues multiple
pending exceptions.  It also neatly handles one-off things like async #PF in L2.
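To make the ordering concrete, here is a heavily simplified sketch of the flow from the cover letter: check L1's exception intercepts first, morph anything L1 wants into a pending VM-Exit immediately, and only otherwise apply the double-fault merging rules. Everything named here (struct vcpu_sketch, queue_exception(), enum queue_result) is hypothetical and is NOT KVM's actual code. The benign/contributory/#PF classes follow the Intel SDM's "Conditions for Generating a Double Fault" table, and a triple fault while L2 is active (which would itself exit to L1) is deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural exception vectors used by the double-fault rules. */
enum {
	DE_VECTOR = 0,
	DF_VECTOR = 8,
	TS_VECTOR = 10,
	NP_VECTOR = 11,
	SS_VECTOR = 12,
	GP_VECTOR = 13,
	PF_VECTOR = 14,
};

/* Exception classes from the SDM double-fault table. */
enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

static enum excpt_class exception_class(uint8_t vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR: case TS_VECTOR: case NP_VECTOR:
	case SS_VECTOR: case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

/* Hypothetical, simplified vCPU state (not KVM's real structures). */
struct vcpu_sketch {
	bool     guest_mode;          /* true while L2 is running */
	uint32_t l1_exception_bitmap; /* bit N set => L1 intercepts vector N */
	bool     vectoring;           /* an exception is mid-delivery... */
	uint8_t  vectoring_nr;        /* ...and this is its vector */
};

enum queue_result {
	DELIVER_TO_GUEST, /* inject *out_nr into the current guest */
	PENDING_VMEXIT,   /* synthesize a VM-Exit to L1 for *out_nr */
	TRIPLE_FAULT,     /* shutdown; the L2 => L1 exit case is not modeled */
};

static bool l1_intercepts(const struct vcpu_sketch *v, uint8_t nr)
{
	return v->guest_mode && (v->l1_exception_bitmap & (1u << nr));
}

static enum queue_result queue_exception(const struct vcpu_sketch *v,
					 uint8_t nr, uint8_t *out_nr)
{
	*out_nr = nr;

	/* L1's intercepts win before any #PF => #PF => #DF morphing. */
	if (l1_intercepts(v, nr))
		return PENDING_VMEXIT;

	if (v->vectoring) {
		enum excpt_class c1 = exception_class(v->vectoring_nr);
		enum excpt_class c2 = exception_class(nr);

		/* Contributory or #PF while delivering #DF => shutdown. */
		if (v->vectoring_nr == DF_VECTOR && c2 != EXCPT_BENIGN)
			return TRIPLE_FAULT;

		if ((c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
		    (c1 == EXCPT_PF && c2 != EXCPT_BENIGN)) {
			*out_nr = DF_VECTOR;
			/* The synthesized #DF is subject to L1's intercepts too. */
			if (l1_intercepts(v, DF_VECTOR))
				return PENDING_VMEXIT;
		}
		/*
		 * Otherwise the pair is handled serially; this sketch simply
		 * delivers the newer exception.
		 */
	}
	return DELIVER_TO_GUEST;
}
```

With L2 active and L1 intercepting #PF, a #PF hit while vectoring a #PF comes back as PENDING_VMEXIT for vector 14 rather than a #DF injected into L2; clear the intercept and the same inputs yield a #DF delivered to L2.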

However, I hadn't considered your approach, which addresses the ABI conundrum by
processing pending=>injected immediately after handling the VM-Exit.  I can't think
of any reason that wouldn't work, but I really don't like splitting the event
priority logic, nor do I like having two event injection sites (getting rid of the
extra calls to kvm_check_nested_events() is still on my wish list).  If we could go
back in time, I would likely vote for properly tracking injected vs. pending, but
since we're mostly stuck with KVM's ABI, I prefer the "immediately morph to pending
VM-Exit" hack over the "immediately morph to 'injected' exception" hack.

[*] https://en.wikipedia.org/wiki/Hanlon%27s_razor