On Thu, 2022-03-24 at 21:31 +0000, Sean Christopherson wrote:
> On Sun, Mar 13, 2022, Maxim Levitsky wrote:
> > On Fri, 2022-03-11 at 03:27 +0000, Sean Christopherson wrote:
> > > The main goal of this series is to fix KVM's longstanding bug of not
> > > honoring L1's exception intercepts wants when handling an exception that
> > > occurs during delivery of a different exception. E.g. if L0 and L1 are
> > > using shadow paging, and L2 hits a #PF, and then hits another #PF while
> > > vectoring the first #PF due to _L1_ not having a shadow page for the IDT,
> > > KVM needs to check L1's intercepts before morphing the #PF => #PF => #DF
> > > so that the #PF is routed to L1, not injected into L2 as a #DF.
> > >
> > > nVMX has hacked around the bug for years by overriding the #PF injector
> > > for shadow paging to go straight to VM-Exit, and nSVM has started doing
> > > the same. The hacks mostly work, but they're incomplete, confusing, and
> > > lead to other hacky code, e.g. bailing from the emulator because #PF
> > > injection forced a VM-Exit and suddenly KVM is back in L1.
> > >
> > > Everything leading up to that are related fixes and cleanups I encountered
> > > along the way; some through code inspection, some through tests (I truly
> > > thought this series was finished 10 commits and 3 days ago...).
> > >
> > > Nothing in here is all that urgent; all bugs tagged for stable have been
> > > around for multiple releases (years in most cases).
> > >
> > I am just curious. Are you aware that I worked on this a few months ago?
>
> Ah, so that's why I had a feeling of deja vu when factoring out kvm_queued_exception.
> I completely forgot about it :-/ In my defense, that was nearly a year ago[1][2], though
> I suppose one could argue 11 == "a few" :-)
>
> [1] https://lore.kernel.org/all/20210225154135.405125-1-mlevitsk@xxxxxxxxxx
> [2] https://lore.kernel.org/all/20210401143817.1030695-3-mlevitsk@xxxxxxxxxx
>
> > I am sure that you even reviewed some of my code back then.
>
> Yep, now that I've found the threads I remember discussing the mechanics.
>
> > If so, could you have at least mentioned this and/or pinged me to continue
> > working on this instead of re-implementing it?
>
> I'm invoking Hanlon's razor[*]; I certainly didn't intend to stomp over your
> work, I simply forgot.

Thank you very much for the explanation, and I am glad that it was an honest mistake.

Other than that, I am actually very happy that you posted this patch series, as it
gives a better chance that this long-standing issue will be fixed. If your patches
are better/simpler/less invasive to KVM and still address the issue, I fully support
using them instead of mine.

I totally agree with your thoughts about splitting pending/injected exceptions. I also
can't say I liked my approach that much, for the same reasons you mentioned. It is
also the main reason I put the whole thing on the backlog lately: I felt that I was
changing too much of KVM for a relatively theoretical issue.

I will review your patches, compare them to mine, and check if you or I missed something.

PS:

Back then, I also did an extensive review of a few cases where qemu injects exceptions
itself, which it thankfully does rarely. There are several (theoretical) issues there.
I don't remember the details; I need to refresh my memory.
AFAIK, qemu sometimes injects #MC when it gets it from the kernel in the form of a
signal, if I recall correctly, and it also reflects back #DB when guest debug is
enabled (that is the reason for some work I did in this area, like the
KVM_GUESTDBG_BLOCKIRQ thing).

Qemu does this without considering nesting and/or pending exceptions, etc.
It just kind of abuses KVM_SET_VCPU_EVENTS for that.

Best regards,
	Maxim Levitsky

>
> As for the technical aspects, looking back at your series, I strongly considered
> taking the same approach of splitting pending vs. injected (again, without any
> recollection of your work). I ultimately opted to go with the "immediately morph
> to pending VM-Exit" approach as it allows KVM to do the right thing in almost every
> case without requiring new ABI, and even if KVM screws up, e.g. queues multiple
> pending exceptions. It also neatly handles one-off things like async #PF in L2.
>
> However, I hadn't considered your approach, which addresses the ABI conundrum by
> processing pending=>injected immediately after handling the VM-Exit. I can't think
> of any reason that wouldn't work, but I really don't like splitting the event
> priority logic, nor do I like having two event injection sites (getting rid of the
> extra calls to kvm_check_nested_events() is still on my wish list). If we could go
> back in time, I would likely vote for properly tracking injected vs. pending, but
> since we're mostly stuck with KVM's ABI, I prefer the "immediately morph to pending
> VM-Exit" hack over the "immediately morph to 'injected' exception" hack.
>
> [*] https://en.wikipedia.org/wiki/Hanlon%27s_razor
>