Re: [PATCH 00/21] KVM: x86: Event/exception fixes and cleanups

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 28 Mar 2022 17:50:51 +0000

On Sun, Mar 27, 2022, Maxim Levitsky wrote:
> Other than that I am actually very happy that you posted this patch series,
> as this gives more chance that this long standing issue will be fixed,
> and if your patches are better/simpler/less invasive to KVM and still address the issue, 
> I fully support using them instead of mine.

I highly doubt they're simpler or less invasive, but I do hope that the approach
will be easier to maintain.

> Totally agree with you about your thoughts about splitting pending/injected exception,
> I also can't say I liked my approach that much, for the same reasons you mentioned.
>  
> It is also the main reason I put the whole thing on the backlog lately, 
> because I was feeling that I am changing too much of the KVM, 
> for a relatively theoretical issue.
>  
>  
> I will review your patches, compare them to mine, and check if you or I missed something.
> 
> PS:
> 
> Back then, I also did an extensive review on few cases when qemu injects exceptions itself,
> which it does thankfully rarely. There are several (theoretical) issues there.
> I don't remember those details, I need to refresh my memory.
> 
> AFAIK, qemu injects #MC sometimes when it gets it from the kernel in form of a signal,
> if I recall this correctly, and it also reflects back #DB, when guest debug was enabled
> (and that is the reason for some work I did in this area, like the KVM_GUESTDBG_BLOCKIRQ thing)
> 
> Qemu does this without considering nested and/or pending exception/etc.
> It just kind of abuses the KVM_SET_VCPU_EVENTS for that.

I wouldn't call that abuse, the ioctl() isn't just for migration.  Not checking for
a pending exception is firmly a userspace bug and not something KVM should try to
fix.

For #DB, I suspect it's a non-issue.  The exit is synchronous, so unless userspace
is deferring the reflection, which would be architecturally wrong in and of itself,
there can never be another pending exception. 

For #MC, I think the correct behavior would be to defer the synthesized #MC if there's
a pending exception and resume the guest until the exception is injected.  E.g. if a
different task encounters the real #MC, the synthesized #MC will be fully asynchronous
and may be coincident with a pending exception that is unrelated to the #MC.  That
would require userspace to enable KVM_CAP_EXCEPTION_PAYLOAD, otherwise userspace
won't be able to differentiate between a pending and an injected exception.  E.g. if
the #MC occurs during exception vectoring, userspace should override the injected
exception and synthesize #MC, otherwise it would likely soft hang the guest.
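
To make the #MC case concrete, here is a rough, untested sketch of the userspace
decision, assuming KVM_CAP_EXCEPTION_PAYLOAD is enabled so that pending and injected
are reported separately.  "struct exc_state", "enum mc_action" and
"decide_mc_action()" are made-up names for illustration only; the struct just mirrors
the exception.pending/exception.injected flags that userspace would read via
KVM_GET_VCPU_EVENTS, and none of this is taken from QEMU or KVM.

```c
#include <stdint.h>

/*
 * Hypothetical mirror of the two exception flags userspace would read
 * back with KVM_GET_VCPU_EVENTS.  Without KVM_CAP_EXCEPTION_PAYLOAD the
 * two states cannot be told apart, so this sketch assumes it is enabled.
 */
struct exc_state {
	uint8_t pending;	/* queued, not yet delivered to the guest */
	uint8_t injected;	/* delivery (vectoring) already started */
};

/* Hypothetical actions for a synthesized, asynchronous #MC. */
enum mc_action {
	MC_INJECT_NOW,		/* nothing in flight, queue #MC directly */
	MC_DEFER,		/* resume the guest, retry once the pending
				 * exception has been injected */
	MC_OVERRIDE_INJECTED,	/* #MC hit during vectoring, replace the
				 * injected exception with #MC */
};

static enum mc_action decide_mc_action(const struct exc_state *exc)
{
	/* An unrelated pending exception must not be clobbered. */
	if (exc->pending)
		return MC_DEFER;

	/* Exception vectoring was interrupted, #MC takes over. */
	if (exc->injected)
		return MC_OVERRIDE_INJECTED;

	return MC_INJECT_NOW;
}
```

Checking pending before injected is a choice made for this sketch, not something KVM
mandates; real userspace would also need to remember the deferred #MC across the
next KVM_RUN.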