On Sun, Mar 27, 2022, Maxim Levitsky wrote:
> Other than that I am actually very happy that you posted this patch series,
> as this gives more chance that this long standing issue will be fixed,
> and if your patches are better/simpler/less invasive to KVM and still address the issue,
> I fully support using them instead of mine.

I highly doubt they're simpler or less invasive, but I do hope that the approach
will be easier to maintain.

> Totally agree with you about your thoughts about splitting pending/injected exception,
> I also can't say I liked my approach that much, for the same reasons you mentioned.
>
> It is also the main reason I put the whole thing on the backlog lately,
> because I was feeling that I am changing too much of the KVM,
> for a relatively theoretical issue.
>
>
> I will review your patches, compare them to mine, and check if you or I missed something.
>
> PS:
>
> Back then, I also did an extensive review on few cases when qemu injects exceptions itself,
> which it does thankfully rarely. There are several (theoretical) issues there.
> I don't remember those details, I need to refresh my memory.
>
> AFAIK, qemu injects #MC sometimes when it gets it from the kernel in form of a signal,
> if I recall this correctly, and it also reflects back #DB, when guest debug was enabled
> (and that is the reason for some work I did in this area, like the KVM_GUESTDBG_BLOCKIRQ thing)
>
> Qemu does this without considering nested and/or pending exception/etc.
> It just kind of abuses the KVM_SET_VCPU_EVENTS for that.

I wouldn't call that abuse, the ioctl() isn't just for migration.  Not checking for
a pending exception is firmly a userspace bug and not something KVM should try to
fix.

For #DB, I suspect it's a non-issue.  The exit is synchronous, so unless userspace
is deferring the reflection, which would be architecturally wrong in and of itself,
there can never be another pending exception.

For #MC, I think the correct behavior would be to defer the synthesized #MC if there's
a pending exception and resume the guest until the exception is injected.  E.g. if a
different task encounters the real #MC, the synthesized #MC will be fully asynchronous
and may be coincident with a pending exception that is unrelated to the #MC.

That would require userspace to enable KVM_CAP_EXCEPTION_PAYLOAD, otherwise userspace
won't be able to differentiate between a pending and injected exception, e.g. if the #MC
occurs during exception vectoring, userspace should override the injected exception and
synthesize #MC, otherwise it would likely soft hang the guest.
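
To make the #MC rules above concrete, here is a rough sketch of the decision userspace
would have to make before synthesizing #MC.  It is only a model, not QEMU or KVM code:
struct exception_state is a hypothetical stand-in for the exception fields that
KVM_GET_VCPU_EVENTS reports, and decide_mc_action() and the enum names are made up for
illustration.  It assumes KVM_CAP_EXCEPTION_PAYLOAD is enabled so that "pending" and
"injected" can be told apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the exception fields of struct kvm_vcpu_events.
 * Assumption: KVM_CAP_EXCEPTION_PAYLOAD is enabled, otherwise userspace
 * cannot distinguish a pending exception from an injected one.
 */
struct exception_state {
	uint8_t injected;	/* exception was being vectored at exit */
	uint8_t pending;	/* exception is queued but not yet delivered */
	uint8_t nr;		/* vector of that exception */
};

/* Illustrative names only, not part of any KVM or QEMU API. */
enum mc_action {
	MC_DEFER,		/* resume the guest, retry once the exception is injected */
	MC_OVERRIDE_INJECTED,	/* replace the injected exception with #MC */
	MC_QUEUE,		/* nothing in flight, synthesize #MC right away */
};

static enum mc_action decide_mc_action(const struct exception_state *ex)
{
	assert(ex != NULL);

	/* An unrelated pending exception must be delivered first. */
	if (ex->pending)
		return MC_DEFER;

	/* #MC hit during exception vectoring: #MC takes over. */
	if (ex->injected)
		return MC_OVERRIDE_INJECTED;

	return MC_QUEUE;
}
```

The pending check comes first on purpose: deferring is only needed for an exception
that hasn't started delivery, whereas an already-injected exception is the vectoring
case where overriding avoids the soft hang.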