On Mon, 2022-03-28 at 17:50 +0000, Sean Christopherson wrote: > On Sun, Mar 27, 2022, Maxim Levitsky wrote: > > Other than that I am actually very happy that you posted this patch series, > > as this gives more chance that this long standing issue will be fixed, > > and if your patches are better/simpler/less invasive to KVM and still address the issue, > > I fully support using them instead of mine. > > I highly doubt they're simpler or less invasive, but I do hope that the approach > wil be easier to maintain. > > > Totally agree with you about your thoughts about splitting pending/injected exception, > > I also can't say I liked my approach that much, for the same reasons you mentioned. > > > > It is also the main reason I put the whole thing on the backlog lately, > > because I was feeling that I am changing too much of the KVM, > > for a relatively theoretical issue. > > > > > > I will review your patches, compare them to mine, and check if you or I missed something. > > > > PS: > > > > Back then, I also did an extensive review on few cases when qemu injects exceptions itself, > > which it does thankfully rarely. There are several (theoretical) issues there. > > I don't remember those details, I need to refresh my memory. > > > > AFAIK, qemu injects #MC sometimes when it gets it from the kernel in form of a signal, > > if I recall this correctly, and it also reflects back #DB, when guest debug was enabled > > (and that is the reason for some work I did in this area, like the KVM_GUESTDBG_BLOCKIRQ thing) > > > > Qemu does this without considering nested and/or pending exception/etc. > > It just kind of abuses the KVM_SET_VCPU_EVENTS for that. > > I wouldn't call that abuse, the ioctl() isn't just for migration. Not checking for > a pending exception is firmly a userspace bug and not something KVM should try to > fix. yes, but to make the right decision, the userspace has to know if there is a pending exception, and if there is, then merge it (which might even involve triple fault), On top of that it is possible that pending excpetion is not intercepted by L1, but merged result is, so injecting the exception will cause nested VMexit, which is something that is hard for userspace to model. I think that the cleanest way to do this is to add new ioctl, KVM_INJECT_EXCEPTION, which can do the right thing in the kernel, but I am not sure that it is worth it, knowing that thankfully userspace doesn't inject exceptions much. > > For #DB, I suspect it's a non-issue. The exit is synchronous, so unless userspace > is deferring the reflection, which would be architecturally wrong in and of itself, > there can never be another pending exception. Could very be, but still there could be corner cases. Like what if you set data fetch breakpoint on a IDT entry of some exception? I guess during delivery of that exception there might be #DB, but I am not 100% expert on when and how #DB is generated, so I can't be sure. Anyway #DB isn't a big deal because qemu only re-injects it when guest debug is enabled and that is broken anyway, and does worse things like leaking EFLAGS.TF to the guest stack and such. > > For #MC, I think the correct behavior would be to defer the synthesized #MC if there's > a pending exception and resume the guest until the exception is injected. E.g. if a > different task encounters the real #MC, the synthesized #MC will be fully asynchronous > and may be coincident with a pending exception that is unrelated to the #MC. That > would require to userspace to enable KVM_CAP_EXCEPTION_PAYLOAD, otherwise userspace > won't be able to differentiate between a pending and injected exception, e.g. if the > #MC occurs during exception vectoring, userspace should override the injected exception > and synthesize #MC, otherwise it would likely soft hang the guest. Something like that, I don't remember all the details. Best regards, Maxim Levitsky >