On Thu, Mar 31, 2022, Maciej S. Szmigiero wrote:
> On 30.03.2022 23:59, Sean Christopherson wrote:
> > On Thu, Mar 10, 2022, Maciej S. Szmigiero wrote:
> > > @@ -3627,6 +3632,14 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
> > >  	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
> > >  		return;
> > > +	/* L1 -> L2 event re-injection needs a different handling */
> > > +	if (is_guest_mode(vcpu) &&
> > > +	    exit_during_event_injection(svm, svm->nested.ctl.event_inj,
> > > +					svm->nested.ctl.event_inj_err)) {
> > > +		nested_svm_maybe_reinject(vcpu);
> >
> > Why is this manually re-injecting?  More specifically, why does the below (out of
> > sight in the diff) code that re-queues the exception/interrupt not work?  The
> > re-queued event should be picked up by nested_save_pending_event_to_vmcb12() and
> > propagated to vmcb12.
>
> A L1 -> L2 injected event should either be re-injected until successfully
> injected into L2 or propagated to VMCB12 if there is a nested VMEXIT
> during its delivery.
>
> svm_complete_interrupts() does not do such re-injection in some cases
> (soft interrupts, soft exceptions, #VC) - it is trying to resort to
> emulation instead, which is incorrect in this case.
>
> I think it's better to split out this L1 -> L2 nested case to a
> separate function in nested.c rather than to fill
> svm_complete_interrupts() in already very large svm.c with "if" blocks
> here and there.

Ah, I see it now.  WTF.

Ugh, commit 66fd3f7f901f ("KVM: Do not re-execute INTn instruction.") fixed VMX,
but left SVM broken.

Re-executing the INTn is wrong; the instruction has already completed decode and
execution.  E.g. if there's a code breakpoint on the INTn, rewinding will cause
a spurious #DB.

KVM's INT3 shenanigans are bonkers, but I guess there's no better option given
that the APM says "Software interrupts cannot be properly injected if the
processor does not support the NextRIP field.".  What a mess.

Anyways, for the common nrips=true case, I strongly prefer that we properly fix
the non-nested case and re-inject software interrupts, which should in turn
naturally fix this nested case.  And for nrips=false, my vote is to either punt
and document it as a "KVM erratum", or straight up make nested require nrips.

Note, fixing the non-nested case also requires updating svm_queue_exception(),
which assumes it will only be handed hardware exceptions, i.e. hardcodes type
EXEPT.  That's blatantly wrong, e.g. if userspace injects a software exception
via KVM_SET_VCPU_EVENTS.
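
To make the nrips=true vs. nrips=false split above concrete, here is a minimal standalone sketch of the policy being argued for. It is not KVM code: classify_exit_event() and the EVT_* names are hypothetical, and only the bit layout (vector in bits 7:0, type in bits 10:8, valid in bit 31) is meant to mirror the EVENTINJ/EXITINTINFO format. It deliberately does not model the INT3/INTO special cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of the low 32 bits of EVENTINJ / EXITINTINFO. */
#define EVT_VEC_MASK	0xffu
#define EVT_TYPE_SHIFT	8
#define EVT_TYPE_MASK	(7u << EVT_TYPE_SHIFT)
#define EVT_TYPE_INTR	(0u << EVT_TYPE_SHIFT)	/* external/virtual interrupt */
#define EVT_TYPE_NMI	(2u << EVT_TYPE_SHIFT)
#define EVT_TYPE_EXEPT	(3u << EVT_TYPE_SHIFT)	/* exception */
#define EVT_TYPE_SOFT	(4u << EVT_TYPE_SHIFT)	/* software interrupt (INTn) */
#define EVT_VALID	(1u << 31)

enum action {
	ACT_NONE,		/* no event was being delivered at VM-Exit */
	ACT_REINJECT,		/* re-queue the event, do not rewind RIP */
	ACT_UNSUPPORTED,	/* soft interrupt without NextRIP: punt */
};

/* Hypothetical helper, for illustration only; not a KVM function. */
static enum action classify_exit_event(uint32_t exitintinfo, bool nrips)
{
	if (!(exitintinfo & EVT_VALID))
		return ACT_NONE;

	/*
	 * Per the APM quote above, a software interrupt cannot be properly
	 * injected without NextRIP, so this sketch flags that case instead
	 * of pretending re-injection works.
	 */
	if ((exitintinfo & EVT_TYPE_MASK) == EVT_TYPE_SOFT && !nrips)
		return ACT_UNSUPPORTED;

	return ACT_REINJECT;
}
```

Under these assumptions, an interrupted INT 0x80, i.e. classify_exit_event(EVT_VALID | EVT_TYPE_SOFT | 0x80, nrips), is re-injected when nrips is true and flagged as unsupported when it is false, which is the "KVM erratum" or "nested requires nrips" choice above.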