On Sun, Jun 27, 2021 at 4:38 PM Stas Sergeev <stsp2@xxxxxxxxx> wrote: > > When returning to user, the special care is taken about the > exception that was already injected to VMCS but not yet to guest. > cancel_injection removes such exception from VMCS. It is set as > pending, and if the user does KVM_SET_REGS, it gets completely canceled. > > This didn't happen though, because the vcpu->arch.exception.injected > and vcpu->arch.exception.pending were forgotten to update in > cancel_injection. As the result, KVM_SET_REGS didn't cancel out > anything, and the exception was re-injected on the next KVM_RUN, > even though the guest registers (like EIP) were already modified. > This was leading to an exception coming from the "wrong place". > > This patch makes sure the vcpu->arch.exception.injected and > vcpu->arch.exception.pending are in sync with the reality (and > with VMCS). Also it adds WARN_ON_ONCE() to __set_regs() to make > sure vcpu->arch.exception.injected is never set here, because > if it is, the exception context is going to be corrupted the same > way it was before that patch. > Adding WARN_ON_ONCE() alone, without the fix, was verified to > actually trigger and detect a buggy scenario. > > Signed-off-by: Stas Sergeev <stsp2@xxxxxxxxx> > > CC: Paolo Bonzini <pbonzini@xxxxxxxxxx> > CC: Sean Christopherson <seanjc@xxxxxxxxxx> > CC: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> > CC: Wanpeng Li <wanpengli@xxxxxxxxxxx> > CC: Jim Mattson <jmattson@xxxxxxxxxx> > CC: Joerg Roedel <joro@xxxxxxxxxx> > CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > CC: Ingo Molnar <mingo@xxxxxxxxxx> > CC: Borislav Petkov <bp@xxxxxxxxx> > CC: x86@xxxxxxxxxx > CC: "H. Peter Anvin" <hpa@xxxxxxxxx> > CC: kvm@xxxxxxxxxxxxxxx > --- > arch/x86/kvm/x86.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index e0f4a46649d7..bc6ca8641824 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -9450,7 +9450,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > cancel_injection: > if (req_immediate_exit) > kvm_make_request(KVM_REQ_EVENT, vcpu); > - static_call(kvm_x86_cancel_injection)(vcpu); > + if (vcpu->arch.exception.injected) { > + static_call(kvm_x86_cancel_injection)(vcpu); > + vcpu->arch.exception.injected = false; > + vcpu->arch.exception.pending = true; > + } > if (unlikely(vcpu->arch.apic_attention)) > kvm_lapic_sync_from_vapic(vcpu); > out: > @@ -9822,6 +9826,7 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) > kvm_rip_write(vcpu, regs->rip); > kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED); > > + WARN_ON_ONCE(vcpu->arch.exception.injected); > vcpu->arch.exception.pending = false; > > kvm_make_request(KVM_REQ_EVENT, vcpu); > -- > 2.32.0 > This doesn't work. Kvm has no facilities for converting an injected exception back into a pending exception. In particular, if the exception has side effects, such as a #PF which sets CR2, those side effects have already taken place. Once kvm sets the VM-entry interruption-information field, the next VM-entry must deliver that exception. You could arrange to back it out, but you would also have to back out the changes to CR2 (for #PF) or DR6 (for #DB). Cancel_injection *should* leave the exception in the 'injected' state, and KVM_SET_REGS *should not* clear an injected exception. (I don't think it's right to clear a pending exception either, if that exception happens to be a trap, but that's a different discussion). It seems to me that the crux of the problem here is that run->ready_for_interrupt_injection returns true when it should return false. That's probably where you should focus your efforts.