On Tue, 2022-06-14 at 20:47 +0000, Sean Christopherson wrote:
> Drop pending exceptions and events queued for re-injection when leaving
> nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
> by host userspace.  Failure to purge events could result in an event
> belonging to L2 being injected into L1.
>
> This _should_ never happen for VM-Fail as all events should be blocked by
> nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is
> the source of VM-Fail when running vmcs02.
>
> SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
> to SMM is blocked by pending exceptions and re-injected events.
>
> Forced exit is definitely buggy, but has likely gone unnoticed because
> userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
> some other ioctl() that purges the queue).
>
> Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 7d8cd0ebcc75..ee6f27dffdba 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4263,14 +4263,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  			nested_vmx_abort(vcpu,
>  					 VMX_ABORT_SAVE_GUEST_MSR_FAIL);
>  	}
> -
> -	/*
> -	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
> -	 * preserved above and would only end up incorrectly in L1.
> -	 */
> -	vcpu->arch.nmi_injected = false;
> -	kvm_clear_exception_queue(vcpu);
> -	kvm_clear_interrupt_queue(vcpu);
>  }
>  
>  /*
> @@ -4609,6 +4601,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  		WARN_ON_ONCE(nested_early_check);
>  	}
>  
> +	/*
> +	 * Drop events/exceptions that were queued for re-injection to L2
> +	 * (picked up via vmx_complete_interrupts()), as well as exceptions
> +	 * that were pending for L2.  Note, this must NOT be hoisted above
> +	 * prepare_vmcs12(), events/exceptions queued for re-injection need to
> +	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
> +	 */
> +	vcpu->arch.nmi_injected = false;
> +	kvm_clear_exception_queue(vcpu);
> +	kvm_clear_interrupt_queue(vcpu);
> +
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
>  	/* Update any VMCS fields that might have changed while L2 ran */

Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky