Previously, if L1 launched vmcs12 with both pending debug exceptions
and an already-expired VMX-preemption timer, the pending debug
exceptions were lost due to a priority inversion between a pending
#DB exception and a "VMX-preemption timer expired" VM-exit from L2
to L1.

In this scenario, L0 constructs a vmcs02 that has both a zero-valued
VMX-preemption timer (assuming enable_preemption_timer is set) and
pending debug exceptions. When the vmcs02 is launched/resumed, the
hardware correctly prioritizes the pending debug exceptions. At this
point, L0 intercepts the resulting #DB trap and queues it up for
redelivery. However, when checking for nested events in software, L0
incorrectly prioritizes the "VMX-preemption timer expired" VM-exit
from L2 to L1.

Technically, nested events should probably be blocked at this point.
Hardware has already determined that the #DB trap is the next event
that should happen. L0 just got in the way because it was concerned
about infinite IDT-vectoring loops.

Logically, the enqueued #DB trap is quite similar to a "reinjected"
event resulting from interrupted IDT-vectoring. Treating it as such
fixes the problem, since nested events are blocked when a reinjected
event is present. However, there are some ways in which the enqueued
#DB trap differs from an event reinjected after interrupted
IDT-vectoring. In particular, it should not be recorded in the
IDT-vectoring information field of the vmcs12 in the event of a
synthesized VM-exit from L2 to L1. I don't believe that path should
ever be taken, since the #DB trap should take priority over any
synthesized VM-exit from L2 to L1.

Recategorize both the intercepted #DB and #AC exceptions as
"reinjected" exceptions. For consistency, do the same thing for SVM,
even though it doesn't have a VMX-preemption timer equivalent.

Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
Reviewed-by: Oliver Upton <oupton@xxxxxxxxxx>
Reviewed-by: Peter Shier <pshier@xxxxxxxxxx>
---
 arch/x86/kvm/svm/svm.c | 4 ++--
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2be5bbae3a40..26b30099c4e4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1739,7 +1739,7 @@ static int db_interception(struct vcpu_svm *svm)
 	if (!(svm->vcpu.guest_debug &
 	      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
 		!svm->nmi_singlestep) {
-		kvm_queue_exception(&svm->vcpu, DB_VECTOR);
+		kvm_requeue_exception(&svm->vcpu, DB_VECTOR);
 		return 1;
 	}
 
@@ -1778,7 +1778,7 @@ static int ud_interception(struct vcpu_svm *svm)
 
 static int ac_interception(struct vcpu_svm *svm)
 {
-	kvm_queue_exception_e(&svm->vcpu, AC_VECTOR, 0);
+	kvm_requeue_exception_e(&svm->vcpu, AC_VECTOR, 0);
 	return 1;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 83050977490c..aae01253bfba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4682,7 +4682,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		if (is_icebp(intr_info))
 			WARN_ON(!skip_emulated_instruction(vcpu));
 
-		kvm_queue_exception(vcpu, DB_VECTOR);
+		kvm_requeue_exception(vcpu, DB_VECTOR);
 		return 1;
 	}
 	kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1;
@@ -4703,7 +4703,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		break;
 	case AC_VECTOR:
 		if (guest_inject_ac(vcpu)) {
-			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			kvm_requeue_exception_e(vcpu, AC_VECTOR, error_code);
 			return 1;
 		}
 
-- 
2.26.0.110.g2183baf09c-goog
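
P.S. For reviewers unfamiliar with the pending/injected distinction:
the sketch below is a minimal standalone model (plain C, not KVM
code; every toy_* name is hypothetical) of why marking the
intercepted #DB as "reinjected" fixes the inversion. It mirrors the
semantics of kvm_queue_exception() vs. kvm_requeue_exception(): a
merely pending exception does not block the software nested-event
check, while an already-injected one does.

	#include <stdbool.h>
	#include <stdio.h>

	/* Toy analogue of vcpu->arch.exception state. */
	struct toy_exception {
		bool pending;  /* newly raised; software may reorder it */
		bool injected; /* hardware already delivered it; replay as-is */
		unsigned nr;   /* vector, e.g. 1 for #DB */
	};

	struct toy_vcpu {
		struct toy_exception exception;
		bool preemption_timer_expired; /* vmcs12 timer already zero */
	};

	/* Analogue of kvm_queue_exception(): a brand-new pending event. */
	static void toy_queue_exception(struct toy_vcpu *vcpu, unsigned nr)
	{
		vcpu->exception.pending = true;
		vcpu->exception.nr = nr;
	}

	/* Analogue of kvm_requeue_exception(): re-deliver what hardware
	 * already prioritized. */
	static void toy_requeue_exception(struct toy_vcpu *vcpu, unsigned nr)
	{
		vcpu->exception.injected = true;
		vcpu->exception.nr = nr;
	}

	/* Toy nested-event check: only a reinjected event blocks the
	 * synthesized "VMX-preemption timer expired" VM-exit, preserving
	 * the priority hardware already established. */
	static const char *toy_check_nested_events(struct toy_vcpu *vcpu)
	{
		if (vcpu->exception.injected)
			return "replay #DB to L2";      /* patched behavior */
		if (vcpu->preemption_timer_expired)
			return "VM-exit to L1 (timer)"; /* old behavior: #DB lost */
		return "nothing to do";
	}

	int main(void)
	{
		struct toy_vcpu buggy = { .preemption_timer_expired = true };
		struct toy_vcpu fixed = { .preemption_timer_expired = true };

		toy_queue_exception(&buggy, 1);   /* old code: #DB only pending */
		toy_requeue_exception(&fixed, 1); /* new code: #DB reinjected */

		printf("queue:   %s\n", toy_check_nested_events(&buggy));
		printf("requeue: %s\n", toy_check_nested_events(&fixed));
		return 0;
	}

In the "queue" case the timer VM-exit wins and the #DB evaporates; in
the "requeue" case the #DB is replayed first, matching what hardware
already decided.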