Re: [PATCH] KVM: nVMX: Fix bug of injecting L2 exception into L1

On 20/11/17 23:47, Paolo Bonzini wrote:
On 19/11/2017 17:25, Liran Alon wrote:
L2 RDMSR could be handled as described below:
1) L2 exits to L0 on RDMSR and calls handle_rdmsr().
2) handle_rdmsr() calls kvm_inject_gp() which sets KVM_REQ_EVENT,
exception.pending=true and exception.injected=false.
3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls
inject_pending_event() which calls vmx_check_nested_events(),
which sees that exception.pending=true but
nested_vmx_check_exception() returns 0 and therefore does nothing at
this point. However, let's assume it later sees the
vmx-preemption-timer expired and therefore exits from L2 to L1 by
calling nested_vmx_vmexit().
4) nested_vmx_vmexit() calls prepare_vmcs12() which calls
vmcs12_save_pending_event(), but it does nothing as
exception.injected is false. prepare_vmcs12() also calls
kvm_clear_exception_queue(), which does nothing as
exception.injected is already false.
5) We now return from vmx_check_nested_events() with 0 while still
having exception.pending=true!
6) Therefore inject_pending_event() continues
and we inject the L2 exception into L1!...

But this is buggy as well, because the #GP is lost, isn't it?

I don't think so.

Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected"), there is a fundamental difference between a pending exception and an injected exception. A pending exception means that no side-effects of the exception have been applied yet, including incrementing the RIP past the instruction which caused the exception. In our case, for example, handle_wrmsr() calls kvm_inject_gp() and returns without calling kvm_skip_emulated_instruction(), which is what increments the RIP.

Therefore, when we exit from L2 to L1 on the vmx-preemption-timer, we can safely clear exception.pending, because when L1 resumes L2 the exception will be raised again (the same WRMSR instruction will run again and raise #GP again). This is also why vmcs12_save_pending_event() only saves the "injected" events, not the "pending" events, into the VMCS12 idt-vectoring-info (interrupt.pending is a misleading name and I would rename it in an upcoming patch to interrupt.injected; see explanation below). And this is also why exception.pending is intercepted but exception.injected is not.

I can confirm this patch works because I have written a kvm-unit-test which reproduces this issue. After the fix, the #GP is not lost and is correctly raised directly in L2. (I haven't posted the unit-test yet because it is very dependent on a correct vmx-preemption-timer config that varies between environments.)


Is the bug that the preemption timer vmexit should only be injected if
there are no pending exceptions?  In fact, the same bug could also
happen for NMIs or interrupts, I think.

As explained above, I don't think so.

In general there is a bit of a mess in the KVM code regarding clean separation between a "pending" event and an "injected" event. NMIs and exceptions are now separated nicely. However, interrupt.pending is actually interrupt.injected, as it is signaled after the side-effects have occurred (e.g. the bit moved from IRR to ISR).

I am going to post another series tomorrow (a v2 of a previous series I posted here) that will attempt to clean this up and, along the way, fix a couple of bugs in this area.


So, something like (101% untested):

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5b436f4e6e3e..64eecb8b126d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
  	bool block_nested_events =
  	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);

-	if (vcpu->arch.exception.pending &&
-		nested_vmx_check_exception(vcpu, &exit_qual)) {
+	if (vcpu->arch.exception.pending) {
+		if (!nested_vmx_check_exception(vcpu, &exit_qual))
+			return 0;
  		if (block_nested_events)
  			return -EBUSY;
  		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
@@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
  		return 0;
  	}

-	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
-	    vmx->nested.preemption_timer_expired) {
-		if (block_nested_events)
-			return -EBUSY;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
-		return 0;
-	}
-
-	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+	if (vcpu->arch.nmi_pending) {
+		if (!nested_exit_on_nmi(vcpu))
+			return 0;
  		if (block_nested_events)
  			return -EBUSY;
  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
@@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
  		return 0;
  	}

-	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
-	    nested_exit_on_intr(vcpu)) {
+	if (kvm_cpu_has_interrupt(vcpu) || external_intr) {
+		if (!nested_exit_on_intr(vcpu))
+			return 0;
  		if (block_nested_events)
  			return -EBUSY;
  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
  		return 0;
  	}

+	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
+	    vmx->nested.preemption_timer_expired) {
+		if (block_nested_events)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
+		return 0;
+	}
+
  	vmx_complete_nested_posted_interrupt(vcpu);
  	return 0;
  }

Paolo

This commit will fix the above issue by changing step (4) to
clear exception.pending in kvm_clear_exception_queue().

Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not
yet been injected")

Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
---
  arch/x86/kvm/vmx.c | 1 -
  arch/x86/kvm/x86.h | 1 +
  2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7c3522a989d0..bee08782c781 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
  		if (vmx->nested.nested_run_pending)
  			return -EBUSY;
  		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
-		vcpu->arch.exception.pending = false;
  		return 0;
  	}

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d0b95b7a90b4..6d112d8f799c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -12,6 +12,7 @@

  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
  {
+	vcpu->arch.exception.pending = false;
  	vcpu->arch.exception.injected = false;
  }



Should kvm_clear_interrupt_queue do the same?

Interrupts currently only have interrupt.pending (which, as I said above, is really interrupt.injected). Therefore kvm_clear_interrupt_queue() already clears all there is...

Thanks,

Paolo


Regards,
-Liran


