[PATCH] nVMX: Fix warning-causing idt-vectoring-info behavior

"Nadav Har'El" <nyh@xxxxxxxxxx> · Wed, 21 Sep 2011 13:48:13 +0300

This patch solves two outstanding nested-VMX issues:

1. "unexpected, valid vectoring info" warnings appeared in L1.
These are fixed by correcting the emulation of concurrent L0->L1 and
L1->L2 injections.

2. When we must run L2 next (e.g., on L1's VMLAUNCH/VMRESUME), injection into
L1 was delayed for an unbounded amount of time, until L2 happened to exit.
We now force (using a self-IPI) an exit immediately after entry to L2,
so that the injection into L1 happens promptly.

Some more details about these issues:

When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled "Correct handling of interrupt injection".

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt into L2,
and when L2 tried to run this interrupt's handler, it got a page fault, so
the resulting exit reports the original interrupt vector in
idt_vectoring_info. In this case we cannot exit to L1 with
EXIT_REASON_EXTERNAL_INTERRUPT as we wished to, because the VMX spec
guarantees that valid idt_vectoring_info and EXIT_REASON_EXTERNAL_INTERRUPT
can never occur together. This is not just a matter of the spec: a KVM L1
actually prints a kernel warning "unexpected, valid vectoring info" if we
violate this guarantee, and some users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: first, complete L1's injection into L2 (i.e., enter L2,
injecting the idt-vectoring-info into it); second, after entry into L2
succeeds and it exits back to L0, exit to L1 with
EXIT_REASON_EXTERNAL_INTERRUPT. Most of this is already in the code - the
only change we need is to remain in L2 (and not exit to L1) in this case.

However, to ensure prompt injection to L1, instead of letting L2 run for
a while after entering, we can send a self-IPI, which will ensure that
L2 exits immediately after the (necessary) entry, so we can inject into L1
as soon as possible.

Note that we added this self-IPI not only in the idt-vectoring-info case
above, but in every case where we are forced to enter L2 despite wishing to
inject to L1. This includes the case where L1 has just executed
VMLAUNCH/VMRESUME to run L2.
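
The decision described above can be modelled in a few lines of standalone C.
This is only an illustrative sketch, not kernel code: struct nested_state,
enum intr_action and decide_l1_injection() are hypothetical names invented
for the example, and it assumes L2 is running and L1 has asked to exit on
external interrupts. Only the tested condition mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTORING_INFO_VALID_MASK 0x80000000u	/* bit 31 of the field */

/* Hypothetical, simplified stand-in for the per-vcpu nested state. */
struct nested_state {
	bool nested_run_pending;	/* L1 just executed VMLAUNCH/VMRESUME */
	uint32_t idt_vectoring_info;	/* value saved from vmcs02 after the last L2 exit */
};

/* Hypothetical outcome of an L0 attempt to inject an interrupt into L1. */
enum intr_action {
	EXIT_TO_L1_NOW,		/* emulate EXIT_REASON_EXTERNAL_INTERRUPT at once */
	ENTER_L2_THEN_SELF_IPI,	/* enter L2 first; the self-IPI forces a prompt exit */
};

/* Same condition as the patched vmx_interrupt_allowed(), in isolation. */
enum intr_action decide_l1_injection(const struct nested_state *s)
{
	if (s->nested_run_pending ||
	    (s->idt_vectoring_info & VECTORING_INFO_VALID_MASK))
		return ENTER_L2_THEN_SELF_IPI;
	return EXIT_TO_L1_NOW;
}

/* Walks through the three situations discussed in the text. */
void demo_decide_l1_injection(void)
{
	struct nested_state idle = { false, 0 };
	struct nested_state launch = { true, 0 };
	struct nested_state revector = { false, VECTORING_INFO_VALID_MASK | 0x30 };

	assert(decide_l1_injection(&idle) == EXIT_TO_L1_NOW);
	assert(decide_l1_injection(&launch) == ENTER_L2_THEN_SELF_IPI);
	assert(decide_l1_injection(&revector) == ENTER_L2_THEN_SELF_IPI);
}
```

In the model, both "must enter L2" cases collapse into one outcome, which is
why a single self-IPI in enable_irq_window() can cover them.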

Note how we test vmcs12->idt_vectoring_info_field. This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place where we save, at the end of vmx_vcpu_run, the vmcs02
value of this field. This was explained in patch 25 ("Correct handling of idt
vectoring info") of the original nVMX patch series.
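
A minimal standalone model of that save-then-test arrangement follows. It is
a sketch under stated assumptions, not the kernel's code: the one-field
structs and the functions save_after_l2_run() and l2_reinjection_pending()
are hypothetical, and the real save in vmx_vcpu_run involves more state than
is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTORING_INFO_VALID_MASK 0x80000000u	/* bit 31 of the field */

/* Hypothetical one-field stand-ins for the two VMCS structures. */
struct vmcs02_model { uint32_t idt_vectoring_info; };
struct vmcs12_model { uint32_t idt_vectoring_info_field; };

/* Models the save at the end of each L2 run: vmcs12 is used as scratch. */
void save_after_l2_run(struct vmcs12_model *vmcs12,
		       const struct vmcs02_model *vmcs02)
{
	vmcs12->idt_vectoring_info_field = vmcs02->idt_vectoring_info;
}

/* The later test reads the saved copy, not a value L1 has seen. */
bool l2_reinjection_pending(const struct vmcs12_model *vmcs12)
{
	return (vmcs12->idt_vectoring_info_field &
		VECTORING_INFO_VALID_MASK) != 0;
}

void demo_saved_vectoring_info(void)
{
	struct vmcs02_model vmcs02 = { VECTORING_INFO_VALID_MASK | 0x30 };
	struct vmcs12_model vmcs12 = { 0 };

	/* Before the save, the scratch field carries no pending event. */
	assert(!l2_reinjection_pending(&vmcs12));

	/* After an L2 exit that interrupted event delivery, it does. */
	save_after_l2_run(&vmcs12, &vmcs02);
	assert(l2_reinjection_pending(&vmcs12));
}
```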

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.

Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx>
---
 arch/x86/kvm/vmx.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-09-21 13:45:59.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-09-21 13:45:59.000000000 +0300
@@ -3858,12 +3858,17 @@ static bool nested_exit_on_intr(struct k
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	u32 cpu_based_vm_exec_control;
-	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
-		/* We can get here when nested_run_pending caused
-		 * vmx_interrupt_allowed() to return false. In this case, do
-		 * nothing - the interrupt will be injected later.
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+		/*
+		 * We get here if vmx_interrupt_allowed() returned 0 because
+		 * we must enter L2 now, so we can't inject to L1 now. If we
+		 * just do nothing, L2 will later exit and we can inject the
+		 * IRQ to L1 then. But to make L2 exit more promptly, we send
+		 * a self-IPI, causing L2 to exit right after entry.
 		 */
+		smp_send_reschedule(vcpu->cpu);
 		return;
+	}
 
 	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3990,11 +3995,12 @@ static void vmx_set_nmi_mask(struct kvm_
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
-		struct vmcs12 *vmcs12;
-		if (to_vmx(vcpu)->nested.nested_run_pending)
+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		if (to_vmx(vcpu)->nested.nested_run_pending ||
+		    (vmcs12->idt_vectoring_info_field &
+		     VECTORING_INFO_VALID_MASK))
 			return 0;
 		nested_vmx_vmexit(vcpu);
-		vmcs12 = get_vmcs12(vcpu);
 		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
 		vmcs12->vm_exit_intr_info = 0;
 		/* fall through to normal code, but now in L1, not L2 */