On 06/12/17 20:52, Radim Krčmář wrote:
2017-12-05 10:16+0200, Liran Alon:
In case a posted-interrupt is delivered to the CPU while it is in host
(outside guest), posted-interrupt delivery will be done by
calling sync_pir_to_irr() at vmentry after interrupts are disabled.
sync_pir_to_irr() will check whether the ON bit of vmx->pi_desc.control
is set and, if so, will sync vmx->pi_desc.pir to the IRR and afterwards
update RVI to ensure virtual-interrupt-delivery will dispatch the
interrupt to the guest.
However, it is possible that L1 will receive a posted-interrupt while
the CPU runs in host and is about to enter L2. In this case, the call to
sync_pir_to_irr() will indeed update L1's APIC IRR, but
vcpu_enter_guest() will then just resume into the L2 guest without
re-evaluating whether it should exit from L2 to L1 as a result of this
new pending L1 event.
To address this case, if sync_pir_to_irr() finds a new injectable L1
interrupt and the CPU is running L2, we set KVM_REQ_EVENT.
This will cause vcpu_enter_guest() to run another iteration of
evaluating pending KVM requests and therefore consume
KVM_REQ_EVENT, which makes sure check_nested_events() is called and
handles the pending L1 event properly.
Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
arch/x86/kvm/vmx.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f5074ec5701b..47bbb8b691e8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ int prev_max_irr;
int max_irr;
WARN_ON(!vcpu->arch.apicv_active);
+
+ prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
if (pi_test_on(&vmx->pi_desc)) {
pi_clear_on(&vmx->pi_desc);
+
/*
* IOMMU can write to PIR.ON, so the barrier matters even on UP.
* But on x86 this is just a compiler barrier anyway.
*/
smp_mb__after_atomic();
max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
I think the optimization (partly livelock protection) is not worth the
overhead of two IRR scans for non-nested guests. Please make
kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.
OK. I will modify kvm_apic_update_irr().
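For illustration, a one-pass variant along the lines Radim suggests could look roughly like the sketch below (modelled on the current __kvm_apic_update_irr() in arch/x86/kvm/lapic.c; the prev_max_irr out-parameter is only an assumed interface, not necessarily what the respun patch will use):

static int __kvm_apic_update_irr(u32 *pir, void *regs, int *prev_max_irr)
{
	u32 i, vec;
	u32 pir_val, irr_val;
	int max_irr = -1;

	*prev_max_irr = -1;
	for (i = vec = 0; i <= 7; i++, vec += 32) {
		pir_val = READ_ONCE(pir[i]);
		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
		/* Highest vector that was already pending before the merge. */
		if (irr_val)
			*prev_max_irr = __fls(irr_val) + vec;
		if (pir_val) {
			irr_val |= xchg(&pir[i], 0);
			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
		}
		/* Highest vector after merging PIR into the IRR. */
		if (irr_val)
			max_irr = __fls(irr_val) + vec;
	}

	return max_irr;
}

kvm_apic_update_irr() would just forward the extra parameter, so vmx_sync_pir_to_irr() could drop the separate kvm_lapic_find_highest_irr() scan and compare the two values obtained from the single pass.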
+
+ /*
+ * If we are running L2 and L1 has a new pending interrupt
+ * which can be injected, we should re-evaluate
+ * what should be done with this new L1 interrupt.
+ */
+ if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
We don't need anything from KVM_REQ_EVENT and only use it to abort the
VM entry; kvm_vcpu_exiting_guest_mode() is better for that.
Yes you are right. I will change to kvm_vcpu_exiting_guest_mode().
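As a concrete sketch of that suggestion (assuming the rest of the hunk above stays as is): vcpu_enter_guest() sets vcpu->mode to IN_GUEST_MODE before calling sync_pir_to_irr() and cancels the entry if the mode has changed by the time it re-checks, so it should be enough to do

	/*
	 * Flip vcpu->mode from IN_GUEST_MODE to EXITING_GUEST_MODE so that
	 * vcpu_enter_guest() aborts this entry and re-evaluates pending
	 * events (including nested ones) on its next iteration.
	 */
	if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
		kvm_vcpu_exiting_guest_mode(vcpu);

kvm_vcpu_exiting_guest_mode() is just a cmpxchg of vcpu->mode from IN_GUEST_MODE to EXITING_GUEST_MODE, so it is cheap and has no side effect beyond aborting the entry.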
} else {
- max_irr = kvm_lapic_find_highest_irr(vcpu);
+ max_irr = prev_max_irr;
}
+
vmx_hwapic_irr_update(vcpu, max_irr);
We should also just inject the interrupt if L2 is run without
nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update()?
See next patch in series :)
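Since that next patch is not quoted here, a rough sketch of the combined direction (assumptions: the one-pass prev_max_irr/max_irr interface sketched earlier and the existing nested_exit_on_intr() helper in vmx.c; not necessarily what the series actually does) would be:

	if (is_guest_mode(vcpu) && (max_irr > prev_max_irr)) {
		if (nested_exit_on_intr(vcpu)) {
			/*
			 * L1 intercepts external interrupts: abort this
			 * VM entry so check_nested_events() can decide on
			 * an exit from L2 to L1.
			 */
			kvm_vcpu_exiting_guest_mode(vcpu);
		} else {
			/*
			 * L1 does not intercept external interrupts: let
			 * the next iteration re-evaluate pending events so
			 * the interrupt can be delivered directly to L2.
			 */
			kvm_make_request(KVM_REQ_EVENT, vcpu);
		}
	}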
Thanks.
Regards,
-Liran