2017-12-05 10:16+0200, Liran Alon:
> In case a posted-interrupt was delivered to the CPU while it was in the
> host (outside guest mode), posted-interrupt delivery will be performed
> by calling sync_pir_to_irr() at vmentry, after interrupts are disabled.
>
> sync_pir_to_irr() will check whether the vmx->pi_desc.control ON bit is
> set and, if so, sync vmx->pi_desc.pir to IRR and afterwards update RVI
> to ensure virtual-interrupt-delivery will dispatch the interrupt to the
> guest.
>
> However, it is possible that L1 will receive a posted-interrupt while
> the CPU runs in the host and is about to enter L2. In this case, the
> call to sync_pir_to_irr() will indeed update L1's APIC IRR, but
> vcpu_enter_guest() will then just resume into the L2 guest without
> re-evaluating whether it should exit from L2 to L1 as a result of this
> new pending L1 event.
>
> To address this case, if sync_pir_to_irr() finds a new injectable L1
> interrupt and the CPU is running L2, we set KVM_REQ_EVENT.
> This causes vcpu_enter_guest() to run another iteration of evaluating
> pending KVM requests, which consumes KVM_REQ_EVENT and therefore calls
> check_nested_events(), which handles the pending L1 event properly.
>
> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f5074ec5701b..47bbb8b691e8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  {
>          struct vcpu_vmx *vmx = to_vmx(vcpu);
> +        int prev_max_irr;
>          int max_irr;
>
>          WARN_ON(!vcpu->arch.apicv_active);
> +
> +        prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
>          if (pi_test_on(&vmx->pi_desc)) {
>                  pi_clear_on(&vmx->pi_desc);
> +
>                  /*
>                   * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>                   * But on x86 this is just a compiler barrier anyway.
>                   */
>                  smp_mb__after_atomic();
>                  max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);

I think the optimization (partly livelock protection) is not worth the
overhead of two IRR scans for non-nested guests.  Please make
kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

> +
> +                /*
> +                 * If we are running L2 and L1 has a new pending interrupt
> +                 * which can be injected, we should re-evaluate
> +                 * what should be done with this new L1 interrupt.
> +                 */
> +                if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> +                        kvm_make_request(KVM_REQ_EVENT, vcpu);

We don't need anything from KVM_REQ_EVENT and only use it to abort the
VM entry; kvm_vcpu_exiting_guest_mode() is better for that.

>          } else {
> -                max_irr = kvm_lapic_find_highest_irr(vcpu);
> +                max_irr = prev_max_irr;
>          }
> +
>          vmx_hwapic_irr_update(vcpu, max_irr);

We should also just inject the interrupt if L2 is run without
nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update()?

Thanks.
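
For illustration only, a rough sketch of the one-pass variant asked for
above: scan each IRR bank once, note the highest pending vector both
before and after merging PIR, and hand both back to the caller so that
the extra kvm_lapic_find_highest_irr() walk can go away.  The
four-argument signatures are hypothetical and only meant to show the
shape of the change, not the actual patch:

/*
 * Sketch only (hypothetical out-parameters): merge PIR into IRR and
 * report the highest pending vector both before and after the merge,
 * in a single pass over the eight IRR banks.
 */
static void __kvm_apic_update_irr(u32 *pir, void *regs,
                                  int *prev_max_irr, int *max_irr)
{
        u32 i, vec, pir_val, irr_val;

        *prev_max_irr = *max_irr = -1;
        for (i = vec = 0; i <= 7; i++, vec += 32) {
                irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
                if (irr_val)
                        *prev_max_irr = __fls(irr_val) + vec; /* before PIR merge */

                pir_val = READ_ONCE(pir[i]);
                if (pir_val) {
                        irr_val |= xchg(&pir[i], 0);
                        *((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
                }
                if (irr_val)
                        *max_irr = __fls(irr_val) + vec;      /* after PIR merge */
        }
}

int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir,
                        int *prev_max_irr, int *max_irr)
{
        struct kvm_lapic *apic = vcpu->arch.apic;

        __kvm_apic_update_irr(pir, apic->regs, prev_max_irr, max_irr);
        return *max_irr;
}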
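
And a sketch of how vmx_sync_pir_to_irr() could then consume that,
replacing KVM_REQ_EVENT with kvm_vcpu_exiting_guest_mode() as suggested:
cancel the VM entry only when L1 actually intercepts external
interrupts, and otherwise let vmx_hwapic_irr_update() deliver the new
vector to L2 through RVI.  Again a hypothetical sketch built on the
four-argument helper above, not the final patch:

static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        int prev_max_irr, max_irr;

        WARN_ON(!vcpu->arch.apicv_active);
        if (pi_test_on(&vmx->pi_desc)) {
                pi_clear_on(&vmx->pi_desc);
                /*
                 * IOMMU can write to PIR.ON, so the barrier matters even
                 * on UP.  But on x86 this is just a compiler barrier anyway.
                 */
                smp_mb__after_atomic();
                /* single IRR pass; hypothetical signature from the sketch above */
                kvm_apic_update_irr(vcpu, vmx->pi_desc.pir,
                                    &prev_max_irr, &max_irr);

                /*
                 * A new L1 interrupt appeared while we were about to
                 * enter L2.  If L1 intercepts external interrupts,
                 * cancel this VM entry so the pending L2->L1 exit can
                 * be evaluated before resuming the guest; otherwise the
                 * vector can be delivered straight to L2 via RVI below.
                 */
                if (is_guest_mode(vcpu) && max_irr > prev_max_irr &&
                    nested_exit_on_intr(vcpu))
                        kvm_vcpu_exiting_guest_mode(vcpu);
        } else {
                max_irr = kvm_lapic_find_highest_irr(vcpu);
        }
        vmx_hwapic_irr_update(vcpu, max_irr);
        return max_irr;
}

The idea behind kvm_vcpu_exiting_guest_mode() here is that it flips
vcpu->mode to EXITING_GUEST_MODE, which vcpu_enter_guest() checks right
after calling sync_pir_to_irr() and then backs out of the entry, without
needing anything else from KVM_REQ_EVENT.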