Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 30 Nov 2020 19:14:59 +0000

On Thu, Nov 26, 2020, Paolo Bonzini wrote:
> On 25/11/20 19:32, Sean Christopherson wrote:
> > I'm pretty sure the exiting vCPU needs to wait
> > for all senders to finish their sequence, otherwise pi_pending could be left
> > set, but spinning on pi_pending is wrong.
> 
> What if you set it before?

That doesn't help.  nested.pi_pending will be left set, with a valid vIRQ in the
PID, after vmx_vcpu_run() if kvm_vcpu_trigger_posted_interrupt() succeeds but
the PINV is delivered in the host.

Side topic, for the "wait" sequence to work, vmx_vcpu_run() would need to do
kvm_vcpu_exiting_guest_mode() prior to waiting for senders to completely their
sequence.

> 
> static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
> 						int vector)
> {
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
> 
> 	if (is_guest_mode(vcpu) &&
> 	    vector == vmx->nested.posted_intr_nv) {
> 		/*
> 		 * Set pi_pending after ON.
> 		 */
> 		smp_store_release(&vmx->nested.pi_pending, true);
> 		if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> 			/*
> 			 * The guest was not running, let's try again
> 			 * on the next vmentry.
> 			 */
> 			<set PINV in L1 vIRR>
> 			kvm_make_request(KVM_REQ_EVENT, vcpu);
> 			kvm_vcpu_kick(vcpu);
> 			vmx->nested.pi_pending = false;
> 		}
> 		write_seqcount_end(&vmx->nested.pi_pending_sc);
> 		return 0;
> 	}
> 	return -1;
> }
> 
> On top of this:
> 
> - kvm_x86_ops.hwapic_irr_update can be deleted.  It is already done
> unconditionally by vmx_sync_pir_to_irr before every vmentry.  This gives
> more freedom in changing vmx_sync_pir_to_irr and vmx_hwapic_irr_update.

And would lower the barrier of entry for understanding this code :-)

> - VCPU entry must check if max_irr == vmx->nested.posted_intr_nv, and if so
> send a POSTED_INTR_NESTED_VECTOR self-IPI.

Hmm, there's also this snippet in vmx_sync_pir_to_irr() that needs to be dealt
with.  If the new max_irr in this case is the nested PI vector, KVM will bail
from the run loop instead of continuing on. 

	/*
	 * If we are running L2 and L1 has a new pending interrupt
	 * which can be injected, we should re-evaluate
	 * what should be done with this new L1 interrupt.
	 * If L1 intercepts external-interrupts, we should
	 * exit from L2 to L1. Otherwise, interrupt should be
	 * delivered directly to L2.
	 */
	if (is_guest_mode(vcpu) && max_irr_updated) {
		if (nested_exit_on_intr(vcpu))
			kvm_vcpu_exiting_guest_mode(vcpu);
		else
			kvm_make_request(KVM_REQ_EVENT, vcpu);
	}

> Combining both (and considering that AMD doesn't do anything interesting in
> vmx_sync_pir_to_irr), I would move the whole call to vmx_sync_pir_to_irr
> from x86.c to vmx/vmx.c, so that we know that vmx_hwapic_irr_update is
> called with interrupts disabled and right before vmentry:
> 
>  static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  {
> 	...
> -	vmx_hwapic_irr_update(vcpu, max_irr);
>         return max_irr;
>  }
> 
> -static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> +static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu)

I would also vote to rename this helper; not sure what to call it, but for me
the current name doesn't help understand its purpose.

>  {
> +	int max_irr;
> +
> +	WARN_ON(!irqs_disabled());
> +	max_irr = vmx_sync_pir_to_irr(vcpu);
>         if (!is_guest_mode(vcpu))
>                 vmx_set_rvi(max_irr);
> +	else if (max_irr == vmx->nested.posted_intr_nv) {
> +		...
> +	}
>  }
> 
> and in vmx_vcpu_run:
> 
> +	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
> +		vmx_hwapic_irr_update(vcpu);

And also drop the direct call to vmx_sync_pir_to_irr() in the fastpath exit.

> If you agree, feel free to send this (without the else of course) as a
> separate cleanup patch immediately.

Without what "else"?