Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 25 Nov 2020 18:32:36 +0000

-Idan to stop getting bounces.

On Wed, Nov 25, 2020, Paolo Bonzini wrote:
> On 25/11/20 02:14, Sean Christopherson wrote:
> > > The flag
> > > would not have to live past vmx_vcpu_run even, the vIRR[PINV] bit would be
> > > the primary marker that a nested posted interrupt is pending.
> > 
> > 	while (READ_ONCE(vmx->nested.pi_pending) && PID.ON) {
> > 		vmx->nested.pi_pending = false;
> > 		vIRR.PINV = 1;
> > 	}
> > 
> > would incorrectly set vIRR.PINV in the case where hardware handled the PI, and
> > that could result in L1 seeing the interrupt if a nested exit occurred before KVM
> > processed vIRR.PINV for L2.  Note, without PID.ON, the behavior would be really
> > bad as KVM would set vIRR.PINV *every* time hardware handled the PINV.
> 
> It doesn't have to be a while loop, since by the time we get here vcpu->mode
> is not IN_GUEST_MODE anymore.

Hrm, bad loop logic on my part.  I'm pretty sure the exiting vCPU needs to wait
for all senders to finish their sequence, otherwise pi_pending could be left
set, but spinning on pi_pending is wrong.  Your generation counter thing may
also work, but that made my brain hurt too much to work through the logic. :-)

Something like this?

static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
						int vector)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (is_guest_mode(vcpu) &&
	    vector == vmx->nested.posted_intr_nv) {
		/* Write a comment. */
		vmx->nested.pi_sending_count++;
		smp_wmb();
		if (kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
			vmx->nested.pi_pending = true;
		} else {
			<set PINV in L1 vIRR>
			kvm_make_request(KVM_REQ_EVENT, vcpu);
			kvm_vcpu_kick(vcpu);
		}
		smp_wmb();
		vmx->nested.pi_sending_count--;
		return 0;
	}
	return -1;
}

static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
	...

	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
	vmx_vcpu_enter_exit(vcpu, vmx);

	...

	if (is_guest_mode(vcpu)) {
		/* Wait for in-flight senders to finish their sequence. */
		while (READ_ONCE(vmx->nested.pi_sending_count))
			cpu_relax();

		vmx_complete_nested_posted_interrupt(vcpu);
	}

	...
}
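
The sender/receiver handshake sketched above can be modeled in user space
with C11 atomics.  This is only a rough illustration under stated
assumptions, not KVM code: struct pi_model and every pi_model_*() helper are
hypothetical names, the ipi_sent argument stands in for the return value of
kvm_vcpu_trigger_posted_interrupt(), and the model uses an atomic
read-modify-write for the counter on the assumption that more than one
sender may be in flight at once (a plain ++/-- could then lose updates).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the nested PI state; not the real struct vcpu_vmx. */
struct pi_model {
	atomic_int  sending_count;	/* senders currently mid-sequence */
	atomic_bool pi_pending;		/* notification IPI was sent to the vCPU */
	atomic_bool virr_pinv;		/* stands in for PINV in L1's vIRR */
};

static void pi_model_init(struct pi_model *m)
{
	atomic_init(&m->sending_count, 0);
	atomic_init(&m->pi_pending, false);
	atomic_init(&m->virr_pinv, false);
}

/* Sender, step 1: announce the sequence before touching any other state. */
static void pi_model_sender_enter(struct pi_model *m)
{
	atomic_fetch_add(&m->sending_count, 1);
}

/* Sender, step 2: record the outcome of the (modeled) IPI attempt. */
static void pi_model_sender_publish(struct pi_model *m, bool ipi_sent)
{
	if (ipi_sent)
		atomic_store(&m->pi_pending, true);
	else
		atomic_store(&m->virr_pinv, true);
}

/* Sender, step 3: leave only after the flag has been published. */
static void pi_model_sender_leave(struct pi_model *m)
{
	int prev = atomic_fetch_sub(&m->sending_count, 1);

	assert(prev > 0);
	(void)prev;
}

/* Receiver: true once no sender is mid-sequence. */
static bool pi_model_senders_quiesced(struct pi_model *m)
{
	return atomic_load(&m->sending_count) == 0;
}

/* Receiver: consume pi_pending exactly once; returns whether it was set. */
static bool pi_model_complete(struct pi_model *m)
{
	return atomic_exchange(&m->pi_pending, false);
}
```

The exiting side would spin until pi_model_senders_quiesced() returns true
and only then call pi_model_complete(), so a sender that has already
incremented the counter cannot leave pi_pending set behind the receiver's
back.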

> To avoid the double PINV delivery, we could process the PID as in
> vmx_complete_nested_posted_interrupt in this particular case---but
> vmx_complete_nested_posted_interrupt would be moved from vmentry to vmexit,
> and the common case would use vIRR.PINV instead.  There would still be double
> processing, but it would solve the migration problem in a relatively elegant
> manner.

I like this idea, a lot.  I'm a-ok with KVM processing more PIRs than the
SDM may or may not technically allow.

Jim, any objections?