Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 25 Nov 2020 18:00:31 +0100

On 25/11/20 02:14, Sean Christopherson wrote:
The flag
would not have to live past vmx_vcpu_run even, the vIRR[PINV] bit would be
the primary marker that a nested posted interrupt is pending.

	while (READ_ONCE(vmx->nested.pi_pending) && PID.ON) {
		vmx->nested.pi_pending = false;
		vIRR.PINV = 1;
	}

would incorrectly set vIRR.PINV in the case where hardware handled the PI, and
that could result in L1 seeing the interrupt if a nested exit occurred before KVM
processed vIRR.PINV for L2.  Note, without PID.ON, the behavior would be really
bad as KVM would set vIRR.PINV *every* time hardware handled the PINV.

It doesn't have to be a while loop, since by the time we get here
vcpu->mode is not IN_GUEST_MODE anymore.  To avoid the double PINV
delivery, we could process the PID as in
vmx_complete_nested_posted_interrupt in this particular case; however,
vmx_complete_nested_posted_interrupt would be moved from vmentry to
vmexit, and the common case would use vIRR.PINV instead.  There would
still be double processing, but it would solve the migration problem in
a relatively elegant manner.

The weird promise is
that the PINV interrupt is the _only_ trigger for posted interrupts.

Ah, I misunderstood the original "only".  I suspect the primary reason is that
it would cost uops to do the snoop thing and would be inefficient in practice.

Yes, I agree.  But again, the spec seems to be unnecessarily restrictive.

This is the part that is likely impossible to
solve without shadowing the PID (which, for the record, I have zero desire to do).
Neither do I. :)  But technically the SDM doesn't promise reading the whole
256 bits at the same time.

Hrm, the wording is poor, but my interpretation of this blurb is that the CPU
somehow has a death grip on the PID cache line while it's reading and clearing
the PIR.

   5. The logical processor performs a logical-OR of PIR into VIRR and clears PIR.
      No other agent can read or write a PIR bit (or group of bits) between the
      time it is read (to determine what to OR into VIRR) and when it is cleared.

Yeah, that's the part I interpreted as other processors possibly being 
able to see a partially updated version.  Of course in practice the 
processor will be doing everything atomically, but the more restrictive 
reading of the spec all but precludes a software implementation.
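
To make the distinction concrete, here is a minimal userspace sketch of
a software PIR-to-vIRR transfer, assuming the 256-bit PIR is handled as
four 64-bit words.  struct pid_sketch and pid_sync_pir_to_virr() are
hypothetical names for illustration, not KVM's struct pi_desc or any
existing helper.  Each exchange is atomic only for its own word, so
another agent could observe a partially cleared PIR in between, which is
exactly what the stricter reading of step 5 would forbid.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4	/* 256 bits of PIR as four 64-bit words */

/* Hypothetical stand-in for the posted-interrupt descriptor. */
struct pid_sketch {
	_Atomic uint64_t pir[PIR_WORDS];	/* posted-interrupt requests */
	atomic_bool on;				/* outstanding notification (PID.ON) */
};

/*
 * Clear ON, then move PIR into vIRR one word at a time.
 * Every atomic_exchange() covers 64 bits only; the 256-bit PIR as a
 * whole is never read and cleared in a single atomic step.
 * Returns true if at least one bit was transferred.
 */
static bool pid_sync_pir_to_virr(struct pid_sketch *pid,
				 uint64_t virr[PIR_WORDS])
{
	bool moved = false;

	/* Nothing to do if no notification is outstanding. */
	if (!atomic_exchange(&pid->on, false))
		return false;

	for (int i = 0; i < PIR_WORDS; i++) {
		/* Read and clear this word atomically, then OR it in. */
		uint64_t bits = atomic_exchange(&pid->pir[i], 0);

		virr[i] |= bits;
		if (bits)
			moved = true;
	}

	return moved;
}
```

A sender racing with the loop may set a bit in a word that has already
been drained; that bit simply stays in PIR (with ON set again by the
sender) for the next pass, so nothing is lost, but the snapshot is not
atomic across all 256 bits.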

Paolo