-Idan to stop getting bounces.

On Wed, Nov 25, 2020, Paolo Bonzini wrote:
> On 25/11/20 02:14, Sean Christopherson wrote:
> > > The flag
> > > would not have to live past vmx_vcpu_run even, the vIRR[PINV] bit would be
> > > the primary marker that a nested posted interrupt is pending.
> >
> > 	while (READ_ONCE(vmx->nested.pi_pending) && PID.ON) {
> > 		vmx->nested.pi_pending = false;
> > 		vIRR.PINV = 1;
> > 	}
> >
> > would incorrectly set vIRR.PINV in the case where hardware handled the PI, and
> > that could result in L1 seeing the interrupt if a nested exit occurred before KVM
> > processed vIRR.PINV for L2.  Note, without PID.ON, the behavior would be really
> > bad as KVM would set vIRR.PINV *every* time hardware handled the PINV.
>
> It doesn't have to be a while loop, since by the time we get here vcpu->mode
> is not IN_GUEST_MODE anymore.

Hrm, bad loop logic on my part.  I'm pretty sure the exiting vCPU needs to wait
for all senders to finish their sequence, otherwise pi_pending could be left
set, but spinning on pi_pending is wrong.  Your generation counter thing may
also work, but that made my brain hurt too much to work through the logic. :-)

Something like this?

static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
						int vector)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (is_guest_mode(vcpu) &&
	    vector == vmx->nested.posted_intr_nv) {
		/* Write a comment. */
		vmx->nested.pi_sending_count++;
		smp_wmb();
		if (kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
			vmx->nested.pi_pending = true;
		} else {
			<set PINV in L1 vIRR>
			kvm_make_request(KVM_REQ_EVENT, vcpu);
			kvm_vcpu_kick(vcpu);
		}
		smp_wmb();
		vmx->nested.pi_sending_count--;
		return 0;
	}
	return -1;
}

static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
	...

	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
	vmx_vcpu_enter_exit(vcpu, vmx);

	...

	if (is_guest_mode(vcpu)) {
		while (READ_ONCE(vmx->nested.pi_sending_count));

		vmx_complete_nested_posted_interrupt(vcpu);
	}

	...
}

> To avoid the double PINV delivery, we could process the PID as in
> vmx_complete_nested_posted_interrupt in this particular case---but
> vmx_complete_nested_posted_interrupt would be moved from vmentry to vmexit,
> and the common case would use vIRR.PINV instead.  There would still be double
> processing, but it would solve the migration problem in a relatively elegant
> manner.

I like this idea, a lot.  I'm a-ok with KVM processing more PIRs than the SDM
may or may not technically allow.

Jim, any objections?
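
For anyone who wants to poke at the "wait for all senders" handshake outside
the kernel, here is a rough userspace model using C11 atomics.  It is only a
sketch of the idea, not KVM code: struct pi_state, pi_send() and
pi_exit_sync() are made-up names, the in_guest argument stands in for the
result of kvm_vcpu_trigger_posted_interrupt(), and virr_pinv stands in for the
PINV bit in L1's vIRR.  The model also assumes an atomic read-modify-write on
the counter so that concurrent senders cannot lose an update, which the plain
++/-- above would not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vmx->nested fields discussed above. */
struct pi_state {
	atomic_int sending_count;	/* senders currently mid-sequence */
	atomic_bool pi_pending;		/* notification was handed to "hardware" */
	atomic_bool virr_pinv;		/* models PINV being set in L1's vIRR */
};

/*
 * Sender side.  in_guest models whether the target was in guest mode,
 * i.e. whether the posted-interrupt notification could be sent directly.
 */
static void pi_send(struct pi_state *s, bool in_guest)
{
	int prev;

	/* Announce the sender before touching any other state. */
	atomic_fetch_add(&s->sending_count, 1);

	if (in_guest)
		atomic_store(&s->pi_pending, true);
	else
		atomic_store(&s->virr_pinv, true);

	/* Sequence complete; the exiting side may now proceed. */
	prev = atomic_fetch_sub(&s->sending_count, 1);
	assert(prev > 0);
}

/*
 * Exit side.  Spin until no sender is mid-sequence, then consume
 * pi_pending.  Returns true if a pending notification was found.
 */
static bool pi_exit_sync(struct pi_state *s)
{
	while (atomic_load(&s->sending_count) != 0)
		;	/* busy-wait, as in the sketch above */

	return atomic_exchange(&s->pi_pending, false);
}
```

The point of the counter is that the exit path never observes a half-finished
sender: once pi_exit_sync() sees the count at zero, every sender that
incremented it has already published either pi_pending or the vIRR bit.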