On Thu, Nov 26, 2020, Paolo Bonzini wrote:
> On 25/11/20 19:32, Sean Christopherson wrote:
> > I'm pretty sure the exiting vCPU needs to wait for all senders to finish
> > their sequence, otherwise pi_pending could be left set, but spinning on
> > pi_pending is wrong.
>
> What if you set it before?

That doesn't help.  nested.pi_pending will be left set, with a valid vIRQ in
the PID, after vmx_vcpu_run() if kvm_vcpu_trigger_posted_interrupt() succeeds
but the PINV is delivered in the host.

Side topic, for the "wait" sequence to work, vmx_vcpu_run() would need to do
kvm_vcpu_exiting_guest_mode() prior to waiting for senders to complete their
sequence.

> static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
>                                                int vector)
> {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>         if (is_guest_mode(vcpu) &&
>             vector == vmx->nested.posted_intr_nv) {
>                 /*
>                  * Set pi_pending after ON.
>                  */
>                 smp_store_release(&vmx->nested.pi_pending, true);
>                 if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>                         /*
>                          * The guest was not running, let's try again
>                          * on the next vmentry.
>                          */
>                         <set PINV in L1 vIRR>
>                         kvm_make_request(KVM_REQ_EVENT, vcpu);
>                         kvm_vcpu_kick(vcpu);
>                         vmx->nested.pi_pending = false;
>                 }
>                 write_seqcount_end(&vmx->nested.pi_pending_sc);
>                 return 0;
>         }
>         return -1;
> }
>
> On top of this:
>
> - kvm_x86_ops.hwapic_irr_update can be deleted.  It is already done
>   unconditionally by vmx_sync_pir_to_irr before every vmentry.  This gives
>   more freedom in changing vmx_sync_pir_to_irr and vmx_hwapic_irr_update.

And would lower the barrier to entry for understanding this code :-)

> - VCPU entry must check if max_irr == vmx->nested.posted_intr_nv, and if so
>   send a POSTED_INTR_NESTED_VECTOR self-IPI.

Hmm, there's also this snippet in vmx_sync_pir_to_irr() that needs to be dealt
with.  If the new max_irr in this case is the nested PI vector, KVM will bail
from the run loop instead of continuing on.

	/*
	 * If we are running L2 and L1 has a new pending interrupt
	 * which can be injected, we should re-evaluate
	 * what should be done with this new L1 interrupt.
	 * If L1 intercepts external-interrupts, we should
	 * exit from L2 to L1. Otherwise, interrupt should be
	 * delivered directly to L2.
	 */
	if (is_guest_mode(vcpu) && max_irr_updated) {
		if (nested_exit_on_intr(vcpu))
			kvm_vcpu_exiting_guest_mode(vcpu);
		else
			kvm_make_request(KVM_REQ_EVENT, vcpu);
	}

> Combining both (and considering that AMD doesn't do anything interesting in
> vmx_sync_pir_to_irr), I would move the whole call to vmx_sync_pir_to_irr
> from x86.c to vmx/vmx.c, so that we know that vmx_hwapic_irr_update is
> called with interrupts disabled and right before vmentry:
>
> static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> {
>         ...
> -       vmx_hwapic_irr_update(vcpu, max_irr);
>         return max_irr;
> }
>
> -static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> +static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu)

I would also vote to rename this helper; not sure what to call it, but the
current name doesn't help in understanding its purpose.

> {
> +       int max_irr;
> +
> +       WARN_ON(!irqs_disabled());
> +       max_irr = vmx_sync_pir_to_irr(vcpu);
>         if (!is_guest_mode(vcpu))
>                 vmx_set_rvi(max_irr);
> +       else if (max_irr == vmx->nested.posted_intr_nv) {
> +               ...
> +       }
> }
>
> and in vmx_vcpu_run:
>
> +       if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
> +               vmx_hwapic_irr_update(vcpu);

And also drop the direct call to vmx_sync_pir_to_irr() in the fastpath exit.
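To make sure we're picturing the same end state, I'm thinking the consolidated
helper would look roughly like the below.  Completely untested, and the
apic->send_IPI_self() plumbing is just my guess at the simplest way to send the
self-IPI; the nested_exit_on_intr() handling from the snippet above would still
need to live somewhere.

static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int max_irr;

	WARN_ON(!irqs_disabled());

	/* Sync PIR to IRR and compute the new max IRR right before VM-Enter. */
	max_irr = vmx_sync_pir_to_irr(vcpu);

	if (!is_guest_mode(vcpu)) {
		vmx_set_rvi(max_irr);
	} else if (max_irr == vmx->nested.posted_intr_nv) {
		/*
		 * IRQs are disabled, so the self-IPI stays pending until
		 * VM-Enter completes, at which point it should be recognized
		 * as a nested posted interrupt for L2.
		 */
		apic->send_IPI_self(POSTED_INTR_NESTED_VECTOR);
	}
}

If that's roughly what you had in mind, I agree that moving the
vmx_sync_pir_to_irr() call out of x86.c makes the "interrupts disabled, right
before vmentry" requirement much easier to reason about.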
> If you agree, feel free to send this (without the else of course) as a
> separate cleanup patch immediately.

Without what "else"?
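P.S. Purely for illustration, this is the publish/consume ordering I read into
the "Set pi_pending after ON" comment and the smp_store_release() in your
sketch: whoever observes pi_pending as set must also observe the preceding PID
update.  Stand-alone C11 analogue with made-up names, obviously not the KVM
code:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool pid_on;	/* stand-in for the ON bit in the PID */
static atomic_bool pi_pending;	/* stand-in for nested.pi_pending */

/* Sender: update the "PID" first, then publish pi_pending with release. */
static void *sender(void *arg)
{
	atomic_store_explicit(&pid_on, true, memory_order_relaxed);
	atomic_store_explicit(&pi_pending, true, memory_order_release);
	return NULL;
}

/*
 * Consumer (the vCPU side): the acquire load of pi_pending pairs with the
 * release store above, so observing pi_pending == true guarantees the PID
 * update is visible as well.
 */
static void *vcpu_side(void *arg)
{
	if (atomic_load_explicit(&pi_pending, memory_order_acquire))
		printf("pi_pending seen, ON=%d (always 1 here)\n",
		       atomic_load_explicit(&pid_on, memory_order_relaxed));
	else
		printf("pi_pending not observed yet\n");
	return NULL;
}

int main(void)
{
	pthread_t s, v;

	pthread_create(&s, NULL, sender, NULL);
	pthread_create(&v, NULL, vcpu_side, NULL);
	pthread_join(s, NULL);
	pthread_join(v, NULL);
	return 0;
}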