On Mon, Sep 26, 2022 at 04:19:52PM +0000, Sean Christopherson wrote:
>On Fri, Sep 23, 2022, Chao Gao wrote:
>> Set PID.SN right after VM exits and clear it before VM entry to minimize
>> the chance of hardware issuing PINs to a CPU when it's in host.
>
>Toggling PID.SN as close to the world switch as possible is undesirable. If KVM
>re-enters the guest without enabling IRQs, i.e. handles the VM-Exit in the fastpath,
>then the notification IRQ will be delivered in the guest.
>
>The natural location to do the toggling is when KVM "toggles" software, i.e. when
>KVM sets IN_GUEST_MODE (clear SN) and OUTSIDE_GUEST_MODE (set SN).

This makes sense to me.

>
>I believe that would also obviate the need to manually send a PI Notification IRQ,
>as the existing ->sync_pir_to_irr() call that exists to handle exactly this case
>(notification not sent or handled in host) would ensure any outstanding posted IRQ
>gets moved to the IRR and processed accordingly.
>
>> Opportunistically clean up vmx_vcpu_pi_put(); when a vCPU is preempted,
>
>Uh uh, this patch is already way, way too subtle and complex to tack on clean up.
>"Opportunistic" clean up is for cases where the clean up is a pure refactoring
>and/or has zero impact on functionality.

Got it. Will move this cleanup to a separate patch if it is still needed.

>
>> it is pointless to update PID.NV to wakeup vector since notification is
>> anyway suppressed. And since PID.SN should be already set for running
>> vCPUs, so, don't set it again for preempted vCPUs.
>
>I'm pretty sure this is wrong. If the vCPU is preempted between prepare_to_rcuwait()
>and schedule(), then skipping pi_enable_wakeup_handler() will hang the guest if
>the wakeup event is a posted IRQ and the event arrives while the vCPU is preempted.

Thanks for pointing out this subtle case.

My understanding is that eventually there will be a call to
vmx_vcpu_pi_put() with preempted=false (I assume that preempted vCPUs will
be scheduled at some later point).
In that case, pi_enable_wakeup_handler() can wake up the vCPU by sending a
self-IPI. Also, since this patch checks PIR instead of the ON bit, I don't
see why the guest would hang.

>
>> When IPI virtualization is enabled, this patch increases "perf bench" [*]
>> by 6.56%, and PIN count in 1 second drops from tens of thousands to
>> hundreds. But cpuid loop test shows this patch causes 1.58% overhead in
>> VM-exit round-trip latency.
>
>The overhead is more than likely due to pi_is_pir_empty() in the VM-Entry path,
>i.e. should in theory go away if PID.SN is clear/set at IN_GUEST_MODE and
>OUTSIDE_GUEST_MODE

I will collect perf data after implementing what you suggested.

>
>> Also honour PID.SN bit in vmx_deliver_posted_interrupt().
>
>Why?

VT-d hardware doesn't set the ON bit if the SN bit is set. Enforcing the
same rule in KVM lets it skip unnecessary work, such as the subsequent
pi_test_and_set_on() and kvm_vcpu_trigger_posted_interrupt().

>
>> When IPI virtualization is enabled, this patch increases "perf bench" [*]
>> by 6.56%, and PIN count in 1 second drops from tens of thousands to
>> hundreds. But cpuid loop test shows this patch causes 1.58% overhead in
>> VM-exit round-trip latency.
>>
>> [*] test cmd: perf bench sched pipe -T. Note that we change the source
>> code to pin two threads to two different vCPUs so that it can reproduce
>> stable results.
>>
>> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
>> ---
>> RFC: I am not sure whether the benefits outweighs the extra VM-exit cost.
>>
>> Changes in v2 (addressed comments from Kevin):
>> - measure/estimate the impact to non-IPC-intensive cases
>> - don't tie PID.SN to vcpu->mode. Instead, clear PID.SN
>>   right before VM-entry and set it after VM-exit.
>
>Ah, sorry, missed v1. Rather than key off of IN_GUEST_MODE in the sync path, add
>an explicit kvm_x86_ops hook to perform the transition. I.e. make it explict.

It is ok to add a separate hook. But the question is how to coordinate
clearing SN with ->sync_pir_to_irr().
Clearing the SN bit may leave the descriptor in a state where ON and SN are
both clear but some outstanding IRQs are still pending in PIR. The current
->sync_pir_to_irr() doesn't sync those IRQs to the IRR in this case. Here
are two options to fix the problem:

1) clear SN with the new hook, then set the ON bit if there is any
   outstanding IRQ. No change to ->sync_pir_to_irr() is needed.
2) clear SN with the new hook, add a force mode to ->sync_pir_to_irr() where
   PIR is synced to IRR regardless of the ON/SN bits, and invoke
   ->sync_pir_to_irr() on the VM-entry path with force_mode=true.

Both may lead to an extra check of PIR.

>> @@ -101,11 +95,16 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>>  		new.control = old.control;
>>
>>  		/*
>> -		 * Clear SN (as above) and refresh the destination APIC ID to
>> -		 * handle task migration (@cpu != vcpu->cpu).
>> +		 * Set SN and refresh the destination APIC ID to handle
>> +		 * task migration (@cpu != vcpu->cpu).
>> +		 *
>> +		 * SN is cleared when a vCPU goes to blocked state so that
>> +		 * the blocked vCPU can be waken up on receiving a
>> +		 * notification. For a running/runnable vCPU, such
>> +		 * notifications are useless. Set SN bit to suppress them.
>>  		 */
>>  		new.ndst = dest;
>> -		new.sn = 0;
>> +		new.sn = 1;
>
>To handle the preempted case, I believe the correct behavior is to leave SN
>as-is, although that would require setting SN=1 during vCPU creation. Arguably
>KVM should do that anyways when APICv is enabled.
>
>Hmm, or alternatively this should do the same?
>
>	new.sn = !kvm_vcpu_is_blocking();

I don't get this. I am probably misunderstanding something about vCPU
preemption.

>
>> @@ -172,8 +160,10 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
>>  	 * enabled until it is safe to call try_to_wake_up() on the task being
>>  	 * scheduled out).
>>  	 */
>> -	if (pi_test_on(&new))
>> +	if (!pi_is_pir_empty(pi_desc)) {
>> +		pi_set_on(pi_desc);
>
>As much as I wish we could get rid of kvm_arch_vcpu_blocking(), I actually think
>this would be a good application of that hook.
>If PID.SN is cleared during kvm_arch_vcpu_blocking() and set during
>kvm_arch_vcpu_unblocking(), then I believe there's no need to manually check
>the PIR here, as any IRQ that isn't detected by kvm_vcpu_check_block() is
>guaranteed to set PID.ON=1.

Using kvm_arch_vcpu_blocking() has the same problem as using a new hook for
the VM-entry path: we either need a force mode for ->sync_pir_to_irr(), or
need to set the ON bit right after clearing SN if there is any outstanding
IRQ.

The former may help performance a little, but since the call to
->sync_pir_to_irr() in kvm_vcpu_check_block() is so far away from the place
where SN is cleared, I think it would be a source of bugs. The latter has no
benefit compared to what this patch does here.
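
To make the problematic state concrete, here is a minimal user-space sketch
of the two options discussed above. It is not KVM code: struct toy_pid and
all toy_* helpers are hypothetical names made up for illustration, the model
is single-threaded, and it ignores the atomic updates the real descriptor
needs. It only assumes the behaviour described in this thread: a post while
SN is set updates PIR without setting ON, and the normal sync path acts only
when ON is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy posted-interrupt descriptor; the layout is illustrative only. */
struct toy_pid {
	uint64_t pir[4];	/* 256 posted-interrupt request bits */
	bool on;		/* outstanding notification */
	bool sn;		/* suppress notification */
};

static bool toy_pir_empty(const struct toy_pid *pid)
{
	return !(pid->pir[0] | pid->pir[1] | pid->pir[2] | pid->pir[3]);
}

/* Model of a post: PIR is always updated, ON only if SN is clear. */
static void toy_post(struct toy_pid *pid, unsigned int vector)
{
	assert(vector < 256);
	pid->pir[vector / 64] |= 1ULL << (vector % 64);
	if (!pid->sn)
		pid->on = true;
}

/*
 * Model of the sync path: without force, nothing happens unless ON is set.
 * Returns true if PIR was moved into irr.
 */
static bool toy_sync_pir_to_irr(struct toy_pid *pid, uint64_t irr[4],
				bool force)
{
	if (!pid->on && !force)
		return false;

	pid->on = false;
	for (int i = 0; i < 4; i++) {
		irr[i] |= pid->pir[i];
		pid->pir[i] = 0;
	}
	return true;
}

/* Option 1: clear SN, then set ON if anything is pending in PIR. */
static void toy_clear_sn_option1(struct toy_pid *pid)
{
	pid->sn = false;
	if (!toy_pir_empty(pid))
		pid->on = true;
}

/* Option 2: clear SN, then sync PIR to IRR regardless of ON/SN. */
static void toy_clear_sn_option2(struct toy_pid *pid, uint64_t irr[4])
{
	pid->sn = false;
	toy_sync_pir_to_irr(pid, irr, true);
}
```

In this model, if vector 0x41 is posted while SN is set, simply clearing SN
and calling toy_sync_pir_to_irr() without force moves nothing. With
toy_clear_sn_option1() the same non-forced call picks the vector up, and
with toy_clear_sn_option2() the vector is moved as part of clearing SN.
Either way PIR is inspected once more, which is the extra check mentioned
above.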