On Sat, 17 Apr 2021 at 21:09, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 16/04/21 05:08, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> >
> > Both the lock holder vCPU and an IPI receiver that has halted are
> > candidates for boost. However, the PLE handler was originally designed
> > to deal with the lock holder preemption problem. The Intel PLE occurs
> > when the spinlock waiter is in kernel mode. This assumption doesn't hold
> > for IPI receivers, they can be in either kernel or user mode. The vCPU
> > candidates in user mode will not be boosted even if they should respond
> > to IPIs. Some benchmarks like pbzip2, swaptions etc do the TLB shootdown
> > in kernel mode and most of the time they are running in user mode. It
> > can lead to a large number of continuous PLE events because the IPI
> > sender causes PLE events repeatedly until the receiver is scheduled
> > while the receiver is not a candidate for a boost.
> >
> > This patch boosts the vCPU candidate in user mode which is delivering
> > an interrupt. We can observe the speed of pbzip2 improves 10% in a 96
> > vCPUs VM in an over-subscribed scenario (the host machine is a 2 socket,
> > 48 cores, 96 HTs Intel CLX box). There is no performance regression for
> > other benchmarks like Unixbench spawn (most of the time contends
> > read/write lock in kernel mode), ebizzy (most of the time contends
> > read/write sem and TLB shootdown in kernel mode).
> >
> > +bool kvm_arch_interrupt_delivery(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.apicv_active && static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu))
> > +		return true;
> > +
> > +	return false;
> > +}
>
> Can you reuse vcpu_dy_runnable instead of this new function?

I have some concerns. On x86, vcpu_dy_runnable() would add extra vCPU
candidates through KVM_REQ_EVENT and async pf (which has already
opportunistically made the guest do other stuff). For other arches,
kvm_arch_dy_runnable() is equal to kvm_arch_vcpu_runnable(), except on
powerpc, which has too many events and is not conservative. In general,
vcpu_dy_runnable() would loosen the conditions and add more vCPU
candidates.

    Wanpeng
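
To make the concern concrete, here is a toy model of the two checks. It is
only a sketch and not kernel code: struct toy_vcpu, its fields and the toy_*
helpers are hypothetical names made up for illustration, and the broader
check models only the extra x86 conditions mentioned above (KVM_REQ_EVENT
and async pf), not everything the real function may test.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the vCPU state discussed above.
 * These fields are assumptions and do not match struct kvm_vcpu.
 */
struct toy_vcpu {
	bool apicv_active;      /* APICv is enabled for this vCPU */
	bool apicv_pending_irq; /* an interrupt is waiting to be delivered */
	bool req_event;         /* models a pending KVM_REQ_EVENT */
	bool async_pf_done;     /* models a completed async page fault */
};

/* Narrow check: same shape as kvm_arch_interrupt_delivery() in the patch. */
static bool toy_interrupt_delivery(const struct toy_vcpu *v)
{
	return v->apicv_active && v->apicv_pending_irq;
}

/* Broader check: the narrow one plus the extra conditions named above. */
static bool toy_dy_runnable(const struct toy_vcpu *v)
{
	return toy_interrupt_delivery(v) || v->req_event || v->async_pf_done;
}

/* A vCPU with only a completed async pf passes the broad check alone. */
static void toy_example(void)
{
	struct toy_vcpu v = { .async_pf_done = true };

	assert(!toy_interrupt_delivery(&v));
	assert(toy_dy_runnable(&v));
}
```

In this model every vCPU accepted by the narrow check is also accepted by
the broader one, but not the other way round, which is the sense in which
the broader check adds candidates.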