Wanpeng Li <kernellwp@xxxxxxxxx> writes:

> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> The overhead of kvm_vcpu_kick() is huge, due to the expensive
> rcu/memory barrier etc. operations in rcuwait_wake_up(). It is even
> worse for local delivery, since the vCPU is already scheduled and we
> still pay this cost. With ftrace we can observe 12us+ for
> kvm_vcpu_kick() in the kvm_pmu_deliver_pmi() path before the patch
> and 6us+ after the optimization.
>
> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/lapic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 76fb00921203..ec6997187c6d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_NMI:
>  		result = 1;
>  		kvm_inject_nmi(vcpu);
> -		kvm_vcpu_kick(vcpu);
> +		if (vcpu != kvm_get_running_vcpu())
> +			kvm_vcpu_kick(vcpu);

Out of curiosity, can this be converted into a generic optimization
for kvm_vcpu_kick() instead? I.e. if kvm_vcpu_kick() is called for the
currently running vCPU, there's almost nothing to do, especially when
we already have a request pending, right? (I didn't put too much
thought into it.)

>  		break;
>
>  	case APIC_DM_INIT:

--
Vitaly
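
For illustration, a rough sketch of the kind of generic fast path
suggested above, assuming it would live in kvm_vcpu_kick() in
virt/kvm/kvm_main.c. The body around the added check only approximates
the upstream function, and the interaction with preemption and with
kicks issued from interrupt context while the guest is running would
still need to be audited:

#include <linux/kvm_host.h>

/*
 * Sketch: bail out early when the kick targets the vCPU that is
 * currently running on this pCPU.  There is no wait queue to wake and
 * no IPI to send; pending requests/events are rechecked before the
 * next guest entry anyway.
 */
void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
{
        int me, cpu;

        /* Assumed fast path for the self-kick case. */
        if (vcpu == kvm_get_running_vcpu())
                return;

        if (kvm_vcpu_wake_up(vcpu))
                return;

        me = get_cpu();
        cpu = vcpu->cpu;
        if (cpu != me && (unsigned int)cpu < nr_cpu_ids && cpu_online(cpu) &&
            kvm_arch_vcpu_should_kick(vcpu))
                smp_send_reschedule(cpu);
        put_cpu();
}

With such a check in place, individual callers such as
__apic_accept_irq() would not need their own kvm_get_running_vcpu()
comparison.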