On Thu, 22 Apr 2021 at 09:45, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Apr 22, 2021, Kenta Ishiguro wrote:
> > To solve problems (2) and (3), patch 2 monitors IPI communication between
> > vCPUs and leverages the relationship between vCPUs to select boost
> > candidates. The "[PATCH] KVM: Boost vCPU candidate in user mode which is
> > delivering interrupt" patch
> > (https://lore.kernel.org/kvm/CANRm+Cy-78UnrkX8nh5WdHut2WW5NU=UL84FRJnUNjsAPK+Uww@xxxxxxxxxxxxxx/T/)
> > seems to be effective for (2), although it only uses the IPI receiver
> > information.
>
> On the IPI side of things, I like the idea of explicitly tracking the IPIs,
> especially if we can simplify the implementation, e.g. by losing the receiver
> info and making ipi_received a bool. Maybe temporarily table Wanpeng's patch
> while this approach is analyzed?

Hi all,

I evaluated my patch
(https://lore.kernel.org/kvm/1618542490-14756-1-git-send-email-wanpengli@xxxxxxxxxxx),
Kenta's patch 2, and Sean's suggestion. The testing environment is pbzip2
running in a 96-vCPU VM in an over-subscription scenario (the host machine
is a 2-socket, 48-core, 96-HT Intel CLX box). Note: Kenta's scheduler hack
is not applied. My patch shows the most stable and best performance.

Wanpeng's patch:
The average: vanilla -> boost: 69.124 -> 61.975, 10.3%
* Wall Clock: 61.695359 seconds
* Wall Clock: 63.343579 seconds
* Wall Clock: 61.567513 seconds
* Wall Clock: 62.144722 seconds
* Wall Clock: 61.091442 seconds
* Wall Clock: 62.085912 seconds
* Wall Clock: 61.311954 seconds

Kenta's patch:
The average: vanilla -> boost: 69.148 -> 64.567, 6.6%
* Wall Clock: 66.288113 seconds
* Wall Clock: 61.228642 seconds
* Wall Clock: 62.100524 seconds
* Wall Clock: 68.355473 seconds
* Wall Clock: 64.864608 seconds

Sean's suggestion:
The average: vanilla -> boost: 69.148 -> 66.505, 3.8%
* Wall Clock: 60.583562 seconds
* Wall Clock: 58.533960 seconds
* Wall Clock: 70.103489 seconds
* Wall Clock: 74.279028 seconds
* Wall Clock: 69.024194 seconds
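(For reference, the percentage in each "vanilla -> boost" line is the
relative reduction in average wall-clock time, e.g.
(69.124 - 61.975) / 69.124 ~= 10.3% for my patch.)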
I followed (almost) Sean's suggestion:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0050f39..78b5eb6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1272,6 +1272,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
 	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *dest_vcpu;
 
 	irq.vector = icr_low & APIC_VECTOR_MASK;
 	irq.delivery_mode = icr_low & APIC_MODE_MASK;
@@ -1285,6 +1286,10 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 	else
 		irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
 
+	dest_vcpu = kvm_get_vcpu_by_id(apic->vcpu->kvm, irq.dest_id);
+	if (dest_vcpu)
+		WRITE_ONCE(dest_vcpu->ipi_received, true);
+
 	trace_kvm_apic_ipi(icr_low, irq.dest_id);
 
 	kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 303fb55..a98bf571 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9298,6 +9298,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
 
+	WRITE_ONCE(vcpu->ipi_received, false);
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ef09a4..81e39fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -332,6 +332,8 @@ struct kvm_vcpu {
 		bool dy_eligible;
 	} spin_loop;
 #endif
+
+	bool ipi_received;
 	bool preempted;
 	bool ready;
 	struct kvm_vcpu_arch arch;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c682f82..5098929 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -411,6 +411,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
+	vcpu->ipi_received = false;
 	vcpu->preempted = false;
 	vcpu->ready = false;
 	preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
@@ -3220,6 +3221,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		    !vcpu_dy_runnable(vcpu))
 			continue;
 		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
+		    !READ_ONCE(vcpu->ipi_received) &&
 		    !kvm_arch_vcpu_in_kernel(vcpu))
 			continue;
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
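To make the flag's lifecycle easier to follow, below is a minimal,
self-contained userspace sketch of the same idea. This is illustrative
only, not kernel code: struct vcpu, send_ipi(), enter_guest() and
skip_candidate() are simplified stand-ins for the structures and
functions touched by the diff, and C11 relaxed atomics approximate
READ_ONCE()/WRITE_ONCE().

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	atomic_bool ipi_received;	/* set by the IPI sender, cleared at VM-entry */
	bool preempted;
	bool in_kernel;
};

/* Sender side, cf. kvm_apic_send_ipi(): flag the destination vCPU. */
static void send_ipi(struct vcpu *dest)
{
	atomic_store_explicit(&dest->ipi_received, true, memory_order_relaxed);
}

/* Receiver side, cf. vcpu_enter_guest(): clear the flag before VM-entry. */
static void enter_guest(struct vcpu *vcpu)
{
	atomic_store_explicit(&vcpu->ipi_received, false, memory_order_relaxed);
}

/*
 * Candidate filter, cf. kvm_vcpu_on_spin(): a preempted user-mode vCPU is
 * normally skipped when yielding to kernel mode, unless it has an
 * undelivered IPI pending.
 */
static bool skip_candidate(struct vcpu *vcpu, bool yield_to_kernel_mode)
{
	return vcpu->preempted && yield_to_kernel_mode &&
	       !atomic_load_explicit(&vcpu->ipi_received, memory_order_relaxed) &&
	       !vcpu->in_kernel;
}

int main(void)
{
	struct vcpu v = { .preempted = true, .in_kernel = false };

	enter_guest(&v);
	printf("no pending IPI: skip=%d\n", skip_candidate(&v, true)); /* 1 */

	send_ipi(&v);
	printf("pending IPI:    skip=%d\n", skip_candidate(&v, true)); /* 0 */
	return 0;
}

The design point is that the flag needs no sender/receiver bookkeeping:
it is set on the IPI send path and unconditionally cleared on the
destination vCPU's next VM-entry, so at worst a stale flag influences one
round of candidate selection before it is reset.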