On Thu, 22 Apr 2021 at 09:45, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Apr 22, 2021, Kenta Ishiguro wrote:
> > To solve problems (2) and (3), patch 2 monitors IPI communication between
> > vCPUs and leverages the relationship between vCPUs to select boost
> > candidates. The "[PATCH] KVM: Boost vCPU candidate in user mode which is
> > delivering interrupt" patch
> > (https://lore.kernel.org/kvm/CANRm+Cy-78UnrkX8nh5WdHut2WW5NU=UL84FRJnUNjsAPK+Uww@xxxxxxxxxxxxxx/T/)
> > seems to be effective for (2), although it only uses the IPI receiver
> > information.
>
> On the IPI side of things, I like the idea of explicitly tracking the IPIs,
> especially if we can simplify the implementation, e.g. by losing the receiver
> info and making ipi_received a bool. Maybe temporarily table Wanpeng's patch
> while this approach is analyzed?

Hi all,

I evaluated my patch
(https://lore.kernel.org/kvm/1618542490-14756-1-git-send-email-wanpengli@xxxxxxxxxxx),
Kenta's patch 2, and Sean's suggestion. The testing environment is pbzip2
running in a 96-vCPU VM in an over-subscription scenario (the host machine
is a 2-socket, 48-core, 96-HT Intel CLX box). Note: Kenta's scheduler hack
is not applied. My patch shows the most stable and best performance.

Wanpeng's patch:
The average: vanilla -> boost: 69.124 -> 61.975, 10.3%
* Wall Clock: 61.695359 seconds
* Wall Clock: 63.343579 seconds
* Wall Clock: 61.567513 seconds
* Wall Clock: 62.144722 seconds
* Wall Clock: 61.091442 seconds
* Wall Clock: 62.085912 seconds
* Wall Clock: 61.311954 seconds

Kenta's patch:
The average: vanilla -> boost: 69.148 -> 64.567, 6.6%
* Wall Clock: 66.288113 seconds
* Wall Clock: 61.228642 seconds
* Wall Clock: 62.100524 seconds
* Wall Clock: 68.355473 seconds
* Wall Clock: 64.864608 seconds

Sean's suggestion:
The average: vanilla -> boost: 69.148 -> 66.505, 3.8%
* Wall Clock: 60.583562 seconds
* Wall Clock: 58.533960 seconds
* Wall Clock: 70.103489 seconds
* Wall Clock: 74.279028 seconds
* Wall Clock: 69.024194 seconds
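(For reference, the percentage in each "vanilla -> boost" line is the
relative reduction in average wall-clock time, e.g.
(69.124 - 61.975) / 69.124 ~= 10.3% for my patch.)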
I followed (almost) Sean's suggestion:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0050f39..78b5eb6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1272,6 +1272,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
 	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *dest_vcpu;
 
 	irq.vector = icr_low & APIC_VECTOR_MASK;
 	irq.delivery_mode = icr_low & APIC_MODE_MASK;
@@ -1285,6 +1286,10 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 	else
 		irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
 
+	dest_vcpu = kvm_get_vcpu_by_id(apic->vcpu->kvm, irq.dest_id);
+	if (dest_vcpu)
+		WRITE_ONCE(dest_vcpu->ipi_received, true);
+
 	trace_kvm_apic_ipi(icr_low, irq.dest_id);
 
 	kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 303fb55..a98bf571 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9298,6 +9298,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
 
+	WRITE_ONCE(vcpu->ipi_received, false);
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ef09a4..81e39fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -332,6 +332,8 @@ struct kvm_vcpu {
 		bool dy_eligible;
 	} spin_loop;
 #endif
+
+	bool ipi_received;
 	bool preempted;
 	bool ready;
 	struct kvm_vcpu_arch arch;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c682f82..5098929 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -411,6 +411,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
+	vcpu->ipi_received = false;
 	vcpu->preempted = false;
 	vcpu->ready = false;
 	preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
@@ -3220,6 +3221,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		    !vcpu_dy_runnable(vcpu))
 			continue;
 		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
+		    !READ_ONCE(vcpu->ipi_received) &&
 		    !kvm_arch_vcpu_in_kernel(vcpu))
 			continue;
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
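To make the flag's lifecycle easier to follow, below is a minimal,
self-contained userspace sketch of the same idea. This is illustrative
only, not kernel code: struct vcpu, send_ipi(), enter_guest() and
skip_candidate() are simplified stand-ins for the structures and
functions touched by the diff, and C11 relaxed atomics approximate
READ_ONCE()/WRITE_ONCE().

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	atomic_bool ipi_received;	/* set by the IPI sender, cleared at VM-entry */
	bool preempted;
	bool in_kernel;
};

/* Sender side, cf. kvm_apic_send_ipi(): flag the destination vCPU. */
static void send_ipi(struct vcpu *dest)
{
	atomic_store_explicit(&dest->ipi_received, true, memory_order_relaxed);
}

/* Receiver side, cf. vcpu_enter_guest(): clear the flag before VM-entry. */
static void enter_guest(struct vcpu *vcpu)
{
	atomic_store_explicit(&vcpu->ipi_received, false, memory_order_relaxed);
}

/*
 * Candidate filter, cf. kvm_vcpu_on_spin(): a preempted user-mode vCPU is
 * normally skipped when yielding to kernel mode, unless it has an
 * undelivered IPI pending.
 */
static bool skip_candidate(struct vcpu *vcpu, bool yield_to_kernel_mode)
{
	return vcpu->preempted && yield_to_kernel_mode &&
	       !atomic_load_explicit(&vcpu->ipi_received, memory_order_relaxed) &&
	       !vcpu->in_kernel;
}

int main(void)
{
	struct vcpu v = { .preempted = true, .in_kernel = false };

	enter_guest(&v);
	printf("no pending IPI: skip=%d\n", skip_candidate(&v, true)); /* 1 */

	send_ipi(&v);
	printf("pending IPI:    skip=%d\n", skip_candidate(&v, true)); /* 0 */
	return 0;
}

The design point is that the flag needs no sender/receiver bookkeeping:
it is set on the IPI send path and unconditionally cleared on the
destination vCPU's next VM-entry, so at worst a stale flag influences one
round of candidate selection before it is reset.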