Re: [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown

Wanpeng Li <kernellwp@xxxxxxxxx> · Wed, 13 Dec 2017 17:42:14 +0800



Paolo, is it ok to apply this version? :)
2017-12-13 9:33 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> Remote flushing api's does a busy wait which is fine in bare-metal
> scenario. But with-in the guest, the vcpus might have been pre-empted
> or blocked. In this scenario, the initator vcpu would end up
> busy-waiting for a long amount of time.
>
> This patch set implements para-virt flush tlbs making sure that it
> does not wait for vcpus that are sleeping. And all the sleeping vcpus
> flush the tlb on guest enter. Idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> The best result is achieved when we're overcommiting the host by running
> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
> vCPUs which are not scheduled and avoid the wait on the main CPU.
>
> In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based
> page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
>
> Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
> so 64 pCPUs, and each VM is 64 vCPUs.
>
> ebizzy -M
>               vanilla    optimized     boost
> 1VM            46799       48670         4%
> 2VM            23962       42691        78%
> 3VM            16152       37539       132%
>
> Note: The patchset is not rebased against "locking/qspinlock/x86: Avoid
>    test-and-set when PV_DEDICATED is set" v3 since I can still observe a
>    little improvement for 64 vCPUs on 64 pCPUs, it is due to the system
>    is not completely isolated, there are many housekeeping tasks work
>    sporadically, and vCPUs are preemted some times, I also confirm this
>    when adding some print to the kvm_flush_tlb_others. After PV_DEDICATED
>    is merged, we can disable pv tlb flush when not overcommiting if it
>    is needed.
>
> v7 -> v8:
>  * rebase against latest kvm/queue
>
> v6 -> v7:
>  * don't check !flushmask
>  * use arch_initcall() to achieve late allocate percpu mask
>
> v5 -> v6:
>  * fix the percpu mask
>  * rebase against latest kvm/queue
>
> v4 -> v5:
>  * flushmask instead of cpumask
>
> v3 -> v4:
>  * use READ_ONCE()
>  * use try_cmpxchg instead of cmpxchg
>  * add {} to if
>  * no FLUSH flags to preserve during set_preempted
>  * "KVM: X86" prefix to patch subject
>
> v2 -> v3:
>  * percpu cpumask
>
> v1 -> v2:
>  * a new CPUID feature bit
>  * fix cmpxchg check
>  * use kvm_vcpu_flush_tlb() to get the statistics right
>  * just OR the KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted
>  * add a new bool argument to kvm_x86_ops->tlb_flush
>  * __cpumask_clear_cpu() instead of cpumask_clear_cpu()
>  * not put cpumask_t on stack
>  * rebase the patchset against "locking/qspinlock/x86: Avoid
>    test-and-set when PV_DEDICATED is set" v3
>
>
> Wanpeng Li (4):
>   KVM: X86: Add vCPU running/preempted state
>   KVM: X86: Add Paravirt TLB Shootdown
>   KVM: X86: introduce invalidate_gpa argument to tlb flush
>   KVM: X86: Add flush_on_enter before guest enter
>
>  Documentation/virtual/kvm/cpuid.txt  |  4 +++
>  arch/x86/include/asm/kvm_host.h      |  2 +-
>  arch/x86/include/uapi/asm/kvm_para.h |  5 ++++
>  arch/x86/kernel/kvm.c                | 49 +++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/cpuid.c                 |  3 ++-
>  arch/x86/kvm/svm.c                   | 14 +++++------
>  arch/x86/kvm/vmx.c                   | 21 ++++++++--------
>  arch/x86/kvm/x86.c                   | 25 +++++++++++-------
>  8 files changed, 94 insertions(+), 29 deletions(-)
>
> --
> 2.7.4
>