Re: [PATCH v5 0/4] KVM: X86: Paravirt remote TLB flush

Wanpeng Li <kernellwp@xxxxxxxxx> · Thu, 16 Nov 2017 18:21:54 +0800

2017-11-16 5:05 GMT+08:00 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>:
> On Mon, Nov 13, 2017 at 02:01:16AM -0800, Wanpeng Li wrote:
>> The remote TLB flush APIs busy-wait, which is fine on bare metal.
>> But within a guest, the target vCPUs might have been preempted or
>> blocked. In this scenario, the initiator vCPU would end up
>> busy-waiting for a long time.
>>
>> This patch set implements a paravirtual TLB flush which makes sure
>> that the initiator does not wait for vCPUs that are sleeping. Instead,
>> all the sleeping vCPUs flush the TLB on guest entry. The idea was
>> discussed here:
>> https://lkml.org/lkml/2012/2/20/157
>>
>> The best result is achieved when the host is overcommitted by running
>> multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
>> vCPUs which are not scheduled and avoids the wait on the initiating vCPU.
>>
>> In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based
>> page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
>>
>> Tested on a Haswell i7 desktop with 4 cores (2 HT each), so 8 pCPUs,
>> running ebizzy in one Linux guest.
>
> 8 pCPUS?
>>
>> ebizzy -M
>>               vanilla    optimized     boost
>>  8 vCPUs       10152       10083       -0.68%
>> 16 vCPUs        1224        4866       297.5%
>> 24 vCPUs        1109        3871       249%
>> 32 vCPUs        1025        3375       229.3%
>
> So this is all just one guest? What happens if you have, say, a 64-pCPU
> machine with eight of these guests? That is more of a realistic
> workload in today's cloud situations.

Yeah, I tested on a two-socket Xeon Gold 6142 (2.6GHz), 16 cores per
socket with 2 HTs per core, so 64 pCPUs, and each VM has 64 vCPUs.

          vanilla    optimized     boost
1VM         46799        46788    -0.01%
2VM         23962        42691       78%
3VM         16152        37539      132%

Regards,
Wanpeng Li

>
>>
>> Note: The patchset is rebased against "locking/qspinlock/x86: Avoid
>>    test-and-set when PV_DEDICATED is set" v3
>>
>> v4 -> v5:
>>  * flushmask instead of cpumask
>>
>> v3 -> v4:
>>  * use READ_ONCE()
>>  * use try_cmpxchg instead of cmpxchg
>>  * add {} to if
>>  * no FLUSH flags to preserve during set_preempted
>>  * "KVM: X86" prefix to patch subject
>>
>> v2 -> v3:
>>  * percpu cpumask
>>
>> v1 -> v2:
>>  * a new CPUID feature bit
>>  * fix cmpxchg check
>>  * use kvm_vcpu_flush_tlb() to get the statistics right
>>  * just OR the KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted
>>  * add a new bool argument to kvm_x86_ops->tlb_flush
>>  * __cpumask_clear_cpu() instead of cpumask_clear_cpu()
>>  * not put cpumask_t on stack
>>  * rebase the patchset against "locking/qspinlock/x86: Avoid
>>    test-and-set when PV_DEDICATED is set" v3
>>
>> Wanpeng Li (4):
>>   KVM: X86: Add vCPU running/preempted state
>>   KVM: X86: Add paravirt remote TLB flush
>>   KVM: X86: introduce invalidate_gpa argument to tlb flush
>>   KVM: X86: Add flush_on_enter before guest enter
>>
>>  Documentation/virtual/kvm/cpuid.txt  |  4 ++++
>>  arch/x86/include/asm/kvm_host.h      |  2 +-
>>  arch/x86/include/uapi/asm/kvm_para.h |  6 +++++
>>  arch/x86/kernel/kvm.c                | 46 ++++++++++++++++++++++++++++++++++--
>>  arch/x86/kvm/cpuid.c                 |  3 ++-
>>  arch/x86/kvm/svm.c                   | 14 +++++------
>>  arch/x86/kvm/vmx.c                   | 21 ++++++++--------
>>  arch/x86/kvm/x86.c                   | 25 +++++++++++++-------
>>  8 files changed, 88 insertions(+), 30 deletions(-)
>>
>> --
>> 2.7.4
>>
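
For readers following the thread, the mechanism described in the cover
letter can be sketched as a single-threaded userspace model in C. This is
not the patch code. Only KVM_VCPU_PREEMPTED is named in the thread; the
flush-request bit name, the vcpu_state struct, the 32-bit target mask and
all function names below are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED   (1u << 0)
#define VCPU_FLUSH_REQUESTED (1u << 1)  /* hypothetical name for the deferred-flush bit */

#define NR_VCPUS 4

/* Toy stand-in for the per-vCPU state shared between guest and host. */
struct vcpu_state {
    _Atomic uint8_t preempted;  /* flag bits defined above */
    unsigned int tlb_flushes;   /* flushes performed by this vCPU (model only) */
};

static struct vcpu_state vcpus[NR_VCPUS];

/* Host side: the vCPU is scheduled out. No flush bit is preserved here. */
void vcpu_set_preempted(int cpu)
{
    atomic_store(&vcpus[cpu].preempted, (uint8_t)KVM_VCPU_PREEMPTED);
}

/* Host side: before guest entry, clear the state and honor a pending flush. */
void vcpu_enter_guest(int cpu)
{
    uint8_t old = atomic_exchange(&vcpus[cpu].preempted, (uint8_t)0);

    if (old & VCPU_FLUSH_REQUESTED)
        vcpus[cpu].tlb_flushes++;
}

/*
 * Guest side: for each target vCPU that is preempted, try to set the
 * flush-request bit with a compare-and-exchange. On success the target is
 * dropped from the mask, so the initiator does not wait for it.
 * Returns the mask of vCPUs that still need a synchronous flush.
 */
uint32_t pv_flush_tlb_others(uint32_t targets)
{
    uint32_t flushmask = targets;

    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        if (!(targets & (1u << cpu)))
            continue;

        uint8_t state = atomic_load(&vcpus[cpu].preempted);
        uint8_t want = (uint8_t)(state | VCPU_FLUSH_REQUESTED);

        if ((state & KVM_VCPU_PREEMPTED) &&
            atomic_compare_exchange_strong(&vcpus[cpu].preempted, &state, want))
            flushmask &= ~(1u << cpu);
    }
    return flushmask;
}

/* Stand-in for the IPI-based flush of vCPUs that are currently running. */
void send_flush_ipi(uint32_t mask)
{
    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        if (mask & (1u << cpu))
            vcpus[cpu].tlb_flushes++;
    }
}
```

In this model, if vCPUs 1 and 3 are preempted and the initiator targets
vCPUs 1-3, only vCPU 2 is left in the returned mask. vCPUs 1 and 3 flush
later, when vcpu_enter_guest() runs for them. If a target is scheduled
back in between the load and the compare-and-exchange, the exchange fails
and that vCPU stays in the mask for the synchronous flush.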