The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
a guest, however, the target vCPUs might have been preempted or blocked,
in which case the initiator vCPU ends up busy-waiting for a long time.

This patch set implements a paravirtual TLB flush that does not wait for
vCPUs that are sleeping. Instead, every sleeping vCPU flushes its TLB on
guest entry. The idea was discussed here:
https://lkml.org/lkml/2012/2/20/157

The best result is achieved when the host is overcommitted by running
multiple vCPUs on each pCPU. In this case the PV TLB flush avoids
touching vCPUs that are not scheduled and avoids the wait on the
initiating CPU.

In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based
page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)").

Tested on a Haswell i7 desktop with 4 cores (2 HT each), so 8 pCPUs,
running ebizzy in one Linux guest.

ebizzy -M
              vanilla    optimized     boost
 8 vCPUs       10152       10083      -0.68%
16 vCPUs        1224        4866      297.5%
24 vCPUs        1109        3871      249%
32 vCPUs        1025        3375      229.3%

Wanpeng Li (4):
  KVM: Add vCPU running/preempted state
  KVM: Add paravirt remote TLB flush
  KVM: X86: introduce invalidate_gpa argument to tlb flush
  KVM: Add flush_on_enter before guest enter

 Documentation/virtual/kvm/cpuid.txt  | 10 ++++++++++
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/uapi/asm/kvm_para.h |  6 ++++++
 arch/x86/kernel/kvm.c                | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/svm.c                   | 14 +++++++-------
 arch/x86/kvm/vmx.c                   | 21 +++++++++++----------
 arch/x86/kvm/x86.c                   | 24 +++++++++++++++---------
 8 files changed, 86 insertions(+), 29 deletions(-)

-- 
2.7.4
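
For readers who want the flush path described in this cover letter in concrete form, here is a minimal user-space sketch of the idea. It is not the code from the series. The flag names, the per-vCPU state byte, and every function name are hypothetical and chosen only for illustration. It assumes the host publishes a "preempted" bit per vCPU and that the guest can atomically add a "flush on entry" bit to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Hypothetical flag bits for this sketch; not the real KVM ABI values. */
#define VCPU_PREEMPTED (1u << 0)
#define VCPU_FLUSH_TLB (1u << 1)

/* One state byte per vCPU, shared between the "host" and the "guest". */
static _Atomic uint8_t vcpu_state[NR_VCPUS];

/* Host side: record that a vCPU has been scheduled out. */
static void host_preempt(int cpu)
{
    assert(cpu >= 0 && cpu < NR_VCPUS);
    atomic_store(&vcpu_state[cpu], VCPU_PREEMPTED);
}

/*
 * Host side: on guest entry, clear the state and report whether a
 * deferred TLB flush was requested while the vCPU was not running.
 */
static bool host_enter_guest(int cpu)
{
    assert(cpu >= 0 && cpu < NR_VCPUS);
    uint8_t old = atomic_exchange(&vcpu_state[cpu], 0);
    return (old & VCPU_FLUSH_TLB) != 0;
}

/*
 * Guest side: given a bitmask of target vCPUs, mark every preempted
 * target for a deferred flush and return the mask of vCPUs that are
 * still running and therefore still need a flush IPI.
 */
static unsigned pv_flush_tlb_others(unsigned targets)
{
    unsigned need_ipi = targets;

    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        if (!(targets & (1u << cpu)))
            continue;

        uint8_t state = atomic_load(&vcpu_state[cpu]);
        if (!(state & VCPU_PREEMPTED))
            continue; /* running: must be flushed via IPI */

        /*
         * Compare-and-swap so that a vCPU which was rescheduled after
         * the load above is not skipped by mistake; on failure it stays
         * in the IPI mask.
         */
        if (atomic_compare_exchange_strong(&vcpu_state[cpu], &state,
                                           (uint8_t)(state | VCPU_FLUSH_TLB)))
            need_ipi &= ~(1u << cpu);
    }
    return need_ipi;
}
```

In this sketch the initiator would send flush IPIs only to the vCPUs left in the returned mask, so it never waits on a vCPU that is not scheduled. The preempted vCPUs pick up the request when host_enter_guest() runs for them.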