On Tue, Aug 21, 2012 at 04:55:52PM +0530, Nikunj A. Dadhania wrote:
> The remote flushing APIs do a busy wait, which is fine in a bare-metal
> scenario. But within a guest, the target vcpus might have been preempted
> or blocked, in which case the initiator vcpu ends up busy-waiting for a
> long time.
>
> This was discovered in our gang scheduling tests. One way to solve it is
> by para-virtualizing flush_tlb_others_ipi (which now shows up as
> smp_call_function_many after Alex Shi's TLB optimization).
>
> This patch set implements para-virt TLB flushes that do not wait for
> vcpus that are sleeping; instead, all the sleeping vcpus flush the TLB
> on guest enter. The idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> This also brings in one more dependency, for the lockless page walk
> performed by get_user_pages_fast (gup_fast). gup_fast disables
> interrupts and assumes that the pages will not be freed during that
> period. That was fine while flush_tlb_others_ipi waited for all the
> IPIs to be processed before returning. With the new approach of not
> waiting for the sleeping vcpus, this assumption no longer holds. So
> HAVE_RCU_TABLE_FREE is now used to free the pages, which makes sure
> that all the cpus have at least processed the smp callback before the
> pages are freed.
>
> Changelog from v3:
> • Add helper for cleaning up vcpu_state information (Marcelo)
> • Fix code for checking vs_page and leaking page refs (Marcelo)
>
> Changelog from v2:
> • Rebase to 3.5-based linus kernel (commit f7da9cd).
> • Port PV-Flush to the new TLB-optimization code by Alex Shi
> • Use pinned pages to avoid overhead during guest enter/exit (Marcelo)
> • Remove kick, as it was not improving much
> • Use bit fields in the state flag (flush_on_enter and vcpu_running)
>   to avoid smp barriers (Marcelo)
>
> Changelog from v1:
> • Race fixes reported by Vatsa
> • Address the gup_fast dependency using PeterZ's rcu table free patch
> • Fix rcu_table_free for hw page-table walkers
>
> Here are the results from PLE hardware. Setup details:
> • 32 CPUs (HT disabled)
> • 64-bit VM
> • 32 vcpus
> • 8GB RAM
>
> base      = 3.6-rc1 + ple handler optimization patch
> pvflushv4 = 3.6-rc1 + ple handler optimization patch + pvflushv4 patch
>
> kernbench (lower is better)
> ===========================
>           base        pvflushv4   %improvement
> 1VM        48.5800     46.8513     3.55846
> 2VM       108.1823    104.6410     3.27346
> 3VM       183.2733    163.3547    10.86825
>
> ebizzy (higher is better)
> =========================
>           base        pvflushv4   %improvement
> 1VM      2414.5000   2089.8750   -13.44481
> 2VM      2167.6250   2371.7500     9.41699
> 3VM      1600.1111   2102.5556    31.40060
>
> Thanks to Raghu for running the tests.
>
> [1] http://article.gmane.org/gmane.linux.kernel/1329752
>
> ---
>
> Nikunj A. Dadhania (6):
>   KVM Guest: Add VCPU running/pre-empted state for guest
>   KVM-HV: Add VCPU running/pre-empted state for guest
>   KVM Guest: Add paravirt kvm_flush_tlb_others
>   KVM-HV: Add flush_on_enter before guest enter
>   Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled
>   KVM-doc: Add paravirt tlb flush document
>
> Peter Zijlstra (2):
>   mm, x86: Add HAVE_RCU_TABLE_FREE support
>   mm: Add missing TLB invalidate to RCU page-table freeing
>
>
>  Documentation/virtual/kvm/msr.txt                |    4 +
>  Documentation/virtual/kvm/paravirt-tlb-flush.txt |   53 ++++++++++++++
>  arch/Kconfig                                     |    3 +
>  arch/powerpc/Kconfig                             |    1
>  arch/sparc/Kconfig                               |    1
>  arch/x86/Kconfig                                 |   11 +++
>  arch/x86/include/asm/kvm_host.h                  |    7 ++
>  arch/x86/include/asm/kvm_para.h                  |   13 +++
>  arch/x86/include/asm/tlb.h                       |    1
>  arch/x86/include/asm/tlbflush.h                  |   11 +++
>  arch/x86/kernel/kvm.c                            |   38 ++++++++++
>  arch/x86/kvm/cpuid.c                             |    1
>  arch/x86/kvm/x86.c                               |   84 +++++++++++++++++++++-
>  arch/x86/mm/pgtable.c                            |    6 +-
>  arch/x86/mm/tlb.c                                |   36 +++++++++
>  include/asm-generic/tlb.h                        |    9 ++
>  mm/memory.c                                      |   43 ++++++++++-
>  17 files changed, 311 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/paravirt-tlb-flush.txt

Avi, PeterZ, can you please review?