The remote flushing APIs do a busy wait, which is fine in the bare-metal scenario. Within a guest, however, the vcpus might have been preempted or blocked, and in that case the initiator vcpu can end up busy-waiting for a long time. This was discovered in our gang scheduling test, and one way to solve it is to para-virtualize flush_tlb_others_ipi (which now shows up as smp_call_function_many after Alex Shi's TLB optimization).

This patch set implements para-virt TLB flush, making sure that the initiator does not wait for vcpus that are sleeping; instead, the sleeping vcpus flush the TLB on guest enter. The idea was discussed here: https://lkml.org/lkml/2012/2/20/157

This also brings one more dependency for the lock-less page walk performed by get_user_pages_fast (gup_fast). gup_fast disables interrupts and assumes that the page-table pages will not be freed during that period. That was fine as long as flush_tlb_others_ipi waited for all the IPIs to be processed before returning. With the new approach of not waiting for sleeping vcpus, the assumption is no longer valid. So HAVE_RCU_TABLE_FREE is now used to free the pages, which makes sure that all cpus have at least processed the smp callback before the pages are freed.

Changelog from v3:
• Add helper for cleaning up vcpu_state information (Marcelo)
• Fix code for checking vs_page and leaking page refs (Marcelo)

Changelog from v2:
• Rebase to 3.5 based linus (commit - f7da9cd) kernel
• Port PV-Flush to new TLB-Optimization code by Alex Shi
• Use pinned pages to avoid overhead during guest enter/exit (Marcelo)
• Remove kick, as this is not improving much
• Use bit fields in the state (flush_on_enter and vcpu_running) flags to avoid smp barriers (Marcelo)

Changelog from v1:
• Race fixes reported by Vatsa
• Address gup_fast dependency using PeterZ's rcu table free patch
• Fix rcu_table_free for hw pagetable walkers

Here are the results from PLE hardware.
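To make the mechanism concrete, here is a minimal user-space sketch of the shared state machine. It is purely illustrative: the names mirror the flush_on_enter/vcpu_running bits mentioned above, but the function names, the plain struct, and the GCC atomic builtins are stand-ins — the real patches keep this state in a pinned guest page registered through an MSR, and the flush request path lives inside smp_call_function_many/kvm_flush_tlb_others.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit flags in one word, so guest and hypervisor can update the state
 * with a single atomic op instead of needing smp barriers between
 * two separate fields (cf. the v2 changelog entry). */
#define VCPU_RUNNING    (1u << 0)
#define FLUSH_ON_ENTER  (1u << 1)

struct vcpu_state {
	unsigned int flags;	/* hypothetical shared guest/host word */
};

/* Initiator side: instead of busy-waiting on an IPI to a preempted
 * vcpu, mark it FLUSH_ON_ENTER.  Returns true when the target is
 * running and a real IPI is still needed. */
static bool pv_flush_request(struct vcpu_state *vs)
{
	if (!(__atomic_load_n(&vs->flags, __ATOMIC_ACQUIRE) & VCPU_RUNNING)) {
		__atomic_or_fetch(&vs->flags, FLUSH_ON_ENTER, __ATOMIC_RELEASE);
		return false;	/* sleeping vcpu: defer, no busy wait */
	}
	return true;		/* running vcpu: send the IPI as before */
}

/* Host side, on guest enter: become RUNNING, pick up any deferred
 * request atomically.  Returns true if the TLB must be flushed
 * before re-entering the guest. */
static bool vcpu_enter(struct vcpu_state *vs)
{
	unsigned int old =
		__atomic_exchange_n(&vs->flags, VCPU_RUNNING, __ATOMIC_ACQ_REL);
	return old & FLUSH_ON_ENTER;
}

/* Host side, on preemption/exit: clear RUNNING so initiators defer. */
static void vcpu_exit(struct vcpu_state *vs)
{
	__atomic_and_fetch(&vs->flags, ~VCPU_RUNNING, __ATOMIC_ACQ_REL);
}
```

A preempted vcpu (flags == 0) absorbs flush requests without making the initiator wait, and the deferred flush fires exactly once on the next guest enter; a running vcpu still takes the normal IPI path.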
Here are the setup details:
• 32 CPUs (HT disabled)
• 64-bit VM
• 32 vcpus
• 8GB RAM

base      = 3.6-rc1 + ple handler optimization patch
pvflushv4 = 3.6-rc1 + ple handler optimization patch + pvflushv4 patch

kernbench (lower is better)
===========================
        base        pvflushv4   %improvement
1VM     48.5800     46.8513       3.55846
2VM     108.1823    104.6410      3.27346
3VM     183.2733    163.3547     10.86825

ebizzy (higher is better)
=========================
        base        pvflushv4   %improvement
1VM     2414.5000   2089.8750   -13.44481
2VM     2167.6250   2371.7500     9.41699
3VM     1600.1111   2102.5556    31.40060

Thanks Raghu for running the tests.

[1] http://article.gmane.org/gmane.linux.kernel/1329752

---

Nikunj A. Dadhania (6):
      KVM Guest: Add VCPU running/pre-empted state for guest
      KVM-HV: Add VCPU running/pre-empted state for guest
      KVM Guest: Add paravirt kvm_flush_tlb_others
      KVM-HV: Add flush_on_enter before guest enter
      Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled
      KVM-doc: Add paravirt tlb flush document

Peter Zijlstra (2):
      mm, x86: Add HAVE_RCU_TABLE_FREE support
      mm: Add missing TLB invalidate to RCU page-table freeing

 Documentation/virtual/kvm/msr.txt                |    4 +
 Documentation/virtual/kvm/paravirt-tlb-flush.txt |   53 ++++++++++++++
 arch/Kconfig                                     |    3 +
 arch/powerpc/Kconfig                             |    1
 arch/sparc/Kconfig                               |    1
 arch/x86/Kconfig                                 |   11 +++
 arch/x86/include/asm/kvm_host.h                  |    7 ++
 arch/x86/include/asm/kvm_para.h                  |   13 +++
 arch/x86/include/asm/tlb.h                       |    1
 arch/x86/include/asm/tlbflush.h                  |   11 +++
 arch/x86/kernel/kvm.c                            |   38 ++++++++++
 arch/x86/kvm/cpuid.c                             |    1
 arch/x86/kvm/x86.c                               |   84 +++++++++++++++++++++-
 arch/x86/mm/pgtable.c                            |    6 +-
 arch/x86/mm/tlb.c                                |   36 +++++++++
 include/asm-generic/tlb.h                        |    9 ++
 mm/memory.c                                      |   43 ++++++++-
 17 files changed, 311 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/virtual/kvm/paravirt-tlb-flush.txt