On Mon, Jun 14, 2021 at 11:54:59AM +0200, Paolo Bonzini wrote:
> On 12/06/21 01:56, David Matlack wrote:
> > This patch series adds support for the TDP MMU in the fast_page_fault
> > path, which enables certain write-protection and access tracking faults
> > to be handled without taking the KVM MMU lock. This series brings the
> > performance of these faults up to par with the legacy MMU.
>
> Hi David,
>
> I have one very basic question: is the speedup due to lock contention, or to
> cacheline bouncing, or something else altogether? In other words, what do
> the profiles look like before vs. after these patches?

The speedup comes from a combination of:

 - Less time spent in kvm_vcpu_gfn_to_memslot.
 - Less lock contention on the MMU lock in read mode.

Before:

     Overhead  Symbol
   - 45.59%    [k] kvm_vcpu_gfn_to_memslot
      - 45.57% kvm_vcpu_gfn_to_memslot
         - 29.25% kvm_page_track_is_active
         + 15.90% direct_page_fault
         + 13.35% mmu_need_write_protect
         +  9.10% kvm_mmu_hugepage_adjust
         +  7.20% try_async_pf
   + 18.16%    [k] _raw_read_lock
   + 10.57%    [k] direct_page_fault
   +  8.77%    [k] handle_changed_spte_dirty_log
   +  4.65%    [k] mark_page_dirty_in_slot
      1.62%    [.] run_test
   +  1.35%    [k] x86_virt_spec_ctrl
   +  1.18%    [k] try_grab_compound_head
   [...]

After:

     Overhead  Symbol
   + 26.23%    [k] x86_virt_spec_ctrl
   + 15.93%    [k] vmx_vmexit
   +  6.33%    [k] vmx_vcpu_run
   +  4.31%    [k] vcpu_enter_guest
   +  3.71%    [k] tdp_iter_next
   +  3.47%    [k] __vmx_vcpu_run
   +  2.92%    [k] kvm_vcpu_gfn_to_memslot
   +  2.71%    [k] vcpu_run
   +  2.71%    [k] fast_page_fault
   +  2.51%    [k] kvm_vcpu_mark_page_dirty

(Both profiles were captured during "Iteration 2 dirty memory" of
dirty_log_perf_test.)

Related to the kvm_vcpu_gfn_to_memslot overhead: I actually have a set of
patches from Ben I am planning to send soon that will reduce the number of
redundant gfn-to-memslot lookups in the page fault path.

>
> Thanks,
>
> Paolo
>
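As a rough sense of scale (my arithmetic, not a number stated in the thread): comparing the overhead of the top "before" symbol against its "after" counterpart in the two profiles gives roughly a 15x drop in relative cycles spent in kvm_vcpu_gfn_to_memslot:

```python
# Relative-overhead ratio for kvm_vcpu_gfn_to_memslot, taken from the
# perf profile percentages quoted in the message above.
before = 45.59  # % overhead before the series
after = 2.92    # % overhead after the series
print(f"reduction: {before / after:.1f}x")  # prints "reduction: 15.6x"
```

Note this is a ratio of perf overhead percentages, not of absolute time, so it understates nothing about the denominator changing: the "after" run spends its cycles elsewhere (guest entry/exit paths dominate), which is what the second profile shows.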