On Mon, Jun 14, 2021 at 11:54:59AM +0200, Paolo Bonzini wrote:
> On 12/06/21 01:56, David Matlack wrote:
> > This patch series adds support for the TDP MMU in the fast_page_fault
> > path, which enables certain write-protection and access tracking faults
> > to be handled without taking the KVM MMU lock. This series brings the
> > performance of these faults up to par with the legacy MMU.
>
> Hi David,
>
> I have one very basic question: is the speedup due to lock contention, or to
> cacheline bouncing, or something else altogether? In other words, what do
> the profiles look like before vs. after these patches?

The speedup comes from a combination of:

 - Less time spent in kvm_vcpu_gfn_to_memslot.
 - Less lock contention on the MMU lock in read mode.

Before:

     Overhead  Symbol
   - 45.59%    [k] kvm_vcpu_gfn_to_memslot
      - 45.57% kvm_vcpu_gfn_to_memslot
         - 29.25% kvm_page_track_is_active
         + 15.90% direct_page_fault
         + 13.35% mmu_need_write_protect
         +  9.10% kvm_mmu_hugepage_adjust
         +  7.20% try_async_pf
   + 18.16%    [k] _raw_read_lock
   + 10.57%    [k] direct_page_fault
   +  8.77%    [k] handle_changed_spte_dirty_log
   +  4.65%    [k] mark_page_dirty_in_slot
      1.62%    [.] run_test
   +  1.35%    [k] x86_virt_spec_ctrl
   +  1.18%    [k] try_grab_compound_head
   [...]

After:

     Overhead  Symbol
   + 26.23%    [k] x86_virt_spec_ctrl
   + 15.93%    [k] vmx_vmexit
   +  6.33%    [k] vmx_vcpu_run
   +  4.31%    [k] vcpu_enter_guest
   +  3.71%    [k] tdp_iter_next
   +  3.47%    [k] __vmx_vcpu_run
   +  2.92%    [k] kvm_vcpu_gfn_to_memslot
   +  2.71%    [k] vcpu_run
   +  2.71%    [k] fast_page_fault
   +  2.51%    [k] kvm_vcpu_mark_page_dirty

(Both profiles were captured during "Iteration 2 dirty memory" of
dirty_log_perf_test.)

Related to the kvm_vcpu_gfn_to_memslot overhead: I actually have a set of
patches from Ben I am planning to send soon that will reduce the number of
redundant gfn-to-memslot lookups in the page fault path.

>
> Thanks,
>
> Paolo
>
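As a rough sense of scale (my arithmetic, not a number stated in the thread): comparing the overhead of the top "before" symbol against its "after" counterpart in the two profiles gives roughly a 15x drop in relative cycles spent in kvm_vcpu_gfn_to_memslot:

```python
# Relative-overhead ratio for kvm_vcpu_gfn_to_memslot, taken from the
# perf profile percentages quoted in the message above.
before = 45.59  # % overhead before the series
after = 2.92    # % overhead after the series
print(f"reduction: {before / after:.1f}x")  # prints "reduction: 15.6x"
```

Note this is a ratio of perf overhead percentages, not of absolute time, so it understates nothing about the denominator changing: the "after" run spends its cycles elsewhere (guest entry/exit paths dominate), which is what the second profile shows.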