This patch series adds support for the TDP MMU in the fast_page_fault
path, which enables certain write-protection and access tracking faults
to be handled without taking the KVM MMU lock. This series brings the
performance of these faults up to par with the legacy MMU.

Since there is not currently any KVM test coverage for access tracking
faults, this series introduces a new KVM selftest,
access_tracking_perf_test. This test relies on page_idle to enable
access tracking from userspace (since it is the only available userspace
API to do so).

Matthew Wilcox, Yu Zhao, David Hildenbrand, and Andrew Morton: You are
cc'd since you have discussed dropping page_idle from Linux [1].

Design
------

This series enables the existing fast_page_fault handler to operate
independently of whether the TDP MMU is enabled, by abstracting out the
details behind a new lockless page walk API.

An alternative design considered was to add a separate fast_page_fault
handler to the TDP MMU. In that design, the code that inspects the spte
and generates the new spte could be shared with the legacy MMU. However,
the retry loop would have to be duplicated, there would be many calls
back and forth between mmu.c and tdp_mmu.c, and passing around the
RET_PF_* values would get complicated.

Testing
-------

This series was tested on an Intel Cascade Lake machine. The kvm_intel
parameters eptad and pml were disabled to force access and dirty
tracking to go through fast_page_fault. All tests were run once with the
TDP MMU enabled and again with it disabled.

Tests run:
- All KVM selftests with default arguments
- All x86_64 kvm-unit-tests
- ./access_tracking_perf_test -v 4
- ./access_tracking_perf_test -v 4 -o
- ./access_tracking_perf_test -v 4 -s anonymous_thp
- ./access_tracking_perf_test -v 4 -s anonymous_thp -o
- ./access_tracking_perf_test -v 64
- ./dirty_log_perf_test -v 4
- ./dirty_log_perf_test -v 4 -o
- ./dirty_log_perf_test -v 4 -s anonymous_thp
- ./dirty_log_perf_test -v 4 -s anonymous_thp -o
- ./dirty_log_perf_test -v 64

For certain tests I also collected the fast_page_fault tracepoint to
manually make sure it was getting triggered properly:

  perf record -e kvmmmu:fast_page_fault --filter "old_spte != 0" -- <test>

Performance Results
-------------------

To measure performance I ran dirty_log_perf_test and
access_tracking_perf_test with 64 vCPUs. For dirty_log_perf_test,
performance is measured by "Iteration 2 dirty memory time", the time it
takes for all vCPUs to write to their memory after it has been
write-protected.

For access_tracking_perf_test, performance is measured by "Writing to
idle memory", the time it takes for all vCPUs to write to their memory
after it has been access-protected.

Metric                            | tdp_mmu=Y before   | tdp_mmu=Y after
--------------------------------- | ------------------ | -----------------
Iteration 2 dirty memory time     | 3.545234984s       | 0.313867232s
Writing to idle memory            | 3.249645416s       | 0.296113187s

The performance improvement comes from less time spent acquiring the MMU
lock in read mode and less time looking up the memslot for the faulting
gpa.

The TDP MMU is now on par with the legacy MMU:

Metric                            | tdp_mmu=N          | tdp_mmu=Y
--------------------------------- | ------------------ | -----------------
Iteration 2 dirty memory time     | 0.303452990s       | 0.313867232s
Writing to idle memory            | 0.291742127s       | 0.296113187s

v3:
* PATCH 1/6: Add Sean's Reviewed-by.
* PATCH 2/6: Add TRACE_DEFINE_ENUM for all RET_PF_* values. [Ben]
* PATCH 2/6: Add comment for future RET_PF values. [me]
* PATCH 3/6: Pull walk_shadow_page_lockless_{begin,end} out of get_walk.
  [Ben]
* PATCH 3/6: Make kvm_tdp_mmu_walk_lockless_{begin,end} static inline. [Sean]
* PATCH 4/6: Make get_last_sptep_lockless static. [kernel test robot]
* PATCH 4/6: Fix comment above kvm_tdp_mmu_get_last_sptep_lockless. [me]
* PATCH 4/6: Rename and comment functions only meant for fast_page_fault
  handling. [Ben]
* PATCH 4/6: Improve comment in tdp_mmu_set_spte_atomic_no_dirty_log. [Sean]
* PATCH 4/6: Remove unnecessary sptep null check. [Sean]

v2: https://lore.kernel.org/kvm/20210630214802.1902448-1-dmatlack@xxxxxxxxxx/
* Split is_tdp_mmu_root cleanup into a separate series. [Sean]
  https://lore.kernel.org/kvm/20210617231948.2591431-1-dmatlack@xxxxxxxxxx/
* Split walk_shadow_page_lockless into 2 APIs. [Sean]
* Perform rcu_dereference on TDP MMU sptep.
* Add comment to tdp_mmu_set_spte_atomic explaining new interaction
  with fast_pf_fix_direct_spte. [Ben]
* Document pagemap shifts in access_tracking_perf_test. [Ben]
* Skip test if lacking pagemap permissions (present pfn is 0). [Ben]
* Add Ben's Reviewed-by tags.
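
The v2 notes on pagemap shifts and on skipping the test when the present
PFN reads as 0 concern translating a test page's virtual address into a
page_idle bit. The following is a rough sketch of that arithmetic, not
the code in access_tracking_perf_test.c. It assumes the documented bit
layouts of /proc/<pid>/pagemap (PFN in bits 0-54, "present" flag in
bit 63) and /sys/kernel/mm/page_idle/bitmap (one bit per PFN, packed
into 64-bit words); all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of a /proc/<pid>/pagemap entry (per the kernel's pagemap
 * documentation): bits 0-54 hold the PFN when the page is present, and
 * bit 63 is the "page present" flag.
 */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)
#define PAGEMAP_PRESENT  (UINT64_C(1) << 63)

/* Hypothetical helper: byte offset of the 8-byte pagemap entry for vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size > 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Hypothetical helper: extract the PFN from a pagemap entry. Returns 0 if
 * the page is not present. A present page whose PFN field reads as 0
 * (e.g. the reader lacks CAP_SYS_ADMIN) also yields 0, which a caller
 * can treat as "skip the test".
 */
static uint64_t pagemap_entry_to_pfn(uint64_t entry)
{
	if (!(entry & PAGEMAP_PRESENT))
		return 0;
	return entry & PAGEMAP_PFN_MASK;
}

/*
 * Hypothetical helpers for /sys/kernel/mm/page_idle/bitmap: the bitmap is
 * an array of 64-bit words, and PFN n maps to bit (n % 64) of the word at
 * byte offset (n / 64) * 8.
 */
static uint64_t page_idle_offset(uint64_t pfn)
{
	return (pfn / 64) * sizeof(uint64_t);
}

static uint64_t page_idle_mask(uint64_t pfn)
{
	return UINT64_C(1) << (pfn % 64);
}
```

A caller would pread() the 8-byte pagemap entry at pagemap_offset(),
then pwrite() a word containing page_idle_mask() at page_idle_offset()
to mark the page idle, and later pread() the same word to check whether
the bit is still set (i.e. the page has not been accessed since).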

v1: https://lore.kernel.org/kvm/20210611235701.3941724-1-dmatlack@xxxxxxxxxx/

[1] https://lore.kernel.org/linux-mm/20210612000714.775825-1-willy@xxxxxxxxxxxxx/

David Matlack (6):
  KVM: x86/mmu: Rename cr2_or_gpa to gpa in fast_page_fault
  KVM: x86/mmu: Fix use of enums in trace_fast_page_fault
  KVM: x86/mmu: Make walk_shadow_page_lockless_{begin,end} interoperate
    with the TDP MMU
  KVM: x86/mmu: fast_page_fault support for the TDP MMU
  KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
  KVM: selftests: Introduce access_tracking_perf_test

 arch/x86/kvm/mmu/mmu.c                        |  74 ++-
 arch/x86/kvm/mmu/mmu_internal.h               |   3 +
 arch/x86/kvm/mmu/mmutrace.h                   |   6 +
 arch/x86/kvm/mmu/tdp_mmu.c                    |  47 +-
 arch/x86/kvm/mmu/tdp_mmu.h                    |  12 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/access_tracking_perf_test.c | 429 ++++++++++++++++++
 .../selftests/kvm/dirty_log_perf_test.c       |   1 +
 9 files changed, 550 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/access_tracking_perf_test.c

-- 
2.32.0.93.g670b81a890-goog