This patch series adds support for the TDP MMU in the fast_page_fault path, which enables certain write-protection and access tracking faults to be handled without taking the KVM MMU lock. This series brings the performance of these faults up to par with the legacy MMU.

Design
------

This series enables the existing fast_page_fault handler to operate independently of whether the TDP MMU is enabled, by abstracting out the details behind a new lockless page walk API.

I tried an alternative design where the TDP MMU provided its own fast_page_fault handler and there was shared helper code for modifying the PTE. However, I decided against this approach because it forced me to duplicate the retry loop, resulted in calls back and forth between mmu.c and tdp_mmu.c, and passing around the RET_PF_* values got complicated fast.

Testing
-------

Setup:
 - Ran all tests on a Cascade Lake machine.
 - Ran all tests with kvm_intel.eptad=N, kvm_intel.pml=N, kvm.tdp_mmu=N.
 - Ran all tests with kvm_intel.eptad=N, kvm_intel.pml=N, kvm.tdp_mmu=Y.

Tests:
 - Ran all KVM selftests with default arguments
 - ./access_tracking_perf_test -v 4
 - ./access_tracking_perf_test -v 4 -o
 - ./access_tracking_perf_test -v 4 -s anonymous_thp
 - ./access_tracking_perf_test -v 4 -s anonymous_thp -o
 - ./access_tracking_perf_test -v 64
 - ./dirty_log_perf_test -v 4 -s anonymous_thp
 - ./dirty_log_perf_test -v 4 -s anonymous_thp -o
 - ./dirty_log_perf_test -v 4 -o
 - ./dirty_log_perf_test -v 64

For certain tests I also collected the fast_page_fault tracepoint to manually make sure it was getting triggered properly:

  perf record -e kvmmmu:fast_page_fault --filter "old_spte != 0" -- <test>

Performance Results
-------------------

To measure performance I ran dirty_log_perf_test and access_tracking_perf_test with 64 vCPUs.

For dirty_log_perf_test, performance is measured by "Iteration 2 dirty memory time", the time it takes for all vCPUs to write to their memory after it has been write-protected.
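The write faults timed by this metric are the kind that fast_page_fault can resolve without taking the MMU lock. As a rough illustration of that idea only, the following user-space C sketch restores write access to a write-protected SPTE with an atomic compare-and-exchange and retries when a concurrent update wins the race. The SPTE_WRITABLE and SPTE_MMU_WRITABLE bits, their positions, and the fast_fix_write_fault() helper are hypothetical; they do not reflect KVM's actual SPTE layout or code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPTE bits, for illustration only (not KVM's real layout). */
#define SPTE_WRITABLE     (1ULL << 1)  /* hardware may write through this SPTE */
#define SPTE_MMU_WRITABLE (1ULL << 58) /* hypothetical: may be made writable locklessly */

/*
 * Hypothetical helper: try to make a write-protected SPTE writable without
 * holding any lock. Returns true if the SPTE is writable on return, false if
 * the fault must be handled by a slow path that takes the MMU lock.
 */
static bool fast_fix_write_fault(_Atomic uint64_t *sptep, int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		uint64_t old = atomic_load(sptep);

		if (old & SPTE_WRITABLE)
			return true;  /* already fixed, e.g. by another vCPU */
		if (!(old & SPTE_MMU_WRITABLE))
			return false; /* not fixable on the fast path */

		/* Install the new value only if the SPTE is unchanged since we read it. */
		if (atomic_compare_exchange_strong(sptep, &old, old | SPTE_WRITABLE))
			return true;
		/* Lost a race with a concurrent update: re-read and retry. */
	}
	return false; /* too much contention: fall back to the slow path */
}
```

In this sketch, bounding the retries keeps a heavily contended SPTE from spinning forever; giving up simply defers the fault to a locked slow path.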
For access_tracking_perf_test, performance is measured by "Writing to idle memory", the time it takes for all vCPUs to write to their memory after it has been access-protected.

Both metrics improved by more than 10x:

Metric                            | tdp_mmu=Y before   | tdp_mmu=Y after
--------------------------------- | ------------------ | --------------------
Iteration 2 dirty memory time     | 3.545234984s       | 0.312197959s
Writing to idle memory            | 3.249645416s       | 0.298275545s

The TDP MMU is now on par with the legacy MMU (within 4% on both metrics):

Metric                            | tdp_mmu=N          | tdp_mmu=Y
--------------------------------- | ------------------ | --------------------
Iteration 2 dirty memory time     | 0.300802793s       | 0.312197959s
Writing to idle memory            | 0.295591860s       | 0.298275545s

David Matlack (8):
  KVM: x86/mmu: Refactor is_tdp_mmu_root()
  KVM: x86/mmu: Rename cr2_or_gpa to gpa in fast_page_fault
  KVM: x86/mmu: Fix use of enums in trace_fast_page_fault
  KVM: x86/mmu: Common API for lockless shadow page walks
  KVM: x86/mmu: Also record spteps in shadow_page_walk
  KVM: x86/mmu: fast_page_fault support for the TDP MMU
  KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
  KVM: selftests: Introduce access_tracking_perf_test

 arch/x86/kvm/mmu/mmu.c                        | 159 +++----
 arch/x86/kvm/mmu/mmu_internal.h               |  18 +
 arch/x86/kvm/mmu/mmutrace.h                   |   3 +
 arch/x86/kvm/mmu/tdp_mmu.c                    |  37 +-
 arch/x86/kvm/mmu/tdp_mmu.h                    |  14 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../selftests/kvm/access_tracking_perf_test.c | 419 ++++++++++++++++++
 .../selftests/kvm/dirty_log_perf_test.c       |   1 +
 9 files changed, 559 insertions(+), 96 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/access_tracking_perf_test.c

--
2.32.0.272.g935e593368-goog