The main intent of this series is to allow yielding, i.e. cond_resched(),
when unmapping memory in shadow MMUs in response to an mmu_notifier
invalidation.  There is zero reason not to yield, and in fact I _thought_
KVM did yield, but because of how KVM grew over the years, the unmap path
got left behind.

The first half of the series reworks max_guest_memory_test into
mmu_stress_test, to give some confidence in the mmu_notifier-related
changes.

Oliver and Marc, there's one patch lurking in here to enable said test on
arm64.  It's as well tested as I can make it (and that took much longer
than anticipated because arm64 hit races in the test that x86 doesn't for
whatever reason).

The middle of the series reworks x86's shadow MMU logic to use the zap
flow that can yield.

The last third or so is a wee bit adventurous, and is kind of an RFC, but
well tested.  It's essentially prep/post work for James' MGLRU, and allows
aging SPTEs in x86's shadow MMU to run outside of mmu_lock, e.g. so that
nested TDP (stage-2) MMUs can participate in MGLRU.

If everything checks out, my goal is to land the selftests and yielding
changes in 6.12.  The aging stuff is incomplete and meaningless without
James' MGLRU; I'm posting it here purely so that folks can see the end
state when the mmu_notifier invalidation paths also move to a different
API.

James, the aging stuff is quite well tested (see below).  Can you try
working it into/on-top of your MGLRU series?  And if you're feeling very
kind, hammer it a bit more? :-)  I haven't looked at the latest ideas
and/or discussion on the MGLRU series, but I'm hoping that being able to
support the shadow MMU (absent the stupid eptad=0 case) in MGLRU will
allow for fewer shenanigans, e.g. no need to toggle flags during runtime.

As for testing, I spun up a VM and ran a compilation loop and `stress` in
the VM, while simultaneously running a small userspace program to age the
VM's memory (also in an infinite loop), using the same basic methodology as
access_tracking_perf_test.c (I put almost all of guest memory into a memfd
and then aged only that range of memory).

I confirmed that the locking does work, e.g. that there was (infrequent)
contention, and am fairly confident that the idea pans out.  E.g. I hit the
BUG_ON(!is_shadow_present_pte()) using that setup, which is the only reason
those patches exist :-)

Sean Christopherson (22):
  KVM: selftests: Check for a potential unhandled exception iff KVM_RUN
    succeeded
  KVM: selftests: Rename max_guest_memory_test to mmu_stress_test
  KVM: selftests: Only muck with SREGS on x86 in mmu_stress_test
  KVM: selftests: Compute number of extra pages needed in
    mmu_stress_test
  KVM: selftests: Enable mmu_stress_test on arm64
  KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_test
  KVM: selftests: Precisely limit the number of guest loops in
    mmu_stress_test
  KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test
  KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)
  KVM: x86/mmu: Move walk_slot_rmaps() up near
    for_each_slot_rmap_range()
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is
    allowed
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific
    helper
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul'
    literals
  KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o
    mmu_lock
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded
    equivalent
  KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of
    mmu_lock
  KVM: x86/mmu: Add support for lockless walks of rmap SPTEs
  KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging
    gfns
  ***HACK*** KVM: x86: Don't take mmu_lock when aging gfns

 arch/x86/kvm/mmu/mmu.c                        | 527 +++++++++++-------
 arch/x86/kvm/svm/svm.c                        |   2 +
 arch/x86/kvm/vmx/vmx.c                        |   2 +
 tools/testing/selftests/kvm/Makefile          |   3 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |   3 +-
 ..._guest_memory_test.c => mmu_stress_test.c} | 144 ++++-
 virt/kvm/kvm_main.c                           |   7 +-
 7 files changed, 482 insertions(+), 206 deletions(-)
 rename tools/testing/selftests/kvm/{max_guest_memory_test.c => mmu_stress_test.c} (65%)


base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
-- 
2.46.0.76.ge559c4bf1a-goog