On Wed, Dec 15, 2021, Sean Christopherson wrote:
> On Wed, Dec 15, 2021, Paolo Bonzini wrote:
> > On 12/15/21 02:15, Sean Christopherson wrote:
> > > Patches 01-03 implement a bug fix by ensuring KVM zaps both valid and
> > > invalid roots when unmapping a gfn range (including the magic "all" range).
> > > Failure to zap invalid roots means KVM doesn't honor the mmu_notifier's
> > > requirement that all references are dropped.
> > >
> > > set_nx_huge_pages() is the most blatant offender, as it doesn't elevate
> > > mm_users and so a VM's entire mm can be released, but the same underlying
> > > bug exists for any "unmap" command from the mmu_notifier in combination
> > > with a memslot update.  E.g. if KVM is deleting a memslot, and a
> > > mmu_notifier hook acquires mmu_lock while it's dropped by
> > > kvm_mmu_zap_all_fast(), the mmu_notifier hook will see the to-be-deleted
> > > memslot but won't zap entries from the invalid roots.
> > >
> > > Patch 04 is cleanup to reuse the common iterator for walking _only_
> > > invalid roots.
> > >
> > > Sean Christopherson (4):
> > >   KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap
> > >     hook
> > >   KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
> > >   KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
> > >   KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
> > >
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 116 +++++++++++++++++-------------------
> > >  arch/x86/kvm/mmu/tdp_mmu.h |   3 -
> > >  2 files changed, 53 insertions(+), 66 deletions(-)
> > >
> >
> > Queued 1-3 for 5.16 and 4 for 5.17.
>
> Actually, can you please unqueue patch 4?  I think we can actually kill off
> kvm_tdp_mmu_zap_invalidated_roots() entirely.  I don't know if that code will be
> ready for 5.17, but if it is then this patch is unnecessary.  And if not, it
> shouldn't be difficult to re-queue this a bit later.

Cancel that request, the comment above kvm_tdp_mmu_zap_invalidated_roots() lies,
as do the changelogs for commits b7cccd397f31 ("KVM: x86/mmu: Fast invalidation
for TDP MMU") and 4c6654bd160d ("KVM: x86/mmu: Tear down roots before
kvm_mmu_zap_all_fast returns"), and the fact that they are even separate commits.

KVM _must_ zap invalid roots before returning from kvm_mmu_zap_all_fast(), because
when it's called from kvm_mmu_invalidate_zap_pages_in_memslot(), KVM is relying on
it to fully remove all references to the memslot.  Once the memslot is gone, KVM's
mmu_notifier hooks will be unable to find the stale references as the hva=>gfn
translation is done via the memslots.  If userspace unmaps a range after deleting a
memslot, KVM will fail to zap in response to the mmu_notifier due to not finding a
memslot corresponding to the notifier's range, which leads to another variation of
the splat I've come to know and love.

  WARNING: CPU: 33 PID: 44927 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:173
  RIP: 0010:kvm_is_zone_device_pfn+0x96/0xa0 [kvm]
   kvm_set_pfn_dirty+0xa8/0xe0 [kvm]
   __handle_changed_spte+0x2f7/0x5b0 [kvm]
   __handle_changed_spte+0x2f7/0x5b0 [kvm]
   __tdp_mmu_set_spte+0x64/0x170 [kvm]
   tdp_mmu_zap_root+0x1f5/0x220 [kvm]
   kvm_tdp_mmu_zap_all+0x47/0x60 [kvm]
   kvm_mmu_zap_all+0xf0/0x100 [kvm]
   kvm_mmu_notifier_release+0x2b/0x60 [kvm]
   mmu_notifier_unregister+0x48/0xe0
   kvm_destroy_vm+0x129/0x2a0 [kvm]
   kvm_vm_release+0x1d/0x30 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   exit_to_user_mode_prepare+0x114/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

I'll include a patch in my flush+zap rework series to reword that comment, because
it is very, very misleading.
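
For anyone following along, the ordering problem can be sketched with a toy
userspace model.  This is not kernel code: every struct and function name below
(toy_vm, toy_unmap_hva_range(), etc.) is made up for illustration, addresses are
in page units, and the only thing it is meant to show is that an hva-based unmap
can reach mappings solely through a memslot, so anything left behind when the
memslot goes away is unreachable afterwards.

```c
/* Toy userspace model, NOT kernel code.  Every name below is made up. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SLOTS 4
#define TOY_MAX_SPTES 8

/* Addresses are in page units to keep the arithmetic trivial. */
struct toy_memslot { bool present; uint64_t hva, base_gfn, npages; };
struct toy_spte    { bool present; bool root_invalid; uint64_t gfn; };

struct toy_vm {
	struct toy_memslot slots[TOY_MAX_SLOTS];
	struct toy_spte sptes[TOY_MAX_SPTES];
};

/* Zap every mapping with a gfn in [start, end), on valid and invalid roots. */
static size_t toy_zap_gfn_range(struct toy_vm *vm, uint64_t start, uint64_t end)
{
	size_t i, zapped = 0;

	assert(start <= end);
	for (i = 0; i < TOY_MAX_SPTES; i++) {
		struct toy_spte *spte = &vm->sptes[i];

		if (!spte->present || spte->gfn < start || spte->gfn >= end)
			continue;
		spte->present = false;
		zapped++;
	}
	return zapped;
}

/* Notifier-style unmap: the hva range reaches gfns only through memslots. */
static size_t toy_unmap_hva_range(struct toy_vm *vm, uint64_t start, uint64_t end)
{
	size_t i, zapped = 0;

	for (i = 0; i < TOY_MAX_SLOTS; i++) {
		struct toy_memslot *slot = &vm->slots[i];
		uint64_t lo, hi;

		if (!slot->present)
			continue;
		lo = start > slot->hva ? start : slot->hva;
		hi = end < slot->hva + slot->npages ? end : slot->hva + slot->npages;
		if (lo >= hi)
			continue;
		zapped += toy_zap_gfn_range(vm, slot->base_gfn + (lo - slot->hva),
					    slot->base_gfn + (hi - slot->hva));
	}
	return zapped;
}

/* Returns how many stale mappings survive memslot deletion plus a later unmap. */
size_t toy_stale_after_delete(bool zap_invalid_before_return)
{
	struct toy_vm vm = {0};
	size_t i, stale = 0;

	vm.slots[0] = (struct toy_memslot){ true, 0x1000, 0x100, 4 };
	vm.sptes[0] = (struct toy_spte){ true, false, 0x100 };
	vm.sptes[1] = (struct toy_spte){ true, false, 0x102 };

	/* "Fast zap": mark the roots invalid, optionally tear them down now. */
	for (i = 0; i < TOY_MAX_SPTES; i++)
		vm.sptes[i].root_invalid = vm.sptes[i].present;
	if (zap_invalid_before_return)
		toy_zap_gfn_range(&vm, 0, UINT64_MAX);

	vm.slots[0].present = false;              /* memslot is deleted */
	toy_unmap_hva_range(&vm, 0x1000, 0x1004); /* userspace unmaps the range */

	for (i = 0; i < TOY_MAX_SPTES; i++)
		stale += vm.sptes[i].present;
	return stale;
}
```

In this model, toy_stale_after_delete(false) returns 2 (both mappings outlive
the memslot and the unmap never sees them), while toy_stale_after_delete(true)
returns 0.  The real code obviously has locking, yielding and refcounting that
the sketch ignores.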