On Sat, Dec 11, 2021, Paolo Bonzini wrote:
> On 12/11/21 03:39, Sean Christopherson wrote:
> > That means that KVM (a) is somehow losing track of a root, (b) isn't zapping all
> > SPTEs in kvm_mmu_zap_all(), or (c) is installing a SPTE after the mm has been released.
> >
> > (a) is unlikely because kvm_tdp_mmu_get_vcpu_root_hpa() is the only way for a
> > vCPU to get a reference, and it holds mmu_lock for write, doesn't yield, and
> > either gets a root from the list or adds a root to the list.
> >
> > (b) is unlikely because I would expect the fallout to be much larger and not
> > unique to your setup.
>
> Hmm, I think it's kvm_mmu_zap_all() skipping invalidated roots.

That should be impossible.  kvm_mmu_zap_all_fast() invalidates those roots before
it completes, and all paths that lead to kvm_mmu_zap_all_fast() prevent
kvm_destroy_vm() from getting to mmu_notifier_unregister().

kvm_mmu_invalidate_mmio_sptes() and kvm_mmu_invalidate_zap_pages_in_memslot() are
reachable only via memslot update, which requires a reference to KVM and thus
prevents putting the last reference to KVM.

set_nx_huge_pages() runs with kvm_lock held, which prevents kvm_destroy_vm() from
proceeding to mmu_notifier_unregister().

If your patch does make the problem go away, we have a bug somewhere else.

One other experiment that's probably worth trying at this point is running with
my zap and flush overhaul[*], which is based on commit 81d7c6659da0 ("KVM: VMX:
Remove vCPU from PI wakeup list before updating PID.NV").  I highly doubt it will
fix the issue, but I'm out of other ideas until one of us can reproduce the bug.

[*] https://lore.kernel.org/all/20211120045046.3940942-1-seanjc@xxxxxxxxxx/
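
To make the refcount argument above concrete, here is a toy userspace model. It is not kernel code: every toy_* name is hypothetical, the reference count and the root bookkeeping are reduced to plain integers, and there is no locking or concurrency. It only sketches the claim that a path holding a reference to the VM cannot race with the teardown that follows the last put, assuming the fast zap leaves no invalidated roots behind when it returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not KVM's. */
struct toy_vm {
	int users_count;             /* stands in for the VM reference count */
	int invalid_roots;           /* roots invalidated but not yet zapped */
	bool notifier_unregistered;  /* set once teardown has started */
};

static void toy_get(struct toy_vm *vm)
{
	vm->users_count++;
}

/* Returns true if this put dropped the last reference and ran teardown. */
static bool toy_put(struct toy_vm *vm)
{
	assert(vm->users_count > 0);
	if (--vm->users_count > 0)
		return false;

	/* Stands in for teardown reaching the notifier unregistration. */
	vm->notifier_unregistered = true;
	return true;
}

/*
 * Stands in for a fast zap. Assumption of this sketch: roots that get
 * invalidated here are also dealt with before the function returns.
 */
static void toy_zap_all_fast(struct toy_vm *vm, int nr_roots)
{
	assert(vm->users_count > 0);  /* caller must hold a reference */
	vm->invalid_roots += nr_roots;
	vm->invalid_roots = 0;
}

/*
 * Stands in for a memslot update: it runs with a reference held, so
 * teardown cannot start while the fast zap is in progress.
 */
static bool toy_memslot_update(struct toy_vm *vm)
{
	bool clean;

	assert(vm->users_count > 0);  /* e.g. the caller's open VM handle */
	toy_get(vm);
	toy_zap_all_fast(vm, 2);
	clean = vm->invalid_roots == 0 && !vm->notifier_unregistered;
	toy_put(vm);
	return clean;
}
```

In this model, teardown can only observe leftover invalidated roots if something invalidates a root without holding a reference, which is the "bug somewhere else" case.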