Re: Potential bug in TDP MMU

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 13 Dec 2021 16:14:47 +0000

On Sat, Dec 11, 2021, Paolo Bonzini wrote:
> On 12/11/21 03:39, Sean Christopherson wrote:
> > That means that KVM (a) is somehow losing track of a root, (b) isn't zapping all
> > SPTEs in kvm_mmu_zap_all(), or (c) is installing a SPTE after the mm has been released.
> > 
> > (a) is unlikely because kvm_tdp_mmu_get_vcpu_root_hpa() is the only way for a
> > vCPU to get a reference, and it holds mmu_lock for write, doesn't yield, and
> > either gets a root from the list or adds a root to the list.
> > 
> > (b) is unlikely because I would expect the fallout to be much larger and not
> > unique to your setup.
> 
> Hmm, I think it's kvm_mmu_zap_all() skipping invalidated roots.

That should be impossible.  kvm_mmu_zap_all_fast() zaps those roots before
it completes, and all paths that lead to kvm_mmu_zap_all_fast() prevent
kvm_destroy_vm() from getting to mmu_notifier_unregister().

kvm_mmu_invalidate_mmio_sptes() and kvm_mmu_invalidate_zap_pages_in_memslot()
are reachable only via memslot update, which requires a reference to KVM and thus
prevents putting the last reference to KVM.
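
To make the lifetime argument above concrete, here is a toy userspace model.
It is a sketch only, not kernel code: every toy_* name is hypothetical, "users"
loosely stands in for kvm->users_count, and "notifier_live" stands in for the
window before mmu_notifier_unregister() runs.  The assumption being modeled is
simply that teardown happens only when the last reference is put.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm; only the two fields we reason about. */
struct toy_vm {
	int users;		/* models the VM reference count */
	bool notifier_live;	/* true until teardown "unregisters" the notifier */
};

void toy_vm_init(struct toy_vm *vm)
{
	vm->users = 1;		/* the creator holds the initial reference */
	vm->notifier_live = true;
}

void toy_vm_get(struct toy_vm *vm)
{
	assert(vm->users > 0);	/* can only take a reference on a live VM */
	vm->users++;
}

/* Returns true if this put dropped the last reference and ran teardown. */
bool toy_vm_put(struct toy_vm *vm)
{
	assert(vm->users > 0);
	if (--vm->users > 0)
		return false;
	vm->notifier_live = false;	/* models reaching the unregister step */
	return true;
}

/*
 * Models a memslot update: the caller holds its own reference for the whole
 * operation, so teardown cannot start while the fast zap would be running.
 * Returns whether the notifier was still live at that point.
 */
bool toy_memslot_update(struct toy_vm *vm)
{
	bool live;

	toy_vm_get(vm);
	live = vm->notifier_live;	/* the fast zap would run here */
	toy_vm_put(vm);			/* not the last reference, no teardown */
	return live;
}
```

In this model toy_memslot_update() can never observe notifier_live == false as
long as the caller's reference is valid, which is the property the real
memslot paths rely on.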

set_nx_huge_pages() runs with kvm_lock held, which prevents kvm_destroy_vm() from
proceeding to mmu_notifier_unregister().

If your patch does make the problem go away, we have a bug somewhere else.

One other experiment that's probably worth trying at this point is running with
my zap and flush overhaul[*], which is based on commit 81d7c6659da0 ("KVM: VMX:
Remove vCPU from PI wakeup list before updating PID.NV").  I highly doubt it will
fix the issue, but I'm out of other ideas until one of us can reproduce the bug.

[*] https://lore.kernel.org/all/20211120045046.3940942-1-seanjc@xxxxxxxxxx/