On Fri, Dec 10, 2021 at 5:49 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 12/11/21 02:34, David Matlack wrote:
> > The stacks help, thanks for including them. It seems like a race
> > during do_exit teardown. One thing I notice is that
> > do_exit->mmput->kvm_mmu_zap_all can interleave with
> > kvm_vcpu_release->kvm_tdp_mmu_put_root (full call chains omitted),
> > since the former path allows yielding. But I don't yet see how that
> > could lead to any issues, let alone cause us to encounter a PFN in
> > the EPT with a zero refcount.
>
> Can it? The call chains are
>
> zap_gfn_range+2229
> kvm_tdp_mmu_put_root+465
> kvm_mmu_free_roots+629
> kvm_mmu_unload+28
> kvm_arch_destroy_vm+510
> kvm_put_kvm+1017
> kvm_vcpu_release+78
> __fput+516
> task_work_run+206
> do_exit+2615
> do_group_exit+236
>
> and
>
> zap_gfn_range+2229
> __kvm_tdp_mmu_zap_gfn_range+162
> kvm_tdp_mmu_zap_all+34
> kvm_mmu_zap_all+518
> kvm_mmu_notifier_release+83
> __mmu_notifier_release+420
> exit_mmap+965
> mmput+167
> do_exit+2482
> do_group_exit+236
>
> but there can be no parallelism or interleaving here, because the call
> to kvm_vcpu_release() is scheduled in exit_files() (and performed in
> exit_task_work()). That comes after exit_mm(), where mmput() is called.

Ah, I was thinking each thread in the process would run do_exit()
concurrently (the first thread enters mmput() but the refcount does not
reach zero, so it proceeds to task_work_run(); the second enters mmput(),
the refcount reaches zero, and it invokes notifier->release()).

> Even if the two could interleave, they go through the same zap_gfn_range
> path. That path takes the lock for write and only yields on the 512
> top-level page structures. Anything below is handled by
> tdp_mmu_set_spte (with mutual recursion between handle_changed_spte
> and handle_removed_tdp_mmu_page), and there are no yields on that path.
>
> Paolo
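
For what it's worth, here is a minimal standalone model (plain C, not
kernel code) of the ordering described above: within a single exiting
task, the mmput() in exit_mm() runs before the task work queued by
exit_files(), so the notifier-driven zap always precedes the vcpu
release. The function names mirror kernel/exit.c, but the bodies are
simplified stand-ins, and the printf lines merely mark where
kvm_mmu_zap_all and kvm_tdp_mmu_put_root would run.

/*
 * Standalone sketch of the teardown ordering under discussion.
 * mm_users stands in for the mm refcount; the printfs stand in for
 * kvm_mmu_zap_all and kvm_tdp_mmu_put_root.  With a single exiting
 * task the "notifier release" line always comes out before the
 * "vcpu release" line, because the mmput() in exit_mm() runs before
 * the task work queued by exit_files().
 */
#include <stdio.h>
#include <stdatomic.h>

static atomic_int mm_users = 1;		/* last user: the exiting task */

static void mmput(void)
{
	/* Only the caller that drops the last reference runs the notifier. */
	if (atomic_fetch_sub(&mm_users, 1) == 1)
		printf("mmu notifier release -> kvm_mmu_zap_all\n");
}

static void exit_mm(void)
{
	mmput();
}

static void exit_files(void)
{
	/* The final fput() on the vcpu fd only *queues* the release here. */
}

static void exit_task_work(void)
{
	/* The queued work runs now, strictly after exit_mm(). */
	printf("kvm_vcpu_release -> kvm_tdp_mmu_put_root\n");
}

static void do_exit(void)
{
	exit_mm();
	exit_files();
	exit_task_work();
}

int main(void)
{
	do_exit();
	return 0;
}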