On Fri, Apr 2, 2021 at 12:53 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 02/04/21 01:37, Ben Gardon wrote:
> > +void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> > +			   bool shared)
> >  {
> >  	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
> >
> > -	lockdep_assert_held_write(&kvm->mmu_lock);
> > +	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> >
> >  	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
> >  		return;
> > @@ -81,7 +92,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> >  	list_del_rcu(&root->link);
> >  	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> >
> > -	zap_gfn_range(kvm, root, 0, max_gfn, false, false);
> > +	zap_gfn_range(kvm, root, 0, max_gfn, false, false, shared);
> >
> >  	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
>
> Instead of patch 13, would it make sense to delay the zap_gfn_range and
> call_rcu to a work item (either unconditionally, or only if
> shared==false)? Then the zap_gfn_range would be able to yield and take
> the mmu_lock for read, similar to kvm_tdp_mmu_zap_invalidated_roots.
>
> If done unconditionally, this would also allow removing the "shared"
> argument to kvm_tdp_mmu_put_root, tdp_mmu_next_root and
> for_each_tdp_mmu_root_yield_safe, so I would place that change before
> this patch.
>
> Paolo

I tried that and it created problems. I believe the issue was that on VM
teardown, memslots would be freed and the memory reallocated before the
root was torn down, resulting in a use-after-free from mark_pfn_dirty.
Perhaps this could be resolved by forcing memslot changes to wait until
that work item was processed before returning. I can look into it, but I
suspect there will be a lot of "gotchas" involved.