On Thu, Mar 03, 2022, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > +	root->tdp_mmu_async_data = kvm;
> > +	INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
> > +	queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
> > +}
> > +
> > +static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
> > +{
> > +	union kvm_mmu_page_role role = page->role;
> > +	role.invalid = true;
> > +
> > +	/* No need to use cmpxchg, only the invalid bit can change. */
> > +	role.word = xchg(&page->role.word, role.word);
> > +	return role.invalid;
> 
> This helper is unused.  It _could_ be used here, but I think it belongs in the
> next patch.  Critically, until zapping defunct roots creates the invariant that
> invalid roots are _always_ zapped via worker, kvm_tdp_mmu_invalidate_all_roots()
> must not assume that an invalid root is queued for zapping.  I.e. doing this
> before the "Zap defunct roots" patch would be wrong:
> 
> 	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> 		if (kvm_tdp_root_mark_invalid(root))
> 			continue;
> 
> 		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
> 			continue;
> 
> 		tdp_mmu_schedule_zap_root(kvm, root);
> 	}

Gah, lost my train of thought and forgot that this _can_ re-queue a root even
in this patch, it just can't re-queue a root that is _currently_ queued.

The re-queue scenario happens if a root is queued and zapped, but is kept
alive by a vCPU that hasn't yet put its reference.  If another memslot update
comes along before the (sleeping) vCPU drops its reference, this will re-queue
the root.

It's not a major problem in this patch as it's a small amount of wasted
effort, but it will be an issue when the "put" path starts using the queue, as
that will create a scenario where a memslot update (or NX toggle) can come
along while a defunct root is in the zap queue.

Checking for role.invalid is wrong (as above), so for this patch I think the
easiest thing is to use tdp_mmu_async_data as a sentinel that the root was
zapped in the past and doesn't need to be re-zapped.

/*
 * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
 * is about to be zapped, e.g. in response to a memslots update.  The actual
 * zapping is performed asynchronously, so a reference is taken on all roots.
 * Using a separate workqueue makes it easy to ensure that the destruction is
 * performed before the "fast zap" completes, without keeping a separate list
 * of invalidated roots; the list is effectively the list of work items in
 * the workqueue.
 *
 * Skip roots that were already queued for zapping; the "fast zap" path is the
 * only user of the zap queue and always flushes the queue under slots_lock,
 * i.e. the queued zap is guaranteed to have completed already.
 *
 * Because mmu_lock is held for write, it should be impossible to observe a
 * root with zero refcount, i.e. the list of roots cannot be stale.
 *
 * This has essentially the same effect for the TDP MMU as updating
 * mmu_valid_gen does for the shadow MMU.
 */
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
	struct kvm_mmu_page *root;

	lockdep_assert_held_write(&kvm->mmu_lock);
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
		if (root->tdp_mmu_async_data)
			continue;

		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
			continue;

		root->role.invalid = true;
		tdp_mmu_schedule_zap_root(kvm, root);
	}
}
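
FWIW, the flush side that the comment leans on stays trivial.  A minimal
sketch, assuming the "fast zap" path drains the workqueue under slots_lock
(the exact function name and call site in this series may differ):

	/*
	 * Drain the zap workqueue.  This runs under slots_lock, and the
	 * queue is only populated under slots_lock, so every root with a
	 * non-NULL tdp_mmu_async_data has been fully zapped by the time the
	 * flush returns.  That's what makes tdp_mmu_async_data safe to use
	 * as an "already zapped" sentinel above.
	 */
	static void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
	{
		flush_workqueue(kvm->arch.tdp_mmu_zap_wq);
	}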