On Thu, Mar 03, 2022, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > +	root->tdp_mmu_async_data = kvm;
> > +	INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
> > +	queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
> > +}
> > +
> > +static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
> > +{
> > +	union kvm_mmu_page_role role = page->role;
> > +	role.invalid = true;
> > +
> > +	/* No need to use cmpxchg, only the invalid bit can change. */
> > +	role.word = xchg(&page->role.word, role.word);
> > +	return role.invalid;
> 
> This helper is unused.  It _could_ be used here, but I think it belongs in the
> next patch.  Critically, until zapping defunct roots creates the invariant that
> invalid roots are _always_ zapped via worker, kvm_tdp_mmu_invalidate_all_roots()
> must not assume that an invalid root is queued for zapping.  I.e. doing this
> before the "Zap defunct roots" patch would be wrong:
> 
> 	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> 		if (kvm_tdp_root_mark_invalid(root))
> 			continue;
> 
> 		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
> 			continue;
> 
> 		tdp_mmu_schedule_zap_root(kvm, root);
> 	}

Gah, lost my train of thought and forgot that this _can_ re-queue a root even
in this patch, it just can't re-queue a root that is _currently_ queued.

The re-queue scenario happens if a root is queued and zapped, but is kept
alive by a vCPU that hasn't yet put its reference.  If another memslot update
comes along before the (sleeping) vCPU drops its reference, this will re-queue
the root.

It's not a major problem in this patch as it's a small amount of wasted
effort, but it will be an issue when the "put" path starts using the queue, as
that will create a scenario where a memslot update (or NX toggle) can come
along while a defunct root is in the zap queue.

Checking for role.invalid is wrong (as above), so for this patch I think the
easiest thing is to use tdp_mmu_async_data as a sentinel that the root was
zapped in the past and doesn't need to be re-zapped.

/*
 * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
 * is about to be zapped, e.g. in response to a memslots update.  The actual
 * zapping is performed asynchronously, so a reference is taken on all roots.
 * Using a separate workqueue makes it easy to ensure that the destruction is
 * performed before the "fast zap" completes, without keeping a separate list
 * of invalidated roots; the list is effectively the list of work items in
 * the workqueue.
 *
 * Skip roots that were already queued for zapping; the "fast zap" path is the
 * only user of the zap queue and always flushes the queue under slots_lock,
 * i.e. the queued zap is guaranteed to have completed already.
 *
 * Because mmu_lock is held for write, it should be impossible to observe a
 * root with zero refcount, i.e. the list of roots cannot be stale.
 *
 * This has essentially the same effect for the TDP MMU as updating
 * mmu_valid_gen does for the shadow MMU.
 */
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
	struct kvm_mmu_page *root;

	lockdep_assert_held_write(&kvm->mmu_lock);
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
		if (root->tdp_mmu_async_data)
			continue;

		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
			continue;

		root->role.invalid = true;
		tdp_mmu_schedule_zap_root(kvm, root);
	}
}
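
FWIW, the flush side that the comment leans on stays trivial.  A minimal
sketch, assuming the "fast zap" path drains the workqueue under slots_lock
(the exact function name and call site in this series may differ):

	/*
	 * Drain the zap workqueue.  This runs under slots_lock, and the
	 * queue is only populated under slots_lock, so every root with a
	 * non-NULL tdp_mmu_async_data has been fully zapped by the time the
	 * flush returns.  That's what makes tdp_mmu_async_data safe to use
	 * as an "already zapped" sentinel above.
	 */
	static void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
	{
		flush_workqueue(kvm->arch.tdp_mmu_zap_wq);
	}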