On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> Zap defunct roots, a.k.a. roots that have been invalidated after their
> last reference was initially dropped, asynchronously via the system work
> queue instead of forcing the work upon the unfortunate task that happened
> to drop the last reference.
>
> If a vCPU task drops the last reference, the vCPU is effectively blocked
> by the host for the entire duration of the zap. If the root being zapped
> happens be fully populated with 4kb leaf SPTEs, e.g. due to dirty logging
> being active, the zap can take several hundred seconds. Unsurprisingly,
> most guests are unhappy if a vCPU disappears for hundreds of seconds.
>
> E.g. running a synthetic selftest that triggers a vCPU root zap with
> ~64tb of guest memory and 4kb SPTEs blocks the vCPU for 900+ seconds.
> Offloading the zap to a worker drops the block time to <100ms.
>
> Co-developed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Reviewed-by: Ben Gardon <bgardon@xxxxxxxxxx>
> Message-Id: <20220226001546.360188-23-seanjc@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index e24a1bff9218..2456f880508d 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -170,13 +170,24 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  	 */
>  	if (!kvm_tdp_root_mark_invalid(root)) {
>  		refcount_set(&root->tdp_mmu_root_count, 1);
> -		tdp_mmu_zap_root(kvm, root, shared);
>
>  		/*
> -		 * Give back the reference that was added back above. We now
> +		 * If the struct kvm is alive, we might as well zap the root
> +		 * in a worker. The worker takes ownership of the reference we
> +		 * just added to root and is flushed before the struct kvm dies.

Not a fan of the "we might as well zap the root in a worker"; IMO we should
require going forward that invalidated, reachable TDP MMU roots are always
zapped in a worker.

> +		 */
> +		if (likely(refcount_read(&kvm->users_count))) {
> +			tdp_mmu_schedule_zap_root(kvm, root);

Regarding the need for kvm_tdp_mmu_invalidate_all_roots() to guard against
re-queueing a root for zapping, this is the point where it becomes functionally
problematic. When "fast zap" was the only user of tdp_mmu_schedule_zap_root(),
re-queueing was benign as the work struct was guaranteed to not be currently
queued. But this code runs outside of slots_lock, and so a root that was "put"
but hasn't finished zapping can be observed and re-queued by the "fast zap".

I think it makes sense to create a rule/invariant that an invalidated TDP MMU
root _must_ be zapped via the work queue (see the bottom of this mail for why
marking the root invalid makes that race-free). I.e. do this as fixup:

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 40bf861b622a..cff4f2102a63 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1019,8 +1019,9 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
  * of invalidated roots; the list is effectively the list of work items in
  * the workqueue.
  *
- * Skip roots that are already queued for zapping, flushing the work queue will
- * ensure invalidated roots are zapped regardless of when they were queued.
+ * Skip roots that are already invalid and thus queued for zapping, flushing
+ * the work queue will ensure invalid roots are zapped regardless of when they
+ * were queued.
  *
  * Because mmu_lock is held for write, it should be impossible to observe a
  * root with zero refcount, i.e. the list of roots cannot be stale.
@@ -1034,13 +1035,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
-		if (root->tdp_mmu_async_data)
+		if (kvm_tdp_root_mark_invalid(root))
 			continue;

 		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
 			continue;

-		root->role.invalid = true;
 		tdp_mmu_schedule_zap_root(kvm, root);
 	}
 }

> +			return;
> +		}
> +
> +		/*
> +		 * The struct kvm is being destroyed, zap synchronously and give
> +		 * back immediately the reference that was added above. We now
> +		 * know that the root is invalid, so go ahead and free it if
> +		 * no one has taken a reference in the meanwhile.
>  		 */
> +		tdp_mmu_zap_root(kvm, root, shared);
>  		if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>  			return;
>  	}
> --
> 2.31.1
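
As for why using kvm_tdp_root_mark_invalid() as the guard is safe without
slots_lock: the helper atomically flips the invalid bit and reports the old
value, so exactly one caller can win the false=>true transition. A minimal
sketch of what I have in mind (from memory, modulo the exact details in the
tree, so double-check against the actual code):

static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
{
	union kvm_mmu_page_role role = page->role;

	role.invalid = true;

	/*
	 * No need for cmpxchg, only the invalid bit can change and all
	 * writers set it to the same value: the xchg() both publishes
	 * invalid=true and returns whether the bit was already set.
	 */
	role.word = xchg(&page->role.word, role.word);
	return role.invalid;
}

With the fixup above, whichever of kvm_tdp_mmu_put_root() and
kvm_tdp_mmu_invalidate_all_roots() loses the xchg() race backs off, and only
the winner calls tdp_mmu_schedule_zap_root(). That guarantees the work struct
is queued at most once, i.e. the "fast zap" can never re-queue a root whose
async zap is still pending.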