On Fri, Apr 21, 2023 at 6:56 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Apr 21, 2023, David Matlack wrote:
> > On Fri, Apr 21, 2023 at 2:49 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > >  void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
> > >  {
> > >  	struct kvm_mmu_page *root;
> > >
> > > -	lockdep_assert_held_write(&kvm->mmu_lock);
> > > -	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> > > -		if (!root->role.invalid &&
> > > -		    !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) {
> > > +	/*
> > > +	 * Note!  mmu_lock isn't held when destroying the VM!  There can't be
> > > +	 * other references to @kvm, i.e. nothing else can invalidate roots,
> > > +	 * but walking the list of roots does need to be guarded against roots
> > > +	 * being deleted by the asynchronous zap worker.
> > > +	 */
> > > +	rcu_read_lock();
> > > +
> > > +	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
> >
> > I see that roots are removed from the list with list_del_rcu(), so I
> > agree this should be safe.
> >
> > KVM could, alternatively, acquire the mmu_lock in
> > kvm_mmu_uninit_tdp_mmu(), which would let us keep the lockdep
> > assertion and drop the rcu_read_lock() + comment. That might be worth
> > it in case someone accidentally adds a call to
> > kvm_tdp_mmu_invalidate_all_roots() without mmu_lock outside of VM
> > teardown. kvm_mmu_uninit_tdp_mmu() is not a particularly
> > performance-sensitive path and adding the mmu_lock wouldn't add much
> > overhead anyway (it would block for at most a few milliseconds waiting
> > for the async work to reschedule).
>
> Heh, I actually started to ping you off-list to specifically discuss this option,
> but then decided that not waiting those few milliseconds might be worthwhile for
> some use cases.  I also couldn't quite convince myself that it would only be a
> few milliseconds, e.g.
> if the worker is zapping a fully populated 5-level root, there
> are no other tasks scheduled on _its_ CPU, and CONFIG_PREEMPTION=n (which
> neuters rwlock_needbreak()).

Good point. At some point we're going to have to fix that.

> The other reason I opted for not taking mmu_lock is that, with the persistent
> roots approach, I don't think it's actually strictly necessary for
> kvm_mmu_zap_all_fast() to invalidate roots while holding mmu_lock for write.
> Holding slots_lock ensures that only a single task can be doing invalidations,
> versus the old model where putting the last reference to a root could happen
> just about anywhere.  And allocating a new root and zapping from mmu_notifiers
> requires holding mmu_lock for write, so I _think_ we could get away with
> holding mmu_lock for read.  Maybe.
>
> It's largely a moot point since kvm_mmu_zap_all_fast() needs to hold mmu_lock
> for write anyways to play nice with the shadow MMU, i.e. I don't expect us to
> ever want to pursue a change in this area.  But at the same time I was
> struggling to write a comment explaining why the VM destruction path "had" to
> take mmu_lock.

Yeah, probably because it really isn't necessary :). It'd be nice to
keep the lockdep assertion around though for the other (and future)
callers. The cleanest options I can think of are:

 1. Pass in a bool "vm_teardown" to kvm_tdp_mmu_invalidate_all_roots()
    and use that to gate the lockdep assertion.

 2. Take the mmu_lock for read in kvm_mmu_uninit_tdp_mmu() and pass
    down bool shared to kvm_tdp_mmu_invalidate_all_roots().

Both would satisfy your concern of not blocking teardown on the async
worker and my concern of keeping the lockdep check. I think I prefer
(1) since, as you point out, taking the mmu_lock at all is unnecessary.
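For concreteness, option (1) might look something like the following. This is an
untested sketch, not a real patch: the loop body is elided, and the exact shape
of the assertion gating is just illustrative (only the "vm_teardown" name comes
from the discussion above).

```c
/*
 * Sketch of option (1): only VM teardown may call this without holding
 * mmu_lock. At that point there are no other references to @kvm, so
 * nothing else can invalidate roots concurrently; every other caller
 * still trips the lockdep assertion if mmu_lock isn't held for write.
 */
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm, bool vm_teardown)
{
	struct kvm_mmu_page *root;

	if (!vm_teardown)
		lockdep_assert_held_write(&kvm->mmu_lock);

	/*
	 * RCU protects the walk against roots being list_del_rcu()'d by
	 * the asynchronous zap worker, as in the patch above.
	 */
	rcu_read_lock();
	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
		/* ... mark the root invalid, as in the patch above ... */
	}
	rcu_read_unlock();
}
```

kvm_mmu_uninit_tdp_mmu() would then pass vm_teardown=true and all other
callers false, so the assertion coverage is only relaxed on the one path
where it can't hold.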