On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote: > On Fri, Jul 14, 2023, Yan Zhao wrote: > > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for > > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G. > > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded > > by RCU lock. > > > > GFN zap can be super slow if mmu_lock is held for write when there are > > contentions. In worst cases, huge cpu cycles are spent on yielding GFN by > > GFN, i.e. the loop of "check and flush tlb -> drop rcu lock -> > > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered > > for every GFN. > > Contentions can either from concurrent zaps holding mmu_lock for write or > > from tdp_mmu_map() holding mmu_lock for read. > > The lock contention should go away with a pre-check[*], correct? That's a more Yes, I think so, though I don't have time to verify it yet. > complete solution too, in that it also avoids lock contention for the shadow MMU, > which presumably suffers the same problem (I don't see anything that would prevent > it from yielding). > > If we do want to zap with mmu_lock held for read, I think we should convert > kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm > missing something, the rules are the same regardless of _why_ KVM is zapping, e.g. > the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other > tasks will race to install SPTEs that are supposed to be zapped. Yes. I did't do that to the unmap path was only because I don't want to make a big code change. The write lock in kvm_unmap_gfn_range() path is taken in arch-agnostic code, which is not easy to change, right? > > If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please > post it as a standalone patch. At a glance it doesn't have any dependencies on the > MTRR changes, and I don't want this type of changed buried at the end of a series > that is for a fairly niche setup. This needs a lot of scrutiny to make sure zapping > under read really is safe Given the pre-check patch should work, do you think it's still worthwhile to do this convertion? > > [*] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@xxxxxxxxxx > > > After converting to hold mmu_lock for read, there will be less contentions > > detected and retaking mmu_lock for read is also faster. There's no need to > > flush TLB before dropping mmu_lock when there're contentions as SPTEs have > > been zapped atomically and TLBs are flushed/flush requested immediately > > within RCU lock. > > In order to reduce TLB flush count, non-leaf SPTEs not greater than 1G > > level are allowed to be zapped if their ranges are fully covered in the > > gfn zap range. > > > > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx> > > ---