On Thu, 9 Mar 2023 05:18:11 +0000 Mingwei Zhang <mizhang@xxxxxxxxxx> wrote: > > > > > > 1) Previously mmu_topup_memory_caches() works fine without a lock. > > > 2) IMHO I was suspecting if this lock seems affects the parallelization > > > of the TDP MMU fault handling. > > > > > > TDP MMU fault handling is intend to be optimized for parallelization fault > > > handling by taking a read lock and operating the page table via atomic > > > operations. Multiple fault handling can enter the TDP MMU fault path > > > because of read_lock(&vcpu->kvm->mmu_lock) below. > > > > > > W/ this lock, it seems the part of benefit of parallelization is gone > > > because the lock can contend earlier above. Will this cause performance > > > regression? > > > > This is a per vCPU lock, with this lock each vCPU will still be able > > to perform parallel fault handling without contending for lock. > > > > I am curious how effective it is by trying to accquiring this per vCPU > lock? If a vcpu thread should stay within the (host) kernel (vmx > root/non-root) for the vast majority of the time, isn't the shrinker > always fail to make any progress? IMHO the lock is to prevent the faulting path from being disturbed by the shrinker. I guess even a vCPU thread stays in the host kernel, the shrinker should still be able to harvest the pages from the cache as long as there is no faulting flood. I am curious about the effectiveness as well. It would be nice there can be some unit tests that people can try by themselves to see the results, like when the shrinker isn't triggered, the faulting is still as effective as before and when the shrinker is triggered, what would actually happen when the system memory is under different pressure. (like how much the faulting will be slowed down)