On Fri, Oct 04, 2024, Vipin Sharma wrote:
> Use the MMU shrinker to iterate through all the vCPUs of all the VMs and
> free pages allocated in the MMU memory caches. Protect cache allocation in
> the page fault and MMU load paths from the MMU shrinker by using a per-vCPU
> mutex. In the MMU shrinker, move the iterated VM to the end of the VMs list
> so that the pain of emptying caches is spread among other VMs too.
>
> The specific caches to empty are mmu_shadow_page_cache and
> mmu_shadowed_info_cache, as these caches store whole pages. Emptying them
> will have a bigger impact for the shrinker compared to other caches like
> mmu_pte_list_desc_cache and mmu_page_header_cache.
>
> Holding the per-vCPU mutex ensures that a vCPU doesn't get surprised by
> finding its caches emptied after filling them up for page table
> allocations during page fault handling and MMU load operations. The
> per-vCPU mutex also ensures that the only race is between the MMU
> shrinker and the vCPU owning the caches. This should result in very
> little contention.
>
> Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
> ---

...

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 213e46b55dda2..8e2935347615d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4524,29 +4524,33 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (r != RET_PF_INVALID)
>  		return r;
>
> +	mutex_lock(&vcpu->arch.mmu_memory_cache_lock);
>  	r = mmu_topup_memory_caches(vcpu, false);
>  	if (r)
> -		return r;
> +		goto out_mmu_memory_cache_unlock;
>
>  	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
>  	if (r != RET_PF_CONTINUE)
> -		return r;
> +		goto out_mmu_memory_cache_unlock;
>
>  	r = RET_PF_RETRY;
>  	write_lock(&vcpu->kvm->mmu_lock);
>
>  	if (is_page_fault_stale(vcpu, fault))
> -		goto out_unlock;
> +		goto out_mmu_unlock;
>
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)
> -		goto out_unlock;
> +		goto out_mmu_unlock;
>
>  	r = direct_map(vcpu, fault);
>
> -out_unlock:
> +out_mmu_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  	kvm_release_pfn_clean(fault->pfn);
> +out_mmu_memory_cache_unlock:
> +	mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);

I've been thinking about this patch on and off for the past few weeks,
and every time I come back to it I can't shake the feeling that we came
up with a clever solution for a problem that doesn't exist.

I can't recall a single complaint about KVM consuming an unreasonable
amount of memory for page tables. In fact, the only time I can think of
where the code in question caused problems was when I unintentionally
inverted the iterator and zapped the newest SPs instead of the oldest
SPs.

So, I'm leaning more and more toward simply removing the shrinker
integration.
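
For reference, IIUC the shrinker-side flow the changelog describes ends
up looking something like the sketch below. Only mmu_memory_cache_lock,
mmu_shadow_page_cache, and mmu_shadowed_info_cache come from the patch
itself; mmu_empty_vcpu_caches, the trylock, and the kvm_lock/vm_list
walk are my guess at the structure, not the actual patch code.

/*
 * Sketch only, not the patch: assumes the scan lives in mmu.c next to
 * the existing shrinker hooks.
 */
static unsigned long mmu_empty_vcpu_caches(struct kvm_vcpu *vcpu)
{
	unsigned long freed = 0;

	/* nobjs is how many pages are currently sitting in each cache. */
	freed += vcpu->arch.mmu_shadow_page_cache.nobjs;
	freed += vcpu->arch.mmu_shadowed_info_cache.nobjs;

	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);

	return freed;
}

static unsigned long mmu_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	struct kvm *kvm, *next_kvm;
	struct kvm_vcpu *vcpu;
	unsigned long i, freed = 0;

	mutex_lock(&kvm_lock);
	list_for_each_entry_safe(kvm, next_kvm, &vm_list, vm_list) {
		kvm_for_each_vcpu(i, vcpu, kvm) {
			/*
			 * Skip vCPUs that are busy filling or consuming
			 * their caches; the per-vCPU mutex is what keeps the
			 * shrinker from yanking pages out from under a page
			 * fault in flight.
			 */
			if (!mutex_trylock(&vcpu->arch.mmu_memory_cache_lock))
				continue;

			freed += mmu_empty_vcpu_caches(vcpu);
			mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
		}

		/*
		 * Rotate the VM that just took the hit to the tail of
		 * vm_list so the next scan spreads the pain to a different
		 * VM.
		 */
		list_move_tail(&kvm->vm_list, &vm_list);

		if (freed >= sc->nr_to_scan)
			break;
	}
	mutex_unlock(&kvm_lock);

	return freed;
}

The trylock in the sketch is only there so the shrinker never waits on
a vCPU that's mid-fault; a blocking lock would work too, at the cost of
stalling whichever side loses the race.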