On Tue, Oct 1, 2024 at 3:17 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
>
> On 2024-09-13 02:43 PM, Vipin Sharma wrote:
> > @@ -6997,13 +7007,50 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> >  static unsigned long mmu_shrink_scan(struct shrinker *shrink,
> >                                       struct shrink_control *sc)
> >  {
> > -        return SHRINK_STOP;
> > +        struct kvm *kvm, *next_kvm, *first_kvm = NULL;
> > +        unsigned long i, freed = 0;
> > +        struct kvm_vcpu *vcpu;
> > +
> > +        mutex_lock(&kvm_lock);
> > +        list_for_each_entry_safe(kvm, next_kvm, &vm_list, vm_list) {
> > +                if (!first_kvm)
> > +                        first_kvm = kvm;
> > +                else if (first_kvm == kvm)
> > +                        break;
> > +
> > +                list_move_tail(&kvm->vm_list, &vm_list);
> > +
> > +                kvm_for_each_vcpu(i, vcpu, kvm) {
> > +                        if (!mutex_trylock(&vcpu->arch.mmu_memory_cache_lock))
> > +                                continue;
> > +                        freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> > +                        freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> > +                        mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
> > +                        if (freed >= sc->nr_to_scan)
> > +                                goto out;
>
> Looking at the caller in mm/shrinker.c, sc->nr_to_scan will be <= 128
> (SHRINK_BATCH), which is only enough for 2 vCPUs. So I think the
> shrinker will only ever free 2 vCPU caches of each VM (probably the
> first 2 vCPUs) before reordering the list and moving onto the next VM on
> the next call.
>
> Does that match the behavior you observe?

Yes, for dropping the cache one time on a big VM, I get multiple calls
of mmu_shrink_scan() where sc->nr_to_scan is at most 128 in each call.

mmu_memory_cache_lock availability will play a role in selecting the
two vCPUs. On a VM where not many faults are happening, it will
probably be the first two vCPUs.
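
For reference, the 128 cap comes from the batching loop in
do_shrink_slab() in mm/shrinker.c. Roughly (this is a simplified
sketch from my reading of that loop, not the exact upstream code), the
core hands the ->scan_objects() callback at most batch_size objects per
call, where batch_size defaults to SHRINK_BATCH (128) when the shrinker
does not set ->batch:

        /* Simplified sketch of the batching in do_shrink_slab(). */
        long batch_size = shrinker->batch ? shrinker->batch : SHRINK_BATCH;

        while (total_scan >= batch_size || total_scan >= freeable) {
                unsigned long nr_to_scan = min(batch_size, total_scan);

                /* This is what arrives as sc->nr_to_scan in mmu_shrink_scan(). */
                shrinkctl->nr_to_scan = nr_to_scan;
                shrinkctl->nr_scanned = nr_to_scan;

                ret = shrinker->scan_objects(shrinker, shrinkctl);
                if (ret == SHRINK_STOP)
                        break;
                freed += ret;

                total_scan -= shrinkctl->nr_scanned;
                cond_resched();
        }

So one shrink pass over a large VM shows up as a series of these
<= 128-object calls, which matches the behavior I observe.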