Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

David Matlack <dmatlack@xxxxxxxxxx> · Thu, 29 Dec 2022 13:15:44 -0800

On Wed, Dec 28, 2022 at 02:07:49PM -0800, Vipin Sharma wrote:
> On Tue, Dec 27, 2022 at 10:37 AM Ben Gardon <bgardon@xxxxxxxxxx> wrote:
> > On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> > >
> > > Tested this change by running dirty_log_perf_test while dropping cache
> > > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > > logs from kvm_mmu_memory_cache_alloc(), which is expected.
> >
> > Oh, that's not a good thing. I don't think we want to be hitting those
> > warnings. For one, kernel warnings should not be expected behavior,
> > probably for many reasons, but at least because Syzbot will find it.
> > In this particular case, we don't want to hit that because in that
> > case we'll try to do a GFP_ATOMIC, which can fail, and if it fails,
> > we'll BUG:
> >
> > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> > {
> >         void *p;
> >
> >         if (WARN_ON(!mc->nobjs))
> >                 p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
> >         else
> >                 p = mc->objects[--mc->nobjs];
> >         BUG_ON(!p);
> >         return p;
> > }
> >
> > Perhaps the risk of actually panicking is small, but it probably
> > indicates that we need better error handling around failed allocations
> > from the cache.
> > Or, the slightly less elegant approach might be to just hold the cache
> > lock around the cache topup and use of pages from the cache, but
> > adding better error handling would probably be cleaner.
> 
> I was counting on the fact that shrinker will ideally run only in
> extreme cases, i.e. host is running on low memory. So, this WARN_ON
> will only be rarely used. I was not aware of Syzbot, it seems like it
> will be a concern if it does this kind of testing.

In an extreme low-memory situation, forcing vCPUS to do GFP_ATOMIC
allocations to handle page faults is risky. Plus it's a waste of time to
free that memory since it's just going to get immediately reallocated.

> 
> I thought about keeping a mutex, taking it during topup and releasing
> it after the whole operation is done but I stopped it as the duration
> of holding mutex will be long and might block the memory shrinker
> longer. I am not sure though, if this is a valid concern.

Use mutex_trylock() to skip any vCPUs that are currently handling page
faults.