Re: [Patch v4 03/18] KVM: x86/mmu: Track count of pages in KVM MMU page caches globally

Vipin Sharma <vipinsh@xxxxxxxxxx> · Thu, 9 Mar 2023 11:52:26 -0800

On Thu, Mar 9, 2023 at 4:52 AM Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:
>
> On Thu, 9 Mar 2023 05:18:11 +0000
> Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
>
> > > >
> > > > 1) Previously mmu_topup_memory_caches() works fine without a lock.
> > > > 2) IMHO I was suspecting if this lock seems affects the parallelization
> > > > of the TDP MMU fault handling.
> > > >
> > > > TDP MMU fault handling is intend to be optimized for parallelization fault
> > > > handling by taking a read lock and operating the page table via atomic
> > > > operations. Multiple fault handling can enter the TDP MMU fault path
> > > > because of read_lock(&vcpu->kvm->mmu_lock) below.
> > > >
> > > > W/ this lock, it seems the part of benefit of parallelization is gone
> > > > because the lock can contend earlier above. Will this cause performance
> > > > regression?
> > >
> > > This is a per vCPU lock, with this lock each vCPU will still be able
> > > to perform parallel fault handling without contending for lock.
> > >
> >
> > I am curious how effective it is by trying to accquiring this per vCPU
> > lock? If a vcpu thread should stay within the (host) kernel (vmx
> > root/non-root) for the vast majority of the time, isn't the shrinker
> > always fail to make any progress?
>
> IMHO the lock is to prevent the faulting path from being disturbed by the
> shrinker. I guess even a vCPU thread stays in the host kernel, the shrinker
> should still be able to harvest the pages from the cache as long as there is
> no faulting flood.

Yes, lock is to prevent the faulting path from being disturbed by the
shrinker. In this new approach, shrinker goes through each vCPU of
each VM alive on the host. All of these vCPUs collectively being in
the fault path while shrinker is invoked seems unlikely.

Let us say we free the cache during the fault path, now when a vCPU
asks a page from the cache, it will dynamically allocate a page via
GFP_ATOMIC which has higher chances of failing if a host is already
under memory pressure. Shrinker by default should be at lower priority
and based on discussions pointed in patch 1, it seems like it was of
not much practical use before either.

>
> I am curious about the effectiveness as well. It would be nice there can be
> some unit tests that people can try by themselves to see the results, like
> when the shrinker isn't triggered, the faulting is still as effective as
> before and when the shrinker is triggered, what would actually happen when
> the system memory is under different pressure. (like how much the faulting
> will be slowed down)
>

Not sure what can be a right test to measure this. My manual testing
was to just run dirty_log_perf_test with and without shrinker and I
didn't notice much difference. I did print some logs to see if
shrinker is getting invoked, caches are freed by shrinker and when VM
is freed to verify page accounting is right with patch 9 of the
series.