Re: [PATCH 2/2] KVM: x86: introduce cache configurations for previous CR3s

On Wed, Oct 30, 2024 at 4:38 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Oct 29, 2024, Yong He wrote:
> > From: Yong He <alexyonghe@xxxxxxxxxxx>
> >
> > Introduce a prev_roots_num param, so that we can cache more of the
> > previous CR3/root_hpa pairs, which helps reduce the overhead of
> > evicting and rebuilding shadow page tables.
> >
> > Signed-off-by: Yong He <alexyonghe@xxxxxxxxxxx>
> > ---
>
> ...
>
> > +uint __read_mostly prev_roots_num = KVM_MMU_NUM_PREV_ROOTS;
> > +EXPORT_SYMBOL_GPL(prev_roots_num);
> > +module_param_cb(prev_roots_num, &prev_roots_num_ops,
> > +             &prev_roots_num, 0644);
>
> Allowing the variable to be changed while KVM is running is unsafe.

Sorry, I have no intention of changing prev_roots_num at runtime; making it
0444 is simpler and can even improve performance.  I will update this in the
next version.
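
Roughly what I have in mind for v2, assuming the custom param ops can simply
be dropped once the value is read-only (just a sketch, not the final patch):

	uint __read_mostly prev_roots_num = KVM_MMU_NUM_PREV_ROOTS;
	EXPORT_SYMBOL_GPL(prev_roots_num);
	/* Read-only after module load, so no custom ops or locking needed. */
	module_param(prev_roots_num, uint, 0444);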

>
> I also think a module param is the wrong way to try to allow for bigger caches.
> The caches themselves are relatively cheap, at 16 bytes per entry.  And I doubt
> the cost of searching a larger cache in fast_pgd_switch() would have a measurable
> impact, since the most recently used roots will be at the front of the cache,
> i.e. only near-misses and misses will be affected.

Maybe the context-switch test can show something here: with 7 or 8 processes
in the guest (so every lookup misses the cache), there is nearly no
performance drop.
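
For context, the lookup being discussed is just a short linear scan of the
cached pgd/root pairs, so a bigger cache only adds a few compares on the
near-miss/miss paths.  A rough sketch of that shape (illustrative only, not
the actual fast_pgd_switch() code; the control flow is simplified):

	for (i = 0; i < prev_roots_num; i++) {
		/* Promote a usable cached root to the current root. */
		if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role)) {
			swap(mmu->root, mmu->prev_roots[i]);
			return true;
		}
	}
	return false;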

>
> The only potential downside to larger caches I can think of, is that keeping
> root_count elevated would make it more difficult to reclaim shadow pages from
> roots that are no longer relevant to the guest.  kvm_mmu_zap_oldest_mmu_pages()
> in particular would refuse to reclaim roots.  That shouldn't be problematic for
> legacy shadow paging, because KVM doesn't recursively zap shadow pages.  But for
> nested TDP, mmu_page_zap_pte() frees the entire tree, in the common case that
> child SPTEs aren't shared across multiple trees (common in legacy shadow paging,
> extremely uncommon in nested TDP).
>
> And for the nested TDP issue, if it's actually a problem, I would *love* to
> solve that problem by making KVM's forced reclaim more sophisticated.  E.g. one
> idea would be to kick all vCPUs if the maximum number of pages has been reached,
> have each vCPU purge old roots from prev_roots, and then reclaim unused roots.
> It would be a bit more complicated than that, as KVM would need a way to ensure
> forward progress, e.g. if the shadow pages limit has been reached with a single
> root.  But even then, kvm_mmu_zap_oldest_mmu_pages() could be made a _lot_ smarter.

I am not very familiar with nested TDP (TDP on TDP).
I think you mean forcibly freeing cached roots from
kvm_mmu_zap_oldest_mmu_pages() when no MMU pages can be zapped, e.g. kick
all vCPUs and have them purge their cached roots.
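
If I understand correctly, the flow would look roughly like this (just a
sketch; KVM_REQ_PURGE_PREV_ROOTS is a made-up request name here, and the
real version would need the forward-progress handling you mention):

	if (kvm->arch.n_used_mmu_pages >= kvm->arch.n_max_mmu_pages) {
		/* Ask every vCPU to drop its cached previous roots. */
		kvm_make_all_cpus_request(kvm, KVM_REQ_PURGE_PREV_ROOTS);
		/*
		 * Once each vCPU has processed the request on its next
		 * entry, the old roots lose their last reference and
		 * become reclaimable here.
		 */
		kvm_mmu_zap_oldest_mmu_pages(kvm, nr_to_zap);
	}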

>
> TL;DR: what if we simply bump the number of cached roots to ~16?

I set the number to 11 because the guest kernel only uses 6 PCIDs
(11 previous + the current root = 12).  When there are more than 6
processes in the guest, PCIDs get reused, so the cached roots no longer
hit easily.
The context-switch case shows no performance gain with 7 or 8 processes.
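
For reference, I believe the "6" comes from the guest kernel's per-CPU pool
of dynamic ASIDs (arch/x86/include/asm/tlbflush.h, if I am reading it right):

	/* The guest kernel cycles through at most this many PCIDs per CPU. */
	#define TLB_NR_DYN_ASIDS	6

So 11 previous roots plus the current one gives 12, i.e. twice that pool;
once the guest runs more than 6 processes the PCIDs start getting recycled
and the cache no longer hits.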




