On Wed, Oct 30, 2024 at 4:38 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Oct 29, 2024, Yong He wrote:
> > From: Yong He <alexyonghe@xxxxxxxxxxx>
> >
> > Introduce prev_roots_num param, so that we use more cache of
> > previous CR3/root_hpa pairs, which help us to reduce shadow
> > page table evict and rebuild overhead.
> >
> > Signed-off-by: Yong He <alexyonghe@xxxxxxxxxxx>
> > ---
> > ...
> > +uint __read_mostly prev_roots_num = KVM_MMU_NUM_PREV_ROOTS;
> > +EXPORT_SYMBOL_GPL(prev_roots_num);
> > +module_param_cb(prev_roots_num, &prev_roots_num_ops,
> > +		&prev_roots_num, 0644);
>
> Allowing the variable to be changed while KVM is running is unsafe.

Sorry, I have no intention of letting prev_roots_num be changed at
runtime. Making it 0444 is simpler and still gives the performance
improvement; I will update this in the next version.

>
> I also think a module param is the wrong way to try to allow for bigger caches.
> The caches themselves are relatively cheap, at 16 bytes per entry. And I doubt
> the cost of searching a larger cache in fast_pgd_switch() would have a measurable
> impact, since the most recently used roots will be at the front of the cache,
> i.e. only near-misses and misses will be affected.

Maybe the context switch test could show such an impact, but when the
guest runs 7 or 8 processes (i.e. all cache misses), there is nearly no
performance drop.

>
> The only potential downside to larger caches I can think of, is that keeping
> root_count elevated would make it more difficult to reclaim shadow pages from
> roots that are no longer relevant to the guest. kvm_mmu_zap_oldest_mmu_pages()
> in particular would refuse to reclaim roots. That shouldn't be problematic for
> legacy shadow paging, because KVM doesn't recursively zap shadow pages. But for
> nested TDP, mmu_page_zap_pte() frees the entire tree, in the common case that
> child SPTEs aren't shared across multiple trees (common in legacy shadow paging,
> extremely uncommon in nested TDP).
>
> And for the nested TDP issue, if it's actually a problem, I would *love* to
> solve that problem by making KVM's forced reclaim more sophisticated. E.g. one
> idea would be to kick all vCPUs if the maximum number of pages has been reached,
> have each vCPU purge old roots from prev_roots, and then reclaim unused roots.
> It would be a bit more complicated than that, as KVM would need a way to ensure
> forward progress, e.g. if the shadow pages limit has been reached with a single
> root. But even then, kvm_mmu_zap_oldest_mmu_pages() could be made a _lot_ smarter.

I am not very familiar with nested TDP. I think you mean force-freeing
the cached roots in kvm_mmu_zap_oldest_mmu_pages() when no MMU pages can
be zapped, e.g. by kicking all vCPUs and having them purge their cached
roots.

>
> TL;DR: what if we simply bump the number of cached roots to ~16?

I set the number to 11 because the guest kernel uses 6 PCIDs
(11 previous + the current root = 12). Once there are more than 6
processes in the guest, the PCIDs get reused and the cached roots rarely
hit. The context switch case shows no performance gain with 7 or 8
processes.
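
To illustrate the PCID point, below is a small user-space model I put
together (this is not KVM code; the (process, PCID) cache key, the LRU
behaviour and the guest's round-robin ASID recycling are simplifications
I assumed for the sketch). It models a cache of 11 previous roots plus
the current one while the guest kernel recycles its 6 dynamic PCIDs
(TLB_NR_DYN_ASIDS):

/*
 * Toy user-space model, not KVM code: estimate how often guest CR3
 * switches can hit a cache of previously used roots when the guest
 * kernel recycles TLB_NR_DYN_ASIDS (6) PCIDs over N processes.  The
 * cache key is the (process, ASID) pair, a simplification of what KVM
 * actually matches on.
 */
#include <stdio.h>

#define NR_ASIDS   6		/* TLB_NR_DYN_ASIDS in the guest kernel */
#define CACHE_SIZE 12		/* 11 previous roots + the current one  */
#define SWITCHES   100000

struct key { int proc, asid; };

static int run(int nr_procs)
{
	int asid_owner[NR_ASIDS];	/* which process owns each ASID */
	struct key cache[CACHE_SIZE];
	int i, next_victim = 0, hits = 0;

	for (i = 0; i < NR_ASIDS; i++)
		asid_owner[i] = -1;
	for (i = 0; i < CACHE_SIZE; i++)
		cache[i].proc = -1;

	for (int n = 0; n < SWITCHES; n++) {
		int proc = n % nr_procs, asid = -1;

		/* Guest: reuse the process's ASID, else recycle one round-robin. */
		for (i = 0; i < NR_ASIDS; i++)
			if (asid_owner[i] == proc)
				asid = i;
		if (asid < 0) {
			asid = next_victim;
			asid_owner[asid] = proc;
			next_victim = (next_victim + 1) % NR_ASIDS;
		}

		/* Host: linear search of the cached roots, most recent first. */
		for (i = 0; i < CACHE_SIZE; i++)
			if (cache[i].proc == proc && cache[i].asid == asid)
				break;
		if (i < CACHE_SIZE)
			hits++;
		else
			i = CACHE_SIZE - 1;	/* miss: evict the oldest entry */

		/* Move (or insert) the key to the front, LRU style. */
		for (; i > 0; i--)
			cache[i] = cache[i - 1];
		cache[0].proc = proc;
		cache[0].asid = asid;
	}
	return hits;
}

int main(void)
{
	for (int p = 2; p <= 8; p++)
		printf("%d guest processes: %5.1f%% cached-root hits\n",
		       p, 100.0 * run(p) / SWITCHES);
	return 0;
}

In this model the hit rate stays near 100% up to 6 guest processes and
drops to roughly 0% at 7 or 8, because every context switch then
reassigns a PCID; bumping the cache size to ~16 does not help in that
regime, which matches what I saw in the context switch test.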