On 2024-08-19 11:31:22, Sean Christopherson wrote:
> On Mon, Aug 19, 2024, David Matlack wrote:
> > On Mon, Aug 19, 2024 at 10:20 AM Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> > >
> > > On 2024-08-16 16:29:11, Sean Christopherson wrote:
> > > > Why not just use separate lists?
> > >
> > > Before this patch, NX huge page recovery calculates "to_zap" and then
> > > zaps the first "to_zap" pages from the common list. This series is
> > > trying to maintain that invariant.
>
> I wouldn't try to maintain any specific behavior in the existing code, AFAIK
> it's 100% arbitrary and wasn't written with any meaningful sophistication.
> E.g. FIFO is little more than blindly zapping pages and hoping for the best.
>
> > > If we use two separate lists then we have to decide how many pages
> > > should be zapped from the TDP MMU and shadow MMU lists. A few options
> > > I can think of:
> > >
> > > 1. Zap "to_zap" pages from both the TDP MMU and shadow MMU lists
> > >    separately. Effectively, this might double the work for the
> > >    recovery thread.
> > > 2. Try zapping "to_zap" pages from one list and, if there are not
> > >    enough pages to zap, zap from the other list. This can cause
> > >    starvation.
> > > 3. Do half of "to_zap" from one list and the other half from the
> > >    other list. This can lead to situations where only half the work
> > >    is being done by the recovery worker thread.
> > >
> > > Option (1) above seems more reasonable to me.
> >
> > I vote each should zap 1/nx_huge_pages_recovery_ratio of their
> > respective list, i.e. calculate to_zap separately for each list.
>
> Yeah, I don't have a better idea since this is effectively a quick and dirty
> solution to reduce guest jitter. We can at least add a counter so that the
> zap is proportional to the number of pages on each list, e.g. this, and then
> do the necessary math in the recovery paths.

Okay, I will work on v2, which will create two separate lists for NX huge
pages, add a specific counter for the TDP MMU list, and zap based on that.
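For reference, roughly the per-list math I have in mind; this is a sketch
only, not the v2 patch. The per-list counter names (tdp_mmu_nx_pages,
shadow_mmu_nx_pages) are placeholders, and it applies the existing
DIV_ROUND_UP()/READ_ONCE() helpers the same way the current single-list
calculation does:

	/*
	 * Compute a separate "to_zap" per list so that each list is zapped
	 * in proportion to its own size, mirroring the existing
	 * to_zap = DIV_ROUND_UP(nx_lpage_splits, ratio) calculation.
	 */
	static unsigned long nx_huge_pages_to_zap(unsigned long list_pages,
						  unsigned long ratio)
	{
		/* A ratio of 0 means NX huge page recovery is disabled. */
		return ratio ? DIV_ROUND_UP(list_pages, ratio) : 0;
	}

	...

	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
	/* Hypothetical per-list counters; actual field names TBD in v2. */
	tdp_mmu_to_zap    = nx_huge_pages_to_zap(tdp_mmu_nx_pages, ratio);
	shadow_mmu_to_zap = nx_huge_pages_to_zap(shadow_mmu_nx_pages, ratio);

This keeps each list's zap rate tied to that list's size, so neither MMU's
pages starve the other's recovery.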