Re: [PATCH v2] KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 17 Nov 2022 16:39:41 +0000

On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> On 11/7/22 22:21, Sean Christopherson wrote:
> > 
> > Hmm, and the memslot heuristic doesn't address the recovery worker holding mmu_lock
> > for write.  On a non-preemptible kernel, rwlock_needbreak() is always false, e.g.
> > the worker won't yield to vCPUs that are trying to handle non-fast page faults.
> > The worker should eventually reach steady state by unaccounting everything, but
> > that might take a while.
> 
> I'm not sure what you mean here?  The recovery worker will still decrease
> to_zap by 1 on every unaccounted NX hugepage, and go to sleep after it
> reaches 0.

Right, what I'm saying is that this approach is still sub-optimal because it does
all that work while holding mmu_lock for write.

> Also, David's test used a 10-second halving time for the recovery thread.
> With the 1 hour time the effect would Perhaps the 1 hour time used by
> default by KVM is overly conservative, but 1% over 10 seconds is certainly a
> lot larger an effect, than 1% over 1 hour.

It's not the CPU usage I'm thinking of, it's the unnecessary blockage of MMU
operations on other tasks/vCPUs.  Given that this is related to dirty logging,
odds are very good that there will be a variety of operations in flight, e.g.
KVM_GET_DIRTY_LOG.  If the recovery ratio is aggressive, and/or there are a lot
of pages to recover, the recovery thread could hold mmu_lock until a resched is
needed.
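
To illustrate the concern, here is a toy user-space model, not KVM code.  All
names (hold_model, longest_hold, contention_every) are hypothetical, and the
"events" are just fixed intervals I picked for the sketch.  It counts how many
pages a worker processes in one uninterrupted lock hold when the only thing
that makes it drop the lock is a pending resched, versus when the lock can
also report waiters (what rwlock_needbreak() would provide on a preemptible
kernel).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model parameters; none of this is a KVM or kernel API. */
struct hold_model {
	/* True if the lock can report waiters (models a working needbreak). */
	bool lock_reports_contention;
	/* A resched becomes pending every N processed pages (0 = never). */
	unsigned long resched_interval;
};

/*
 * Walk to_zap pages, decrementing the budget by one per page, and return the
 * longest run of pages handled without dropping the lock.  A waiter is
 * assumed to show up every contention_every pages (0 = never).
 */
static unsigned long longest_hold(const struct hold_model *m,
				  unsigned long to_zap,
				  unsigned long contention_every)
{
	unsigned long run = 0, longest = 0;

	for (unsigned long i = 1; to_zap > 0; i++, to_zap--) {
		bool resched = m->resched_interval &&
			       (i % m->resched_interval) == 0;
		bool contended = m->lock_reports_contention &&
				 contention_every &&
				 (i % contention_every) == 0;

		run++;

		/* Drop and reacquire the lock only if something asks for it. */
		if (resched || contended) {
			if (run > longest)
				longest = run;
			run = 0;
		}
	}

	return run > longest ? run : longest;
}
```

With to_zap = 1000, a resched every 400 pages and a waiter every 10 pages, the
model holds the lock for 400 pages at a stretch when contention is invisible,
and for 10 pages when it is visible.  The numbers are made up; the point is
only that the hold time is bounded by the resched interval rather than by how
long other tasks have been waiting.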