On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> On 11/17/22 17:39, Sean Christopherson wrote:
> > Right, what I'm saying is that this approach is still sub-optimal because it does
> > all that work while holding mmu_lock for write.
> >
> > > Also, David's test used a 10-second halving time for the recovery thread.
> > > With the 1 hour time the effect would be much smaller.  Perhaps the 1 hour
> > > time used by default by KVM is overly conservative, but 1% over 10 seconds
> > > is certainly a lot larger an effect than 1% over 1 hour.
> >
> > It's not the CPU usage I'm thinking of, it's the unnecessary blockage of MMU
> > operations on other tasks/vCPUs.  Given that this is related to dirty logging,
> > odds are very good that there will be a variety of operations in flight, e.g.
> > KVM_GET_DIRTY_LOG.  If the recovery ratio is aggressive, and/or there are a lot
> > of pages to recover, the recovery thread could hold mmu_lock until a resched is
> > needed.
>
> If you need that, you need to configure your kernel to be preemptible, at
> least voluntarily.  That's in general a good idea for KVM, given its
> rwlock-happiness.

IMO, it's not that simple.  We always "need" better live migration performance,
but we don't need/want preemption in general.

> And the patch is not making it worse, is it?  Yes, you have to look up the
> memslot, but the work to do that should be less than what you save by not
> zapping the page.

Yes, my objection is that we're adding a heuristic to guess at userspace's
intentions (it's probably a good guess, but still) and the resulting behavior
isn't optimal.  Giving userspace an explicit knob seems straightforward and
would address both of those issues, so why not go that route?

> Perhaps we could add to struct kvm a counter of the number of log-pages
> memslots.  While a correct value would only be readable with slots_lock
> taken, the NX recovery thread is content with an approximate value and
> therefore can retrieve it with READ_ONCE or atomic_read().
>
> Paolo
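
A rough, completely untested sketch of what that approximate counter might look
like (field and helper names below are purely illustrative, not KVM's actual
API; updates happen under slots_lock, reads are lockless):

/* Hypothetical field added to struct kvm: */
	atomic_t nr_memslots_dirty_logging;

/* Called under slots_lock when a memslot is created, changed, or deleted. */
static void update_dirty_logging_count(struct kvm *kvm,
				       const struct kvm_memory_slot *old,
				       const struct kvm_memory_slot *new)
{
	bool was_logging = old && (old->flags & KVM_MEM_LOG_DIRTY_PAGES);
	bool now_logging = new && (new->flags & KVM_MEM_LOG_DIRTY_PAGES);

	if (was_logging == now_logging)
		return;

	if (now_logging)
		atomic_inc(&kvm->nr_memslots_dirty_logging);
	else
		atomic_dec(&kvm->nr_memslots_dirty_logging);
}

/*
 * NX recovery thread: an approximate value is fine, so a lockless read
 * suffices; no need to take slots_lock.
 */
static bool kvm_dirty_logging_active(struct kvm *kvm)
{
	return atomic_read(&kvm->nr_memslots_dirty_logging) > 0;
}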