Re: [PATCH v2] KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 17 Nov 2022 19:07:30 +0000

On Thu, Nov 17, 2022, David Matlack wrote:
> On Thu, Nov 17, 2022 at 9:04 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> > > On 11/17/22 17:39, Sean Christopherson wrote:
> > > > Right, what I'm saying is that this approach is still sub-optimal because it does
> > > > all that work while holding mmu_lock for write.
> > > >
> > > > > Also, David's test used a 10-second halving time for the recovery thread.
> > > > > With the 1 hour time the effect would Perhaps the 1 hour time used by
> > > > > default by KVM is overly conservative, but 1% over 10 seconds is certainly a
> > > > > lot larger an effect than 1% over 1 hour.
> > > >
> > > > It's not the CPU usage I'm thinking of, it's the unnecessary blockage of MMU
> > > > operations on other tasks/vCPUs.  Given that this is related to dirty logging,
> > > > odds are very good that there will be a variety of operations in flight, e.g.
> > > > KVM_GET_DIRTY_LOG.  If the recovery ratio is aggressive, and/or there are a lot
> > > > of pages to recover, the recovery thread could hold mmu_lock until a resched is
> > > > needed.
> > >
> > > If you need that, you need to configure your kernel to be preemptible, at
> > > least voluntarily.  That's in general a good idea for KVM, given its
> > > rwlock-happiness.
> >
> > IMO, it's not that simple.  We always "need" better live migration performance,
> > but we don't need/want preemption in general.
> >
> > > And the patch is not making it worse, is it?  Yes, you have to look up the
> > > memslot, but the work to do that should be less than what you save by not
> > > zapping the page.
> >
> > Yes, my objection is that we're adding a heuristic to guess at userspace's
> > intentions (it's probably a good guess, but still) and the resulting behavior isn't
> > optimal.  Giving userspace an explicit knob seems straightforward and would address
> > both of those issues, why not go that route?
> 
> In this case KVM knows that zapping dirty-tracked pages is completely
> useless, regardless of what userspace is doing, so there's no
> guessing.
> 
> A userspace knob requires userspace to guess at KVM's implementation
> details. e.g. KVM could theoretically support faulting in read
> accesses and execute accesses as write-protected huge pages during
> dirty logging. Or KVM might support 2MiB+ dirty logging. In both
> cases a binary userspace knob might not be the best fit.

Hmm, maybe.  If userspace is migrating a VM, zapping shadow pages to try and
allow NX huge pages may be undesirable irrespective of KVM internals.  E.g. even
if KVM supports 2MiB dirty logging, zapping an entire 2MiB region of guest memory
to _maybe_ install a huge page while the guest is already likely experiencing
jitter is probably a net negative.

I do agree that they are somewhat complementary though, e.g. even if userspace is
aware of the per-VM knob, userspace might want to allow reaping during migration
for whatever reason.  Or conversely, userspace might want to temporarily disable
reaping for reasons completely unrelated to migration.

> I agree that, even with this patch, KVM is still suboptimal because it
> is holding the MMU lock to do all these checks. But this patch should
> at least be a step in the right direction for reducing customer
> hiccups during live migration.

True.

> Also as for the CPU usage, I did a terrible job of explaining the
> impact. It's a 1% increase over the current usage, but the current
> usage is extremely low even with my way overly aggressive settings.
> Specifically, the CPU usage of the NX recovery worker increased from
> 0.73 CPU-seconds to 0.74 CPU-seconds over a 2.5 minute runtime.

Heh, that does change things a bit.

Objection officially withdrawn; allowing userspace to turn off the reaper can be
done on top if it actually adds value.
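
For illustration only, here is a rough userspace model of the two policies
discussed in this thread: skipping recovery for pages backed by a dirty-tracked
memslot, and an optional per-VM switch that turns the reaper off entirely.
Every name below (model_slot, model_page, model_recover_nx_pages,
reaper_enabled) is hypothetical and is not KVM's actual code or API. Charging
a skipped page against the to_zap budget is likewise just a choice made for
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memslot; NOT the kernel's structure. */
struct model_slot {
	bool dirty_tracked;	/* models "dirty logging is enabled on this slot" */
};

/* Hypothetical stand-in for a shadow page that blocks an NX huge page. */
struct model_page {
	const struct model_slot *slot;	/* slot backing this page */
	bool zapped;			/* set once the model "zaps" the page */
};

/*
 * Examine up to to_zap candidate pages and zap the ones worth recovering.
 * Pages in dirty-tracked slots are skipped, since a huge page could not be
 * installed for them anyway.  reaper_enabled models the per-VM knob.
 * Returns the number of pages actually zapped.
 */
static size_t model_recover_nx_pages(struct model_page *pages, size_t nr_pages,
				     size_t to_zap, bool reaper_enabled)
{
	size_t zapped = 0;

	assert(pages != NULL || nr_pages == 0);

	if (!reaper_enabled)
		return 0;

	for (size_t i = 0; i < nr_pages && to_zap > 0; i++, to_zap--) {
		/* Zapping would be wasted work while the slot is dirty-tracked. */
		if (pages[i].slot->dirty_tracked)
			continue;

		pages[i].zapped = true;
		zapped++;
	}

	return zapped;
}
```

In this model the dirty-tracking check is something the "kernel" side decides
on its own, while reaper_enabled is the only piece userspace would have to
reason about, which is why the two can be layered independently.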