On Tue, Apr 2, 2024 at 6:50 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2024/4/3 5:36 AM, David Matlack wrote:
> > Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG to avoid
> > blocking other threads (e.g. vCPUs taking page faults) for too long.
> >
> > Specifically, change kvm_clear_dirty_log_protect() to acquire/release
> > mmu_lock only when calling kvm_arch_mmu_enable_log_dirty_pt_masked(),
> > rather than around the entire for loop. This ensures that KVM will only
> > hold mmu_lock for the time it takes the architecture-specific code to
> > process up to 64 pages, rather than holding mmu_lock for log->num_pages,
> > which is controllable by userspace. This also avoids holding mmu_lock
> > when processing parts of the dirty_bitmap that are zero (i.e. when there
> > is nothing to clear).
> >
> > Moving the acquire/release points for mmu_lock should be safe since
> > dirty_bitmap_buffer is already protected by slots_lock, and dirty_bitmap
> > is already accessed with atomic_long_fetch_andnot(). And at least on x86
> > holding mmu_lock doesn't even serialize access to the memslot dirty
> > bitmap, as vCPUs can call mark_page_dirty_in_slot() without holding
> > mmu_lock.
> >
> > This change eliminates dips in guest performance during live migration
> > in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
> > 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
> Frequently dropping and reacquiring mmu_lock will cause the userspace
> migration process issuing CLEAR ioctls to contend with the 160 vCPUs;
> migration speed may become slower.

In practice we have not found this to be the case. With this patch
applied we see a significant improvement in guest workload throughput
while userspace is issuing CLEAR ioctls, without any change to the
overall migration duration.

For example, here[1] is a graph showing the effect of this patch on a
160 vCPU VM being live migrated while running a MySQL workload. The
Y-axis is transactions; the blue line is without the patch, and the red
line is with the patch.

[1] https://drive.google.com/file/d/1zSKtc6wOQqfQrAQ4O9xlFmuyJ2-Iah0k/view

> In theory the priority of the userspace migration thread
> should be higher than that of the vCPU threads.

I don't think it's black and white. If prioritizing migration threads
causes vCPU starvation (which is the type of issue this patch is
fixing), then that's probably not the right trade-off. IMO the ultimate
goal of live migration is that it's completely transparent to the guest
workload, i.e. we should aim to minimize guest disruption as much as
possible. A change that makes migration go 2x as fast but reduces vCPU
performance by 10x during the migration is probably not worth it. And
the reverse is also true: a change that increases vCPU performance by
10% during migration but makes the migration take 10x longer is also
probably not worth it.

In the case of this patch, there doesn't seem to be a trade-off. We see
an improvement to vCPU performance without any regression in migration
duration or other metrics.

>
> Dropping and reacquiring mmu_lock every 64 pages may be a little too
> small; in general the granularity is one huge page size. However, it
> should be decided by the framework maintainer :)
>
> Regards
> Bibo Mao
>
> > would also reduce contention on mmu_lock, but doing so will increase the
> > rate of remote TLB flushing. And there's really no reason to punt this
> > problem to userspace since KVM can just drop and reacquire mmu_lock more
> > frequently.
> >
> > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > Cc: Oliver Upton <oliver.upton@xxxxxxxxx>
> > Cc: Tianrui Zhao <zhaotianrui@xxxxxxxxxxx>
> > Cc: Bibo Mao <maobibo@xxxxxxxxxxx>
> > Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > Cc: Anup Patel <anup@xxxxxxxxxxxxxx>
> > Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
> > Cc: Janosch Frank <frankja@xxxxxxxxxxxxx>
> > Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> > ---
> > v2:
> >  - Rebase onto kvm/queue [Marc]
> >
> > v1: https://lore.kernel.org/kvm/20231205181645.482037-1-dmatlack@xxxxxxxxxx/
> >
> >  virt/kvm/kvm_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fb49c2a60200..0a8b25a52c15 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2386,7 +2386,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> >       if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
> >               return -EFAULT;
> >
> > -     KVM_MMU_LOCK(kvm);
> >       for (offset = log->first_page, i = offset / BITS_PER_LONG,
> >            n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
> >            i++, offset += BITS_PER_LONG) {
> > @@ -2405,11 +2404,12 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> >                */
> >               if (mask) {
> >                       flush = true;
> > +                     KVM_MMU_LOCK(kvm);
> >                       kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
> >                                                               offset, mask);
> > +                     KVM_MMU_UNLOCK(kvm);
> >               }
> >       }
> > -     KVM_MMU_UNLOCK(kvm);
> >
> >       if (flush)
> >               kvm_flush_remote_tlbs_memslot(kvm, memslot);
> >
> > base-commit: 9bc60f733839ab6fcdde0d0b15cbb486123e6402
> >
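
For anyone skimming the thread, here is a rough sketch of what the hot
loop in kvm_clear_dirty_log_protect() looks like with this change
applied. It is simplified from the upstream code (declarations and
unrelated details are omitted), so treat it as an illustration of the
locking granularity rather than the literal source:

        for (offset = log->first_page, i = offset / BITS_PER_LONG,
             n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
             i++, offset += BITS_PER_LONG) {
                unsigned long mask = *dirty_bitmap_buffer++;
                atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];

                /* Skip all-zero words without ever taking mmu_lock. */
                if (!mask)
                        continue;

                /* Keep only the bits that were set and are now cleared. */
                mask &= atomic_long_fetch_andnot(mask, p);

                if (mask) {
                        flush = true;
                        /* mmu_lock is held for at most 64 pages at a time. */
                        KVM_MMU_LOCK(kvm);
                        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                                offset, mask);
                        KVM_MMU_UNLOCK(kvm);
                }
        }

        if (flush)
                kvm_flush_remote_tlbs_memslot(kvm, memslot);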