Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG

On Thu, Apr 04, 2024, David Matlack wrote:
> On Tue, Apr 2, 2024 at 6:50 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
> > > This change eliminates dips in guest performance during live migration
> > > in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
> > > 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
> > Frequently dropping and reacquiring mmu_lock will cause the userspace
> > migration process issuing CLEAR ioctls to contend with the 160 vCPUs,
> > and migration may become slower.
> In practice we have not found this to be the case. With this patch
> applied we see a significant improvement in guest workload throughput
> while userspace is issuing CLEAR ioctls without any change to the
> overall migration duration.


> In the case of this patch, there doesn't seem to be a trade-off. We
> see an improvement to vCPU performance without any regression in
> migration duration or other metrics.

For x86.  We need to keep in mind that not all architectures have x86's optimization
around dirty logging faults, or around faults in general. E.g. LoongArch's (which
I assume is Bibo Mao's primary interest) kvm_map_page_fast() still acquires mmu_lock.
And if the fault can't be handled in the fast path, KVM will actually acquire
mmu_lock twice (mmu_lock is dropped after the fast-path, then reacquired after
the mmu_seq and fault-in pfn stuff).
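For readers not familiar with that path, its shape is roughly the following (a simplified sketch of LoongArch's kvm_map_page(), with details elided; not the verbatim kernel code):

```c
/*
 * Simplified sketch of LoongArch's page fault path; names and structure
 * are approximate, and the body of the slow path is elided.
 */
static int kvm_map_page_sketch(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
{
	struct kvm *kvm = vcpu->kvm;
	unsigned long mmu_seq;
	kvm_pfn_t pfn;
	bool writable;
	int ret;

	/* First acquisition: the "fast" path takes mmu_lock internally. */
	ret = kvm_map_page_fast(vcpu, gpa, write);
	if (!ret)
		return ret;

	/* Snapshot mmu_seq and fault in the pfn with mmu_lock dropped. */
	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();
	pfn = gfn_to_pfn_prot(kvm, gpa >> PAGE_SHIFT, write, &writable);

	/* Second acquisition: retake mmu_lock to install the mapping. */
	spin_lock(&kvm->mmu_lock);
	/* ... recheck mmu_seq, set up the PTE, etc. ... */
	spin_unlock(&kvm->mmu_lock);
	return 0;
}
```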

So for x86, I think we can comfortably state that this change is a net positive
for all scenarios.  But for other architectures, that might not be the case.
I'm not saying this isn't a good change for other architectures, just that we
don't have relevant data to really know for sure.

Absent performance data for other architectures, which is likely going to be
difficult/slow to get, it might make sense to have this be opt-in to start.  We
could even do it with minimal #ifdeffery, e.g. something like the below would allow
x86 to do whatever locking it wants in kvm_arch_mmu_enable_log_dirty_pt_masked()
(I assume we want to give kvm_get_dirty_log_protect() similar treatment?).

I don't love the idea of adding more arch specific MMU behavior (going the wrong
direction), but it doesn't seem like an unreasonable approach in this case.

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 86d267db87bb..5eb1ce83f29d 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -66,9 +66,9 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
        if (!memslot || (offset + __fls(mask)) >= memslot->npages)
                return;
 
-       KVM_MMU_LOCK(kvm);
+       KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
-       KVM_MMU_UNLOCK(kvm);
+       KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 }
 
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d1fd9cb5d037..74ae844e4ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2279,7 +2279,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
                dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
                memset(dirty_bitmap_buffer, 0, n);
 
-               KVM_MMU_LOCK(kvm);
+               KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
                for (i = 0; i < n / sizeof(long); i++) {
                        unsigned long mask;
                        gfn_t offset;
@@ -2295,7 +2295,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
                        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                                offset, mask);
                }
-               KVM_MMU_UNLOCK(kvm);
+               KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
        }
 
        if (flush)
@@ -2390,7 +2390,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
        if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
                return -EFAULT;
 
-       KVM_MMU_LOCK(kvm);
+       KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
        for (offset = log->first_page, i = offset / BITS_PER_LONG,
                 n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
             i++, offset += BITS_PER_LONG) {
@@ -2413,7 +2413,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
                                                                offset, mask);
                }
        }
-       KVM_MMU_UNLOCK(kvm);
+       KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 
        if (flush)
                kvm_flush_remote_tlbs_memslot(kvm, memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index ecefc7ec51af..39d8b809c303 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,6 +20,11 @@
 #define KVM_MMU_LOCK(kvm)              spin_lock(&(kvm)->mmu_lock)
 #define KVM_MMU_UNLOCK(kvm)            spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
+#ifndef KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT
+#define KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT     KVM_MMU_LOCK
+#define KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT   KVM_MMU_UNLOCK
+#endif
+
 kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
                     bool *async, bool write_fault, bool *writable);

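With defaults like the above in generic code, an architecture could opt in by overriding the new macros. A hypothetical sketch of what the x86 side might look like (exact header placement, naming, and lock granularity are all up for discussion):

```c
/*
 * Hypothetical sketch of the x86 opt-in: make the generic lock/unlock
 * no-ops, and have kvm_arch_mmu_enable_log_dirty_pt_masked() take and
 * drop mmu_lock itself, e.g. once per mask, so vCPU faults can make
 * progress between chunks of a large CLEAR.
 */
#define KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm)        do { } while (0)
#define KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm)      do { } while (0)

void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
                                             struct kvm_memory_slot *slot,
                                             gfn_t gfn_offset,
                                             unsigned long mask)
{
        write_lock(&kvm->mmu_lock);
        /* ... clear dirty bits / write-protect the GFNs covered by @mask ... */
        write_unlock(&kvm->mmu_lock);
}
```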