Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG

maobibo <maobibo@xxxxxxxxxxx> · Sun, 7 Apr 2024 10:25:51 +0800

On 2024/4/5 上午1:10, Sean Christopherson wrote:
On Thu, Apr 04, 2024, David Matlack wrote:
On Tue, Apr 2, 2024 at 6:50 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
This change eliminates dips in guest performance during live migration
in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
Frequently drop/reacquire mmu_lock will cause userspace migration
process issuing CLEAR ioctls to contend with 160 vCPU, migration speed
maybe become slower.

In practice we have not found this to be the case. With this patch
applied we see a significant improvement in guest workload throughput
while userspace is issuing CLEAR ioctls without any change to the
overall migration duration.

...

In the case of this patch, there doesn't seem to be a trade-off. We
see an improvement to vCPU performance without any regression in
migration duration or other metrics.

For x86.  We need to keep in mind that not all architectures have x86's optimization
around dirty logging faults, or around faults in general. E.g. LoongArch's (which
I assume is Bibo Mao's primary interest) kvm_map_page_fast() still acquires mmu_lock.
And if the fault can't be handled in the fast path, KVM will actually acquire
mmu_lock twice (mmu_lock is dropped after the fast-path, then reacquired after
the mmu_seq and fault-in pfn stuff).
yes, there is no lock in function fast_page_fault on x86, however mmu 
spinlock is held on fast path on LoongArch. I do not notice this, 
originally I think that there is read_lock on x86 fast path for pte 
modification, write lock about page table modification.

So for x86, I think we can comfortably state that this change is a net positive
for all scenarios.  But for other architectures, that might not be the case.
I'm not saying this isn't a good change for other architectures, just that we
don't have relevant data to really know for sure.
From the code there is contention between migration assistant thread 
and vcpu thread in slow path where huge page need be split if memslot is 
dirty log enabled, however there is no contention between migration 
assistant thread and vcpu fault fast path. If it is right, negative 
effect is small on x86.


Absent performance data for other architectures, which is likely going to be
difficult/slow to get, it might make sense to have this be opt-in to start.  We
could even do it with minimal #ifdeffery, e.g. something like the below would allow
x86 to do whatever locking it wants in kvm_arch_mmu_enable_log_dirty_pt_masked()
(I assume we want to give kvm_get_dirty_log_protect() similar treatment?).

I don't love the idea of adding more arch specific MMU behavior (going the wrong
direction), but it doesn't seem like an unreasonable approach in this case.
No special modification for this, it is a little strange. LoongArch page 
fault fast path can improve later.

Regards
Bibo Mao

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 86d267db87bb..5eb1ce83f29d 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -66,9 +66,9 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
         if (!memslot || (offset + __fls(mask)) >= memslot->npages)
                 return;
  
-       KVM_MMU_LOCK(kvm);
+       KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
         kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
-       KVM_MMU_UNLOCK(kvm);
+       KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
  }
  
  int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d1fd9cb5d037..74ae844e4ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2279,7 +2279,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
                 dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
                 memset(dirty_bitmap_buffer, 0, n);
  
-               KVM_MMU_LOCK(kvm);
+               KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
                 for (i = 0; i < n / sizeof(long); i++) {
                         unsigned long mask;
                         gfn_t offset;
@@ -2295,7 +2295,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
                         kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                                 offset, mask);
                 }
-               KVM_MMU_UNLOCK(kvm);
+               KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
         }
  
         if (flush)
@@ -2390,7 +2390,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
         if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
                 return -EFAULT;
  
-       KVM_MMU_LOCK(kvm);
+       KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
         for (offset = log->first_page, i = offset / BITS_PER_LONG,
                  n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
              i++, offset += BITS_PER_LONG) {
@@ -2413,7 +2413,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
                                                                 offset, mask);
                 }
         }
-       KVM_MMU_UNLOCK(kvm);
+       KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
  
         if (flush)
                 kvm_flush_remote_tlbs_memslot(kvm, memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index ecefc7ec51af..39d8b809c303 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,6 +20,11 @@
  #define KVM_MMU_UNLOCK(kvm)            spin_unlock(&(kvm)->mmu_lock)
  #endif /* KVM_HAVE_MMU_RWLOCK */
  
+#ifndef KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT
+#define KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT     KVM_MMU_LOCK
+#define KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT   KVM_MMU_UNLOCK
+#endif
+
  kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
                      bool *async, bool write_fault, bool *writable);