Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG

maobibo <maobibo@xxxxxxxxxxx> · Sun, 7 Apr 2024 09:36:57 +0800

On 2024/4/5 上午12:29, David Matlack wrote:
On Tue, Apr 2, 2024 at 6:50 PM maobibo <maobibo@xxxxxxxxxxx> wrote:



On 2024/4/3 上午5:36, David Matlack wrote:
Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG to avoid
blocking other threads (e.g. vCPUs taking page faults) for too long.

Specifically, change kvm_clear_dirty_log_protect() to acquire/release
mmu_lock only when calling kvm_arch_mmu_enable_log_dirty_pt_masked(),
rather than around the entire for loop. This ensures that KVM will only
hold mmu_lock for the time it takes the architecture-specific code to
process up to 64 pages, rather than holding mmu_lock for log->num_pages,
which is controllable by userspace. This also avoids holding mmu_lock
when processing parts of the dirty_bitmap that are zero (i.e. when there
is nothing to clear).

Moving the acquire/release points for mmu_lock should be safe since
dirty_bitmap_buffer is already protected by slots_lock, and dirty_bitmap
is already accessed with atomic_long_fetch_andnot(). And at least on x86
holding mmu_lock doesn't even serialize access to the memslot dirty
bitmap, as vCPUs can call mark_page_dirty_in_slot() without holding
mmu_lock.

This change eliminates dips in guest performance during live migration
in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
Frequently drop/reacquire mmu_lock will cause userspace migration
process issuing CLEAR ioctls to contend with 160 vCPU, migration speed
maybe become slower.

In practice we have not found this to be the case. With this patch
applied we see a significant improvement in guest workload throughput
while userspace is issuing CLEAR ioctls without any change to the
overall migration duration.

For example, here[1] is a graph showing the effect of this patch on
a160 vCPU VM being Live Migrated while running a MySQL workload.
Y-Axis is transactions, blue line is without the patch, red line is
with the patch.

[1] https://drive.google.com/file/d/1zSKtc6wOQqfQrAQ4O9xlFmuyJ2-Iah0k/view
It is great that there is obvious improvement with the workload.

In theory priority of userspace migration thread
should be higher than vcpu thread.

I don't think it's black and white. If prioritizing migration threads
causes vCPU starvation (which is the type of issue this patch is
fixing), then that's probably not the right trade-off. IMO the
ultimate goal of live migration is that it's completely transparent to
the guest workload, i.e. we should aim to minimize guest disruption as
much as possible. A change that migration go 2x as fast but reduces
vCPU performance by 10x during the migration is probably not worth it.
And the reverse is also true, a change that increases vCPU performance
by 10% during migration but makes the migration take 10x longer is
also probably not worth it.
I am not migration expert, IIRC there is migration timeout value in 
cloud stack. If migration timeout reached, migration will be abort.

In the case of this patch, there doesn't seem to be a trade-off. We
see an improvement to vCPU performance without any regression in
migration duration or other metrics.
I will be ok about this patch if there is no any regression in
migration. If x86 migration takes more time with the patch on x86, I 
suggest migration experts give some comments.

Regards
Bibo Mao


Drop and reacquire mmu_lock with 64-pages may be a little too smaller,
in generic it is one huge page size. However it should be decided by
framework maintainer:)

Regards
Bibo Mao
would also reduce contention on mmu_lock, but doing so will increase the
rate of remote TLB flushing. And there's really no reason to punt this
problem to userspace since KVM can just drop and reacquire mmu_lock more
frequently.

Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Oliver Upton <oliver.upton@xxxxxxxxx>
Cc: Tianrui Zhao <zhaotianrui@xxxxxxxxxxx>
Cc: Bibo Mao <maobibo@xxxxxxxxxxx>
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Anup Patel <anup@xxxxxxxxxxxxxx>
Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
Cc: Janosch Frank <frankja@xxxxxxxxxxxxx>
Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
---
v2:
   - Rebase onto kvm/queue [Marc]

v1: https://lore.kernel.org/kvm/20231205181645.482037-1-dmatlack@xxxxxxxxxx/

   virt/kvm/kvm_main.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..0a8b25a52c15 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2386,7 +2386,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
       if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
               return -EFAULT;

-     KVM_MMU_LOCK(kvm);
       for (offset = log->first_page, i = offset / BITS_PER_LONG,
                n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
            i++, offset += BITS_PER_LONG) {
@@ -2405,11 +2404,12 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
               */
               if (mask) {
                       flush = true;
+                     KVM_MMU_LOCK(kvm);
                       kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                               offset, mask);
+                     KVM_MMU_UNLOCK(kvm);
               }
       }
-     KVM_MMU_UNLOCK(kvm);

       if (flush)
               kvm_flush_remote_tlbs_memslot(kvm, memslot);

base-commit: 9bc60f733839ab6fcdde0d0b15cbb486123e6402