On Tue, Apr 2, 2024 at 6:50 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2024/4/3 5:36 AM, David Matlack wrote:
> > Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG to avoid
> > blocking other threads (e.g. vCPUs taking page faults) for too long.
> >
> > Specifically, change kvm_clear_dirty_log_protect() to acquire/release
> > mmu_lock only when calling kvm_arch_mmu_enable_log_dirty_pt_masked(),
> > rather than around the entire for loop. This ensures that KVM will only
> > hold mmu_lock for the time it takes the architecture-specific code to
> > process up to 64 pages, rather than holding mmu_lock for log->num_pages,
> > which is controllable by userspace. This also avoids holding mmu_lock
> > when processing parts of the dirty_bitmap that are zero (i.e. when there
> > is nothing to clear).
> >
> > Moving the acquire/release points for mmu_lock should be safe since
> > dirty_bitmap_buffer is already protected by slots_lock, and dirty_bitmap
> > is already accessed with atomic_long_fetch_andnot(). And at least on x86
> > holding mmu_lock doesn't even serialize access to the memslot dirty
> > bitmap, as vCPUs can call mark_page_dirty_in_slot() without holding
> > mmu_lock.
> >
> > This change eliminates dips in guest performance during live migration
> > in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
> > 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
> Frequently dropping and reacquiring mmu_lock will cause the userspace
> migration process issuing CLEAR ioctls to contend with the 160 vCPUs;
> migration speed may become slower.

In practice we have not found this to be the case. With this patch
applied we see a significant improvement in guest workload throughput
while userspace is issuing CLEAR ioctls, without any change to the
overall migration duration.

For example, here[1] is a graph showing the effect of this patch on a
160 vCPU VM being live migrated while running a MySQL workload. The
Y-axis is transactions; the blue line is without the patch, and the red
line is with the patch.

[1] https://drive.google.com/file/d/1zSKtc6wOQqfQrAQ4O9xlFmuyJ2-Iah0k/view

> In theory the priority of the userspace migration thread
> should be higher than that of the vCPU threads.

I don't think it's black and white. If prioritizing migration threads
causes vCPU starvation (which is the type of issue this patch is
fixing), then that's probably not the right trade-off. IMO the ultimate
goal of live migration is that it's completely transparent to the guest
workload, i.e. we should aim to minimize guest disruption as much as
possible. A change that makes migration go 2x as fast but reduces vCPU
performance by 10x during the migration is probably not worth it. And
the reverse is also true: a change that increases vCPU performance by
10% during migration but makes the migration take 10x longer is also
probably not worth it.

In the case of this patch, there doesn't seem to be a trade-off. We see
an improvement to vCPU performance without any regression in migration
duration or other metrics.

>
> Dropping and reacquiring mmu_lock every 64 pages may be a little too
> small; in general the granularity is one huge page size. However, it
> should be decided by the framework maintainer :)
>
> Regards
> Bibo Mao
>
> > would also reduce contention on mmu_lock, but doing so will increase the
> > rate of remote TLB flushing. And there's really no reason to punt this
> > problem to userspace since KVM can just drop and reacquire mmu_lock more
> > frequently.
> >
> > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > Cc: Oliver Upton <oliver.upton@xxxxxxxxx>
> > Cc: Tianrui Zhao <zhaotianrui@xxxxxxxxxxx>
> > Cc: Bibo Mao <maobibo@xxxxxxxxxxx>
> > Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > Cc: Anup Patel <anup@xxxxxxxxxxxxxx>
> > Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
> > Cc: Janosch Frank <frankja@xxxxxxxxxxxxx>
> > Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> > ---
> > v2:
> >  - Rebase onto kvm/queue [Marc]
> >
> > v1: https://lore.kernel.org/kvm/20231205181645.482037-1-dmatlack@xxxxxxxxxx/
> >
> >  virt/kvm/kvm_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fb49c2a60200..0a8b25a52c15 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2386,7 +2386,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> >       if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
> >               return -EFAULT;
> >
> > -     KVM_MMU_LOCK(kvm);
> >       for (offset = log->first_page, i = offset / BITS_PER_LONG,
> >            n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
> >            i++, offset += BITS_PER_LONG) {
> > @@ -2405,11 +2404,12 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> >                */
> >               if (mask) {
> >                       flush = true;
> > +                     KVM_MMU_LOCK(kvm);
> >                       kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
> >                                                               offset, mask);
> > +                     KVM_MMU_UNLOCK(kvm);
> >               }
> >       }
> > -     KVM_MMU_UNLOCK(kvm);
> >
> >       if (flush)
> >               kvm_flush_remote_tlbs_memslot(kvm, memslot);
> >
> > base-commit: 9bc60f733839ab6fcdde0d0b15cbb486123e6402
> >
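
For anyone skimming the thread, here is a rough sketch of what the hot
loop in kvm_clear_dirty_log_protect() looks like with this change
applied. It is simplified from the upstream code (declarations and
unrelated details are omitted), so treat it as an illustration of the
locking granularity rather than the literal source:

        for (offset = log->first_page, i = offset / BITS_PER_LONG,
             n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
             i++, offset += BITS_PER_LONG) {
                unsigned long mask = *dirty_bitmap_buffer++;
                atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];

                /* Skip all-zero words without ever taking mmu_lock. */
                if (!mask)
                        continue;

                /* Keep only the bits that were set and are now cleared. */
                mask &= atomic_long_fetch_andnot(mask, p);

                if (mask) {
                        flush = true;
                        /* mmu_lock is held for at most 64 pages at a time. */
                        KVM_MMU_LOCK(kvm);
                        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                                offset, mask);
                        KVM_MMU_UNLOCK(kvm);
                }
        }

        if (flush)
                kvm_flush_remote_tlbs_memslot(kvm, memslot);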