While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large amount of CPU
time.

Thanks,
	Takuya

=============================
perf top -t ${QEMU_TID}
=============================
 51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64  [.] ram_save_remaining
  7.25%  qemu-system-x86_64  [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]               [k] __rmap_write_protect
  2.85%  [kvm]               [k] mmu_spte_update
  2.20%  [kernel]            [k] copy_user_generic_string
  2.16%  libc-2.13.so        [.] 0x874e9
  1.71%  qemu-system-x86_64  [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64  [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]            [k] __lock_acquire.isra.31
  0.66%  [kvm]               [k] rmap_get_next
  0.58%  [kvm]               [k] rmap_get_first
  0.54%  [kvm]               [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]               [k] spte_has_volatile_bits
  0.42%  [kernel]            [k] lock_release
  0.37%  [kernel]            [k] tcp_sendmsg
  0.33%  [kernel]            [k] alloc_pages_current
  0.29%  [kernel]            [k] native_read_tsc
  0.29%  qemu-system-x86_64  [.] ram_save_block
  0.25%  [kernel]            [k] lock_is_held
  0.25%  [kernel]            [k] __ticket_spin_trylock
  0.21%  [kernel]            [k] lock_acquire

On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa <takuya.yoshikawa@xxxxxxxxx> wrote:

> 1. Problem
> During live migration, if the guest tries to take mmu_lock at the same
> time as GET_DIRTY_LOG, which QEMU calls periodically (a userspace
> sketch of this ioctl appears after the quoted message), it may be
> forced to wait a long time; this is not restricted to the page faults
> caused by GET_DIRTY_LOG's write protection.
>
> 2. Measurement
> - Server:
>     Xeon: 8 cores (2 CPUs), 24GB memory
>
> - One VM was being migrated locally to the opposite NUMA node:
>     Source (active) VM: bound to node 0
>     Target (incoming) VM: bound to node 1
>
>   This binding was for reducing extra noise.
>
> - The guest inside it:
>     3 VCPUs, 11GB memory
>
> - Workload:
>     On VCPUs 2 and 3, there were 3 threads, and each of them was
>     endlessly writing to 3GB of anonymous memory (9GB in total) at its
>     maximum speed.
>
>     I had confirmed that GET_DIRTY_LOG was forced to write protect
>     more than 2 million pages, so the 9GB of memory was almost always
>     kept dirty and waiting to be sent.
>
>     In parallel, on VCPU 1, I measured memory write latency: how long
>     it takes to write one byte to each page of 1GB of anonymous memory
>     (sketched after the quoted message).
>
> - Result:
>     With the current KVM, I saw a worst-case latency of 1.5ms; this
>     corresponds well with the expected mmu_lock hold time.
>
>     You may think this is too small compared to the numbers I reported
>     before using dirty-log-perf, but those runs were done on a 32-bit
>     host, a Core i3 box much slower than server machines.
>
>   Although having 10GB of dirty pages is a bit extreme for a guest
>   with less than 16GB of memory, much larger guests, e.g. 128GB
>   guests, may see latency longer than 1.5ms.
>
> 3. Solution
> The time spent in GET_DIRTY_LOG is very limited compared to other work
> in QEMU, so we should focus on alleviating the worst-case latency
> first.
>
> The solution is very simple and was originally suggested by Marcelo:
> "Conditionally reschedule when there is contention." (This pattern is
> also sketched after the quoted message.)
>
> With this rescheduling, introduced by the following patch, the
> worst-case latency for the same test dropped from 1.5ms to 800us.
>
> 4. TODO
> The patch treats only kvm_vm_ioctl_get_dirty_log(), so the write
> protection done by kvm_mmu_slot_remove_write_access(), which is called
> when we enable dirty page logging, can cause the same problem.
>
> My plan is to replace it with rmap-based protection after this.
>
> Thanks,
> 	Takuya
>
> ---
> Takuya Yoshikawa (1):
>       KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> --
> 1.7.5.4

-- 
Takuya Yoshikawa <takuya.yoshikawa@xxxxxxxxx>
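
To make the quoted discussion concrete, here is a minimal userspace
sketch of the GET_DIRTY_LOG call that QEMU issues periodically during
live migration. The slot id, the 11GB region size, and the bitmap
sizing are assumptions for illustration, not QEMU's actual code:

    /*
     * Minimal sketch of KVM_GET_DIRTY_LOG usage; the slot id and size
     * are placeholders.  Each call hands the dirty bitmap for one
     * memslot back to userspace and, inside the kernel, write-protects
     * the dirty pages under mmu_lock, which is the work whose
     * worst-case hold time is discussed above.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define SLOT_ID          0              /* assumed memslot id */
    #define SLOT_SIZE        (11ULL << 30)  /* 11GB guest memory  */
    #define GUEST_PAGE_SIZE  4096ULL

    int get_dirty_log(int vm_fd)
    {
            unsigned long long npages = SLOT_SIZE / GUEST_PAGE_SIZE;
            /* KVM copies the bitmap in long-sized chunks: round up. */
            unsigned long long bitmap_bytes = ((npages + 63) / 64) * 8;
            struct kvm_dirty_log log;

            memset(&log, 0, sizeof(log));
            log.slot = SLOT_ID;
            log.dirty_bitmap = calloc(1, bitmap_bytes);
            if (!log.dirty_bitmap)
                    return -1;

            if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                    perror("KVM_GET_DIRTY_LOG");

            /* ... scan the bitmap and queue dirty pages for sending ... */
            free(log.dirty_bitmap);
            return 0;
    }

Here vm_fd is the VM file descriptor obtained with KVM_CREATE_VM;
QEMU's real loop sits behind kvm_physical_sync_dirty_bitmap(), visible
in the perf output above.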
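
The write-latency probe from section 2 could look roughly like the
following; the single pass, the pre-faulting, and the clock choice are
assumptions, since the original test code was not posted:

    /*
     * Touch one byte of every page in 1GB of anonymous memory and
     * report the worst single-write latency.  During dirty logging,
     * each write can fault on a write-protected spte and then stall
     * on mmu_lock, which is what the quoted 1.5ms figure measures.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    #define REGION  (1ULL << 30)    /* 1GB */
    #define PAGE    4096ULL

    int main(void)
    {
            char *buf = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            struct timespec t0, t1;
            int64_t ns, worst = 0;

            if (buf == MAP_FAILED)
                    return 1;
            memset(buf, 0, REGION);  /* pre-fault the anonymous pages */

            for (uint64_t off = 0; off < REGION; off += PAGE) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    buf[off] = 1;
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                         + (t1.tv_nsec - t0.tv_nsec);
                    if (ns > worst)
                            worst = ns;
            }
            printf("worst write latency: %lld ns\n", (long long)worst);
            return 0;
    }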
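
And the rescheduling idea from section 3, in kernel style.
cond_resched_lock(), need_resched() and spin_needbreak() are real
kernel primitives, but the loop shape and the write_protect_page()
helper are illustrative assumptions, not the actual patch, whose
diffstat is quoted above:

    /*
     * Sketch only: bound the mmu_lock hold time while write protecting
     * pages for dirty logging.  write_protect_page() is a hypothetical
     * stand-in for KVM's real rmap-based write-protection helpers.
     */
    spin_lock(&kvm->mmu_lock);
    for (i = 0; i < npages; i++) {
            /*
             * cond_resched_lock() drops the lock, yields, and retakes
             * it only when need_resched() is set or spin_needbreak()
             * sees a waiter, so the fast path stays cheap while the
             * worst-case critical section shrinks.
             */
            cond_resched_lock(&kvm->mmu_lock);
            write_protect_page(kvm, memslot, i);
    }
    spin_unlock(&kvm->mmu_lock);

This matches the measured effect quoted above: the worst-case write
latency fell from 1.5ms to 800us because write protecting millions of
pages is no longer one uninterrupted critical section.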