While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large amount of CPU
time.

Thanks,
	Takuya

=============================
perf top -t ${QEMU_TID}
=============================
 51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64  [.] ram_save_remaining
  7.25%  qemu-system-x86_64  [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]               [k] __rmap_write_protect
  2.85%  [kvm]               [k] mmu_spte_update
  2.20%  [kernel]            [k] copy_user_generic_string
  2.16%  libc-2.13.so        [.] 0x874e9
  1.71%  qemu-system-x86_64  [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64  [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]            [k] __lock_acquire.isra.31
  0.66%  [kvm]               [k] rmap_get_next
  0.58%  [kvm]               [k] rmap_get_first
  0.54%  [kvm]               [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]               [k] spte_has_volatile_bits
  0.42%  [kernel]            [k] lock_release
  0.37%  [kernel]            [k] tcp_sendmsg
  0.33%  [kernel]            [k] alloc_pages_current
  0.29%  [kernel]            [k] native_read_tsc
  0.29%  qemu-system-x86_64  [.] ram_save_block
  0.25%  [kernel]            [k] lock_is_held
  0.25%  [kernel]            [k] __ticket_spin_trylock
  0.21%  [kernel]            [k] lock_acquire

On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa <takuya.yoshikawa@xxxxxxxxx> wrote:

> 1. Problem
> During live migration, if the guest tries to take mmu_lock at the same
> time as GET_DIRTY_LOG, which QEMU calls periodically (a userspace
> sketch of this ioctl appears after the quoted message), it may be
> forced to wait a long time; this is not restricted to the page faults
> caused by GET_DIRTY_LOG's write protection.
>
> 2. Measurement
> - Server:
>     Xeon: 8 cores (2 CPUs), 24GB memory
>
> - One VM was being migrated locally to the opposite NUMA node:
>     Source (active) VM: bound to node 0
>     Target (incoming) VM: bound to node 1
>
>   This binding was for reducing extra noise.
>
> - The guest inside it:
>     3 VCPUs, 11GB memory
>
> - Workload:
>     On VCPUs 2 and 3, there were 3 threads, and each of them was
>     endlessly writing to 3GB of anonymous memory (9GB in total) at its
>     maximum speed.
>
>     I had confirmed that GET_DIRTY_LOG was forced to write protect
>     more than 2 million pages, so the 9GB of memory was almost always
>     kept dirty and waiting to be sent.
>
>     In parallel, on VCPU 1, I measured memory write latency: how long
>     it takes to write one byte to each page of 1GB of anonymous memory
>     (sketched after the quoted message).
>
> - Result:
>     With the current KVM, I saw a worst-case latency of 1.5ms; this
>     corresponds well with the expected mmu_lock hold time.
>
>     You may think this is too small compared to the numbers I reported
>     before using dirty-log-perf, but those runs were done on a 32-bit
>     host, a Core i3 box much slower than server machines.
>
>   Although having 10GB of dirty pages is a bit extreme for a guest
>   with less than 16GB of memory, much larger guests, e.g. 128GB
>   guests, may see latency longer than 1.5ms.
>
> 3. Solution
> The time spent in GET_DIRTY_LOG is very limited compared to other work
> in QEMU, so we should focus on alleviating the worst-case latency
> first.
>
> The solution is very simple and was originally suggested by Marcelo:
> "Conditionally reschedule when there is contention." (This pattern is
> also sketched after the quoted message.)
>
> With this rescheduling, introduced by the following patch, the
> worst-case latency for the same test dropped from 1.5ms to 800us.
>
> 4. TODO
> The patch treats only kvm_vm_ioctl_get_dirty_log(), so the write
> protection done by kvm_mmu_slot_remove_write_access(), which is called
> when we enable dirty page logging, can cause the same problem.
>
> My plan is to replace it with rmap-based protection after this.
>
> Thanks,
> 	Takuya
>
> ---
> Takuya Yoshikawa (1):
>       KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> --
> 1.7.5.4

-- 
Takuya Yoshikawa <takuya.yoshikawa@xxxxxxxxx>
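
To make the quoted discussion concrete, here is a minimal userspace
sketch of the GET_DIRTY_LOG call that QEMU issues periodically during
live migration. The slot id, the 11GB region size, and the bitmap
sizing are assumptions for illustration, not QEMU's actual code:

    /*
     * Minimal sketch of KVM_GET_DIRTY_LOG usage; the slot id and size
     * are placeholders.  Each call hands the dirty bitmap for one
     * memslot back to userspace and, inside the kernel, write-protects
     * the dirty pages under mmu_lock, which is the work whose
     * worst-case hold time is discussed above.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define SLOT_ID          0              /* assumed memslot id */
    #define SLOT_SIZE        (11ULL << 30)  /* 11GB guest memory  */
    #define GUEST_PAGE_SIZE  4096ULL

    int get_dirty_log(int vm_fd)
    {
            unsigned long long npages = SLOT_SIZE / GUEST_PAGE_SIZE;
            /* KVM copies the bitmap in long-sized chunks: round up. */
            unsigned long long bitmap_bytes = ((npages + 63) / 64) * 8;
            struct kvm_dirty_log log;

            memset(&log, 0, sizeof(log));
            log.slot = SLOT_ID;
            log.dirty_bitmap = calloc(1, bitmap_bytes);
            if (!log.dirty_bitmap)
                    return -1;

            if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                    perror("KVM_GET_DIRTY_LOG");

            /* ... scan the bitmap and queue dirty pages for sending ... */
            free(log.dirty_bitmap);
            return 0;
    }

Here vm_fd is the VM file descriptor obtained with KVM_CREATE_VM;
QEMU's real loop sits behind kvm_physical_sync_dirty_bitmap(), visible
in the perf output above.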
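
The write-latency probe from section 2 could look roughly like the
following; the single pass, the pre-faulting, and the clock choice are
assumptions, since the original test code was not posted:

    /*
     * Touch one byte of every page in 1GB of anonymous memory and
     * report the worst single-write latency.  During dirty logging,
     * each write can fault on a write-protected spte and then stall
     * on mmu_lock, which is what the quoted 1.5ms figure measures.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    #define REGION  (1ULL << 30)    /* 1GB */
    #define PAGE    4096ULL

    int main(void)
    {
            char *buf = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            struct timespec t0, t1;
            int64_t ns, worst = 0;

            if (buf == MAP_FAILED)
                    return 1;
            memset(buf, 0, REGION);  /* pre-fault the anonymous pages */

            for (uint64_t off = 0; off < REGION; off += PAGE) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    buf[off] = 1;
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                         + (t1.tv_nsec - t0.tv_nsec);
                    if (ns > worst)
                            worst = ns;
            }
            printf("worst write latency: %lld ns\n", (long long)worst);
            return 0;
    }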
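
And the rescheduling idea from section 3, in kernel style.
cond_resched_lock(), need_resched() and spin_needbreak() are real
kernel primitives, but the loop shape and the write_protect_page()
helper are illustrative assumptions, not the actual patch, whose
diffstat is quoted above:

    /*
     * Sketch only: bound the mmu_lock hold time while write protecting
     * pages for dirty logging.  write_protect_page() is a hypothetical
     * stand-in for KVM's real rmap-based write-protection helpers.
     */
    spin_lock(&kvm->mmu_lock);
    for (i = 0; i < npages; i++) {
            /*
             * cond_resched_lock() drops the lock, yields, and retakes
             * it only when need_resched() is set or spin_needbreak()
             * sees a waiter, so the fast path stays cheap while the
             * worst-case critical section shrinks.
             */
            cond_resched_lock(&kvm->mmu_lock);
            write_protect_page(kvm, memslot, i);
    }
    spin_unlock(&kvm->mmu_lock);

This matches the measured effect quoted above: the worst-case write
latency fell from 1.5ms to 800us because write protecting millions of
pages is no longer one uninterrupted critical section.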