On 14/11/17 07:52, Jan Glauber wrote:
> On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
>> On 13/11/17 17:35, Jan Glauber wrote:
>
> [...]
>
>>>>> numbers don't look good, see waittime-max:
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> class name                con-bounces   contentions  waittime-min    waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min   holdtime-max  holdtime-total  holdtime-avg
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> &(&kvm->mmu_lock)->rlock:    99346764      99406604          0.14   1321260806.59  710654434972.0       7148.97    154228320     225122857          0.13   917688890.60   3705916481.39         16.46
>>>>> ------------------------
>>>>> &(&kvm->mmu_lock)->rlock     99365598  [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>> &(&kvm->mmu_lock)->rlock        25164  [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>> &(&kvm->mmu_lock)->rlock        14934  [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>> &(&kvm->mmu_lock)->rlock          908  [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
>>>>> ------------------------
>>>>> &(&kvm->mmu_lock)->rlock            3  [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
>>>>> &(&kvm->mmu_lock)->rlock     99186296  [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>> &(&kvm->mmu_lock)->rlock       179238  [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>> &(&kvm->mmu_lock)->rlock        19181  [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>>
>>>>> ...................................................................................................................................................................................
>>>>
>>>> [lots of stuff]
>>>>
>>>> Well, the mmu_lock is clearly contended. Is the box in a state where
>>>> you are swapping? There seem to be as many faults as contentions,
>>>> which is a bit surprising...
>>>
>>> I don't think it is swapping, but I need to double check.
>>
>> It is the number of aborts that is staggering. And each one of them
>> leads to the mmu_lock being contended. So something seems to be taking
>> its sweet time holding the damned lock.
>
> Can you elaborate on the aborts? I'm not familiar with KVM, but from a
> first look I thought kvm_handle_guest_abort() was on the normal path
> when a vcpu is stopped. Is that wrong?

kvm_handle_guest_abort() is the entry point for our page fault handling
(hence the mmu_lock being taken). On its own, the number of faults is
irrelevant. What worries me is that in almost all the cases where the
lock was contended, we were handling a page fault.

What would be interesting is to find out *who* is holding the lock when
we're being blocked in kvm_handle_guest_abort...

	M.

-- 
Jazz is not dead. It just smells funny...
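
[Editor's note: for readers outside the thread, the fault path Marc
describes funnels from kvm_handle_guest_abort() into user_mem_abort(),
which takes kvm->mmu_lock around the stage-2 page table update. Below
is a heavily condensed sketch after virt/kvm/arm/mmu.c as of roughly
v4.14; the function names are real, but the body is abbreviated and
error handling, huge-page logic and the MMU-notifier recheck are all
dropped.]

/* Condensed sketch of the arm/arm64 stage-2 fault path; the real code
 * is user_mem_abort() in virt/kvm/arm/mmu.c.  Not verbatim kernel
 * source. */
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
			  struct kvm_memory_slot *memslot, unsigned long hva,
			  unsigned long fault_status)
{
	struct kvm *kvm = vcpu->kvm;
	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
	bool write_fault = kvm_is_write_fault(vcpu);
	bool writable;
	kvm_pfn_t pfn;
	pte_t new_pte;
	int ret;

	/* Resolve the faulting gfn to a host pfn *before* taking the
	 * lock; this may sleep and fault the page in. */
	pfn = gfn_to_pfn_prot(kvm, fault_ipa >> PAGE_SHIFT, write_fault,
			      &writable);

	spin_lock(&kvm->mmu_lock);		/* the contended lock */
	new_pte = pfn_pte(pfn, PAGE_S2);
	if (writable)
		new_pte = kvm_s2pte_mkwrite(new_pte);
	ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, 0);
	spin_unlock(&kvm->mmu_lock);

	return ret;
}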
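[The other big mmu_lock user visible in the lock_stat dump is the MMU
notifier path. If lock_stat's usual microsecond units apply, the
waittime-max of ~1.32e9 above is on the order of 22 minutes, which fits
someone holding the lock across a very large operation. A condensed
sketch from virt/kvm/kvm_main.c of the same era (real names,
abbreviated body; the srcu locking is omitted): on arm/arm64,
kvm_unmap_hva_range() walks down to unmap_stage2_range(), all under
mmu_lock, so a large invalidation means a long hold time that stalls
every faulting vcpu.]

/* Condensed from kvm_mmu_notifier_invalidate_range_start() in
 * virt/kvm/kvm_main.c.  Not verbatim kernel source. */
static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
						    struct mm_struct *mm,
						    unsigned long start,
						    unsigned long end)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	int need_tlb_flush;

	spin_lock(&kvm->mmu_lock);
	kvm->mmu_notifier_count++;
	/* On arm/arm64 this ends up in unmap_stage2_range(). */
	need_tlb_flush = kvm_unmap_hva_range(kvm, start, end);
	if (need_tlb_flush)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&kvm->mmu_lock);
}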
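[As for answering the "who is holding the lock" question: one
throw-away approach, assuming a kernel built with
CONFIG_DEBUG_SPINLOCK=y (which makes raw_spinlock_t record its owner
task and CPU), would be a temporary hack next to the contended
acquisition. This is a hypothetical diagnostic, not something proposed
in the thread.]

/* Hypothetical debug hack; assumes CONFIG_DEBUG_SPINLOCK=y so that
 * raw_spinlock_t carries ->owner and ->owner_cpu.  Replace the plain
 * spin_lock(&kvm->mmu_lock) in the fault path with this to log the
 * current holder whenever the lock cannot be taken immediately.
 * Reading owner->comm/->pid is racy if the holder exits, so this is
 * strictly a throw-away diagnostic. */
if (!spin_trylock(&kvm->mmu_lock)) {
	struct task_struct *owner = READ_ONCE(kvm->mmu_lock.rlock.owner);

	if (owner && (void *)owner != SPINLOCK_OWNER_INIT)
		trace_printk("mmu_lock held by %s/%d on cpu %u\n",
			     owner->comm, owner->pid,
			     READ_ONCE(kvm->mmu_lock.rlock.owner_cpu));
	spin_lock(&kvm->mmu_lock);
}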