Flush the shadow MMU instead of iterating over each host VA when doing a
large invalidate-range callback.

The previous code is O(N) in the number of virtual pages being invalidated,
while holding both the MMU spinlock and the mmap_sem.  Large unmaps can
cause significant delay, during which the process is unkillable.  Worse,
all page allocation could be delayed if there's enough memory pressure
that mmu_shrink gets called.

Signed-off-by: Eric Northup <digitaleric@xxxxxxxxxx>
---
We have seen delays of over 30 seconds doing a large (128GB) unmap.

It'd be nicer to check whether the amount of work to be done by the
entire flush is less than the work to be done iterating over each HVA
page, but that information isn't currently available to the
arch-independent part of KVM.  Better ideas would be most welcome ;-)

Tested by attaching a debugger to a running qemu w/kvm and running
"call munmap(0, 1UL << 46)".

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7287bf5..9fe303a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -61,6 +61,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/kvm.h>
 
+#define MMU_NOTIFIER_FLUSH_THRESHOLD_PAGES (1024u*1024u*1024u)
+
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
@@ -332,8 +334,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_notifier_count++;
-	for (; start < end; start += PAGE_SIZE)
-		need_tlb_flush |= kvm_unmap_hva(kvm, start);
+	if (end - start < MMU_NOTIFIER_FLUSH_THRESHOLD_PAGES)
+		for (; start < end; start += PAGE_SIZE)
+			need_tlb_flush |= kvm_unmap_hva(kvm, start);
+	else
+		kvm_arch_flush_shadow(kvm);
+	need_tlb_flush |= kvm->tlbs_dirty;
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
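
For illustration only, a rough sketch of the comparison wished for above,
assuming a hypothetical kvm_arch_nr_shadow_pages() hook; no such
arch-independent interface exists today, which is exactly the missing piece:

	/*
	 * Sketch only: kvm_arch_nr_shadow_pages() is hypothetical.  The
	 * idea is to fall back to a full shadow flush whenever tearing
	 * down every shadow page is less work than walking each HVA in
	 * [start, end).
	 */
	static bool range_flush_is_cheaper(struct kvm *kvm,
					   unsigned long start,
					   unsigned long end)
	{
		unsigned long hva_pages = (end - start) >> PAGE_SHIFT;

		return kvm_arch_nr_shadow_pages(kvm) < hva_pages;
	}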