I am now testing the following patch.

Note: this lock-break technique is used in several other subsystems,
e.g. jbd.  (A generic sketch of the pattern is appended after the
patch for reference.)

Although people tend to say that holding mmu_lock during get_dirty is
always a problem, my impression is slightly different.  When we call
get_dirty, most of the hot memory pages have already been written at
least once and faults are becoming rare.  Actually, I rarely saw
rescheduling due to mmu_lock contention when I tested this patch
locally -- though my testing was not extensive enough yet.

In contrast, if we take the O(1) approach, we need to write protect
511 pages soon after get_dirty, and the chance of mmu_lock contention
may increase if multiple VCPUs try to write to memory.

Anyway, this patch is small and seems effective.

	Takuya

===
From: Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx>

get_dirty_log() needs to hold mmu_lock while it write protects dirty
pages, and this can take a long time when there are many dirty pages
to protect.  As the guest can take page faults during that time, this
may result in a severe latency problem which would prevent the system
from scaling.

This patch mitigates the problem by checking for mmu_lock contention
after every 2K dirty pages we protect: we selected this value because
it took about 100us to write protect 2K dirty pages.

TODO: more numbers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx>
---
 arch/x86/include/asm/kvm_host.h |    6 +++---
 arch/x86/kvm/mmu.c              |   12 +++++++++---
 arch/x86/kvm/x86.c              |   18 +++++++++++++-----
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..26b39c1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -712,9 +712,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask);
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 29ad6f9..b88c5cc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
  *
  * Used when we do not need to care about huge page mappings: e.g. during dirty
  * logging we do not have any such mappings.
+ *
+ * Returns the number of pages protected by this.
  */
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask)
 {
 	unsigned long *rmapp;
+	int nr_protected = 0;
 
 	while (mask) {
 		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
 		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
+		++nr_protected;
 
 		/* clear the first set bit */
 		mask &= mask - 1;
 	}
+
+	return nr_protected;
 }
 
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d9a578..b636669 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long n, i;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
+	int nr_protected = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		if (!dirty_bitmap[i])
 			continue;
 
-		is_dirty = true;
-
 		mask = xchg(&dirty_bitmap[i], 0);
 		dirty_bitmap_buffer[i] = mask;
 
 		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
+								offset, mask);
+		if (nr_protected > 2048) {
+			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+				kvm_flush_remote_tlbs(kvm);
+				spin_unlock(&kvm->mmu_lock);
+				cond_resched();
+				spin_lock(&kvm->mmu_lock);
+			}
+			nr_protected = 0;
+		}
 	}
-	if (is_dirty)
+	if (nr_protected)
 		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4
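
P.S.  For anyone not familiar with the pattern: below is a minimal,
illustrative sketch of the generic "lock break" technique the
changelog refers to.  It is not part of the patch; process_all(),
process_one() and nr_items are made-up names, and only the
need_resched()/spin_needbreak()/cond_resched() calls and the 2048
batch size are taken from the patch itself.

#include <linux/sched.h>
#include <linux/spinlock.h>

#define BATCH	2048	/* same batch size as the patch */

/*
 * Process nr_items items under 'lock', giving the lock up between
 * batches when another CPU is spinning on it (spin_needbreak) or
 * when we should reschedule (need_resched).
 */
static void process_all(spinlock_t *lock, unsigned long nr_items,
			void (*process_one)(unsigned long))
{
	unsigned long i;
	int done = 0;

	spin_lock(lock);
	for (i = 0; i < nr_items; i++) {
		process_one(i);

		if (++done >= BATCH) {
			if (need_resched() || spin_needbreak(lock)) {
				spin_unlock(lock);
				cond_resched();
				spin_lock(lock);
			}
			done = 0;
		}
	}
	spin_unlock(lock);
}

cond_resched_lock() is the generic helper for this kind of lock
breaking; the patch open-codes it, presumably because the remote TLB
flush has to be done before mmu_lock can be dropped.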