On Wed, Apr 11, 2012 at 08:22:07PM +0900, Takuya Yoshikawa wrote:
> I am now testing the following patch.
>
> Note: this technique is used in several subsystems, e.g. jbd.
>
> Although people tend to say that holding mmu_lock during get_dirty is
> always a problem, my impression is slightly different.

Other than the potential performance improvement, the worst-case
scenario of holding mmu_lock for hundreds of milliseconds at the
beginning of migration of a huge guest must be fixed.

> When we call get_dirty, most hot memory pages have already been
> written at least once and faults are becoming rare.
>
> Actually, I rarely saw rescheduling due to mmu_lock contention when
> I tested this patch locally -- though my testing is not enough yet.
>
> In contrast, if we do O(1) write protection, we need to write protect
> 511 pages soon after get_dirty, and the chance of mmu_lock contention
> may increase if multiple VCPUs try to write to memory.
>
> Anyway, this patch is small and seems effective.
>
> Takuya
>
> ===
> From: Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx>
>
> get_dirty_log() needs to hold mmu_lock while write protecting dirty
> pages, and this can take long when there are many dirty pages to
> protect.
>
> As the guest can take faults during that time, this may result in a
> severe latency problem which would prevent the system from scaling.
>
> This patch mitigates the problem by checking for mmu_lock contention
> after every 2K dirty pages we protect: we selected this value because
> it took about 100us to protect 2K dirty pages.
>
> TODO: more numbers.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx>
> ---
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   18 +++++++++++++-----
>  3 files changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..26b39c1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -712,9 +712,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask);
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask);
>  void kvm_mmu_zap_all(struct kvm *kvm);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 29ad6f9..b88c5cc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
>   *
>   * Used when we do not need to care about huge page mappings: e.g. during dirty
>   * logging we do not have any such mappings.
> + *
> + * Returns the number of pages protected.
>   */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask)
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask)
>  {
>  	unsigned long *rmapp;
> +	int nr_protected = 0;
>
>  	while (mask) {
>  		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
>  		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
> +		++nr_protected;
>
>  		/* clear the first set bit */
>  		mask &= mask - 1;
>  	}
> +
> +	return nr_protected;
>  }
>
>  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0d9a578..b636669 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long n, i;
>  	unsigned long *dirty_bitmap;
>  	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> +	int nr_protected = 0;
>
>  	mutex_lock(&kvm->slots_lock);
>
> @@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  		if (!dirty_bitmap[i])
>  			continue;
>
> -		is_dirty = true;
> -
>  		mask = xchg(&dirty_bitmap[i], 0);
>  		dirty_bitmap_buffer[i] = mask;
>
>  		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> +								offset, mask);
> +		if (nr_protected > 2048) {

Can you expand on the reasoning behind this?
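For reference, the quoted hunk is cut off right at the threshold check,
so the body of that branch is not shown above. Going by the jbd
reference and the mention of rescheduling on contention, it presumably
follows the standard kernel lock-break pattern; a minimal sketch of
that pattern (illustrative only, not the actual patch body) would be:

	if (nr_protected > 2048) {
		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
			/*
			 * Flush before releasing mmu_lock: sptes
			 * write-protected so far in this pass must not
			 * remain writable through stale TLB entries
			 * while other VCPUs run.
			 */
			kvm_flush_remote_tlbs(kvm);
			/* Drop mmu_lock, reschedule, then reacquire. */
			cond_resched_lock(&kvm->mmu_lock);
		}
		nr_protected = 0;
	}

Breaking the lock mid-scan would be safe for the dirty log itself: the
bits for this pass were already moved to dirty_bitmap_buffer with
xchg(), so any page not yet re-protected when the lock is released is
one that is being reported as dirty in this very pass anyway.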