Re: [PATCH 3/7] writeback: introduce smoothed global dirty limit

Jan Kara <jack@xxxxxxx> · Mon, 20 Jun 2011 23:18:10 +0200

On Sun 19-06-11 23:01:11, Wu Fengguang wrote:
> The start of a heavyweight application (e.g. KVM) may instantly knock
> down determine_dirtyable_memory() and hence the global/bdi dirty
> thresholds.
> 
> So introduce global_dirty_limit for tracking the global dirty threshold
> with the following policies:
> 
> - follow downwards slowly
> - follow up in one shot
> 
> global_dirty_limit can effectively mask out the impact of a sudden drop
> of dirtyable memory. It will be used in the next patch for two new types
> of dirty limits.
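To make the two policies concrete, here is a minimal userspace sketch that mirrors the quoted update_dirty_limit() as a pure function. The names smoothed_limit() and max_ul() are hypothetical helpers for illustration only; instead of updating the kernel global global_dirty_limit in place, the old limit is passed in and the new one returned. All quantities are in pages.

```c
#include <assert.h>

/* Hypothetical helper standing in for the kernel's max() macro. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Userspace model of the quoted update_dirty_limit() (illustrative,
 * not kernel API): returns the new smoothed limit given the old limit,
 * the freshly computed threshold and the current dirty page count.
 */
unsigned long smoothed_limit(unsigned long limit,
			     unsigned long thresh,
			     unsigned long dirty)
{
	/* Follow up in one shot: a higher threshold is adopted at once. */
	if (limit < thresh)
		return thresh;

	assert(limit >= thresh);	/* invariant past the branch above */

	/*
	 * Follow down slowly: remove 1/32 of the gap to max(thresh, dirty).
	 * Both conditions guarantee limit > max(thresh, dirty), so the
	 * subtraction cannot underflow.
	 */
	if (limit > thresh && limit > dirty)
		return limit - ((limit - max_ul(thresh, dirty)) >> 5);

	/* Otherwise keep the current limit unchanged. */
	return limit;
}
```

For example, smoothed_limit(1000, 2000, 0) returns 2000 (one-shot raise), while smoothed_limit(2000, 1000, 0) returns 1969, since the 1000-page gap shrinks by 1000 >> 5 = 31 pages in one update.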
Looking into the code, this patch depends on the previously submitted
patches for estimating BDI write bandwidth, doesn't it?

> 
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
>  include/linux/writeback.h |    2 +
>  mm/page-writeback.c       |   41 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> --- linux-next.orig/include/linux/writeback.h	2011-06-19 22:56:18.000000000 +0800
> +++ linux-next/include/linux/writeback.h	2011-06-19 22:59:29.000000000 +0800
> @@ -88,6 +88,8 @@ static inline void laptop_sync_completio
>  #endif
>  void throttle_vm_writeout(gfp_t gfp_mask);
>  
> +extern unsigned long global_dirty_limit;
> +
>  /* These are exported to sysctl. */
>  extern int dirty_background_ratio;
>  extern unsigned long dirty_background_bytes;
> --- linux-next.orig/mm/page-writeback.c	2011-06-19 22:56:18.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-06-19 22:59:29.000000000 +0800
> @@ -116,6 +116,7 @@ EXPORT_SYMBOL(laptop_mode);
>  
>  /* End of sysctl-exported parameters */
>  
> +unsigned long global_dirty_limit;
>  
>  /*
>   * Scale the writeback cache size proportional to the relative writeout speeds.
> @@ -510,6 +511,43 @@ static void bdi_update_write_bandwidth(s
>  	bdi->avg_write_bandwidth = avg;
>  }
>  
> +static void update_dirty_limit(unsigned long thresh,
> +				 unsigned long dirty)
> +{
> +	unsigned long limit = global_dirty_limit;
> +
> +	if (limit < thresh) {
> +		limit = thresh;
> +		goto update;
> +	}
> +
> +	if (limit > thresh &&
> +	    limit > dirty) {
> +		limit -= (limit - max(thresh, dirty)) >> 5;
> +		goto update;
> +	}
Hmm, but strictly speaking this never really converges to the new limit,
right? And even in practice it takes 22 steps to close 50% of the distance
to the new threshold and 73 steps to close 90% of it. But before we get
into the discussion about how to update the dirty threshold, I'd like to
discuss which properties we require from dirty threshold updates.
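The step counts follow from each update removing 1/32 of the remaining gap, so after n updates the gap is (31/32)^n of the original: (31/32)^22 is about 0.497 and (31/32)^73 about 0.098. A small sketch to reproduce this, assuming every update takes the decay branch (dirty stays below the new threshold); steps_to_fraction() and stalled_gap() are hypothetical names. The integer variant models the patch's ">> 5" literally: once the gap is below 32 pages, gap >> 5 is zero and the limit stops moving.

```c
#include <assert.h>

/*
 * Real-valued model: count updates of the form gap -= gap / 32 until
 * the remaining gap is at most `fraction` of the initial gap.
 */
int steps_to_fraction(double fraction)
{
	double remaining = 1.0;	/* gap as a fraction of the initial gap */
	int steps = 0;

	assert(fraction > 0.0 && fraction < 1.0);
	while (remaining > fraction) {
		remaining *= 31.0 / 32.0;
		steps++;
	}
	return steps;
}

/*
 * Integer model of limit -= (limit - target) >> 5, tracking only the
 * gap (limit - target). Returns the gap at which updates stop having
 * any effect. Starting from any gap >= 32 this ends at exactly 31.
 */
unsigned long stalled_gap(unsigned long gap)
{
	unsigned long step;

	while ((step = gap >> 5) != 0)
		gap -= step;
	return gap;
}
```

So steps_to_fraction(0.5) gives 22 and steps_to_fraction(0.1) gives 73, while stalled_gap(1000000) gives 31: with integer arithmetic the limit settles 31 pages above max(thresh, dirty) instead of reaching it.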

I understand it's somewhat unexpected that when some process allocates
anonymous memory, we suddenly stall writers to flush out dirty data. OTOH
it is nice behavior from the memory management point of view, because we
really try to keep the amount of dirty pages in free memory at the given
percentage, which helps reclaim. When we choose to update the dirty limit
in steps as you propose, we get a compromise between the behavior seen by
the user and the behavior wanted by memory management.

But there are also other options. For example, it would seem natural to me
to treat allocation of an anonymous page the same way as dirtying a page,
so that the process pushing us over the dirty threshold through its
allocations waits until enough pages are written. Then we wouldn't need any
smoothing of the memory available for caches, because allocations would be
naturally throttled (OK, we still have memory taken by kernel allocations,
but I'd say those are an order of magnitude smaller problem). But I'm not
sure how acceptable this idea would be.
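A toy model of this alternative, purely hypothetical and not based on any existing kernel code: struct toy_mm, toy_thresh() and toy_alloc_anon() are made up for illustration, and writeback is reduced to simply lowering the dirty count. The threshold is computed from memory not taken by anonymous pages, and an allocation that pushes dirty pages over the lowered threshold waits for the needed writeback itself instead of unrelated writers being stalled later.

```c
#include <assert.h>

/* Hypothetical toy model; all quantities in pages. */
struct toy_mm {
	unsigned long total;	/* memory usable for anon + page cache */
	unsigned long anon;	/* anonymous pages allocated so far */
	unsigned long dirty;	/* dirty page-cache pages */
	unsigned int ratio;	/* dirty ratio in percent */
};

/* Dirty threshold over memory not taken by anonymous pages. */
unsigned long toy_thresh(const struct toy_mm *mm)
{
	return (mm->total - mm->anon) * mm->ratio / 100;
}

/*
 * Allocate n anonymous pages, charging the allocation like dirtying:
 * if dirty pages now exceed the lowered threshold, the allocator
 * "waits" until enough pages are written back to get under it.
 * Returns the number of pages the allocator had to wait for.
 */
unsigned long toy_alloc_anon(struct toy_mm *mm, unsigned long n)
{
	unsigned long written = 0, thresh;

	assert(mm->anon + n <= mm->total);
	mm->anon += n;
	thresh = toy_thresh(mm);
	if (mm->dirty > thresh) {
		written = mm->dirty - thresh;	/* simulated writeback wait */
		mm->dirty = thresh;
	}
	return written;
}
```

With 1000 pages, a 20% ratio and 150 dirty pages, allocating 100 anonymous pages lowers the threshold to 180 and costs the allocator nothing, while a further 300 pages lower it to 120, so the allocator waits for 30 pages of writeback.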

								Honza

> +	return;
> +update:
> +	global_dirty_limit = limit;
> +}
> +
> +static void global_update_bandwidth(unsigned long thresh,
> +				    unsigned long dirty,
> +				    unsigned long now)
> +{
> +	static DEFINE_SPINLOCK(dirty_lock);
> +
> +	if (now - default_backing_dev_info.bw_time_stamp < MAX_PAUSE)
> +		return;
> +
> +	spin_lock(&dirty_lock);
> +	if (now - default_backing_dev_info.bw_time_stamp >= MAX_PAUSE) {
> +		update_dirty_limit(thresh, dirty);
> +		default_backing_dev_info.bw_time_stamp = now;
> +	}
> +	spin_unlock(&dirty_lock);
> +}
> +
>  void __bdi_update_bandwidth(struct backing_dev_info *bdi,
>  			    unsigned long thresh,
>  			    unsigned long dirty,
> @@ -535,6 +573,9 @@ void __bdi_update_bandwidth(struct backi
>  	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
>  		goto snapshot;
>  
> +	if (thresh)
> +		global_update_bandwidth(thresh, dirty, now);
> +
>  	bdi_update_write_bandwidth(bdi, elapsed, written);
>  
>  snapshot:
> 
> 
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR