Re: [PATCH 00/18] IO-less dirty throttling v11

Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> · Wed, 07 Sep 2011 15:14:46 -0400

On Wed, 2011-09-07 at 21:32 +0800, Wu Fengguang wrote: 
> > Finally, the complete IO-less balance_dirty_pages(). NFS is observed to perform
> > better or worse depending on the memory size. Otherwise the added patches can
> > address all known regressions.
> 
> I find that the NFS performance regressions on large memory systems can
> be fixed by this patch. It tries to make progress smoother by
> reasonably reducing the commit size.
> 
> Thanks,
> Fengguang
> ---
> Subject: nfs: limit the commit size to reduce fluctuations
> Date: Thu Dec 16 13:22:43 CST 2010
> 
> Limit the commit size to half the dirty control scope, so that the
> arrival of one commit will not knock the overall dirty pages off the
> scope.
> 
> Also limit the commit size to one second's worth of data. This will
> obviously help make the pipeline run more smoothly.
> 
> Also change "<=" to "<": if an inode has only one dirty page in the end,
> it should be committed. I wonder why the "<=" didn't cause a bug...
> 
> CC: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
>  fs/nfs/write.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> After the patch, there are still drop-offs from the control scope,
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/balance_dirty_pages-pages.png
> 
> due to bursty arrival of commits:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/nfs-commit.png
> 
> --- linux-next.orig/fs/nfs/write.c	2011-09-07 21:29:15.000000000 +0800
> +++ linux-next/fs/nfs/write.c	2011-09-07 21:29:32.000000000 +0800
> @@ -1543,10 +1543,14 @@ static int nfs_commit_unstable_pages(str
>  	int ret = 0;
>  
>  	if (wbc->sync_mode == WB_SYNC_NONE) {
> +		unsigned long bw = MIN_WRITEBACK_PAGES +
> +			NFS_SERVER(inode)->backing_dev_info.avg_write_bandwidth;
> +
>  		/* Don't commit yet if this is a non-blocking flush and there
> -		 * are a lot of outstanding writes for this mapping.
> +		 * are a lot of outstanding writes for this mapping, until
> +		 * collected enough pages to commit.
>  		 */
> -		if (nfsi->ncommit <= (nfsi->npages >> 1))
> +		if (nfsi->ncommit < min(nfsi->npages / DIRTY_SCOPE, bw))
>  			goto out_mark_dirty;
>  
>  		/* don't wait for the COMMIT response */

So what goes into the 'avg_write_bandwidth' variable that makes it a
good measure above (why 1 second of data instead of 10 seconds or
1ms, ...)? What is the 'DIRTY_SCOPE' value?

IOW: what new black magic are we introducing above, and why is it so
obviously better than what we have (yes, I see you have graphs, but that
is just measuring _one_ NFS setup and workload)?
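
To make the question concrete, the old and new conditions can be
modelled in userspace roughly as below. This is only a sketch under
stated assumptions: the helper names are hypothetical, DIRTY_SCOPE and
MIN_WRITEBACK_PAGES are passed in as parameters because the quoted hunk
does not define them, and avg_write_bandwidth is assumed to be in pages
per second (which is what "one second worth of data" suggests).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two checks; not kernel code. */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Old check: defer the commit while ncommit <= npages / 2. */
static bool defer_commit_old(unsigned long ncommit, unsigned long npages)
{
	return ncommit <= (npages >> 1);
}

/*
 * New check: defer the commit while
 *   ncommit < min(npages / dirty_scope, min_writeback_pages + avg_bw)
 * dirty_scope and min_writeback_pages stand in for DIRTY_SCOPE and
 * MIN_WRITEBACK_PAGES, whose values are not shown in the patch.
 */
static bool defer_commit_new(unsigned long ncommit, unsigned long npages,
			     unsigned long dirty_scope,
			     unsigned long min_writeback_pages,
			     unsigned long avg_write_bandwidth)
{
	unsigned long bw = min_writeback_pages + avg_write_bandwidth;

	assert(dirty_scope != 0);	/* guard the division below */
	return ncommit < min_ul(npages / dirty_scope, bw);
}
```

With this model the bandwidth term only matters once npages /
dirty_scope exceeds it, i.e. for mappings with many outstanding pages.
For small mappings the threshold is still a fixed fraction of npages,
so the choice of dirty_scope decides the behaviour there.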

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
