On Wed, 2011-09-07 at 21:32 +0800, Wu Fengguang wrote:
> > Finally, the complete IO-less balance_dirty_pages(). NFS is observed to perform
> > better or worse depending on the memory size. Otherwise the added patches can
> > address all known regressions.
>
> I find that the NFS performance regressions on large memory system can
> be fixed by this patch. It tries to make the progress more smooth by
> reasonably reducing the commit size.
>
> Thanks,
> Fengguang
> ---
> Subject: nfs: limit the commit size to reduce fluctuations
> Date: Thu Dec 16 13:22:43 CST 2010
>
> Limit the commit size to half the dirty control scope, so that the
> arrival of one commit will not knock the overall dirty pages off the
> scope.
>
> Also limit the commit size to one second worth of data. This will
> obviously help make the pipeline run more smoothly.
>
> Also change "<=" to "<": if an inode has only one dirty page in the end,
> it should be committed. I wonder why the "<=" didn't cause a bug...
>
> CC: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
>  fs/nfs/write.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> After patch, there are still drop offs from the control scope,
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/balance_dirty_pages-pages.png
>
> due to bursty arrival of commits:
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/nfs-commit.png
>
> --- linux-next.orig/fs/nfs/write.c	2011-09-07 21:29:15.000000000 +0800
> +++ linux-next/fs/nfs/write.c	2011-09-07 21:29:32.000000000 +0800
> @@ -1543,10 +1543,14 @@ static int nfs_commit_unstable_pages(str
>  	int ret = 0;
>
>  	if (wbc->sync_mode == WB_SYNC_NONE) {
> +		unsigned long bw = MIN_WRITEBACK_PAGES +
> +			NFS_SERVER(inode)->backing_dev_info.avg_write_bandwidth;
> +
>  		/* Don't commit yet if this is a non-blocking flush and there
> -		 * are a lot of outstanding writes for this mapping.
> +		 * are a lot of outstanding writes for this mapping, until
> +		 * collected enough pages to commit.
>  		 */
> -		if (nfsi->ncommit <= (nfsi->npages >> 1))
> +		if (nfsi->ncommit < min(nfsi->npages / DIRTY_SCOPE, bw))
>  			goto out_mark_dirty;
>
>  		/* don't wait for the COMMIT response */

So what goes into the 'avg_write_bandwidth' variable that makes it a
good measure above (why 1 second of data instead of 10 seconds or
1ms, ...)? What is the 'DIRTY_SCOPE' value?

IOW: what new black magic are we introducing above and why is it so
obviously better than what we have (yes, I see you have graphs, but that
is just measuring _one_ NFS setup and workload).

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
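
For anyone comparing the two conditions in the quoted hunk, here is a
minimal userspace sketch of the old and new "don't commit yet" tests.
It is only an illustration, not the kernel code: the helper names
defer_commit_old() and defer_commit_new() are made up, the values of
DIRTY_SCOPE and MIN_WRITEBACK_PAGES are placeholders (the patch as
posted does not show them, which is part of the question above), and
avg_write_bandwidth is assumed to be in pages per second, which is how
the "one second worth of data" wording reads.

```c
#include <stdbool.h>

/*
 * Placeholder values, NOT taken from the patch. They exist only so the
 * sketch compiles and the arithmetic can be followed.
 */
#define DIRTY_SCOPE		8UL
#define MIN_WRITEBACK_PAGES	1024UL

/* Hypothetical stand-in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Old test: defer the COMMIT while no more than half of the inode's
 * pages are waiting to be committed.
 */
static bool defer_commit_old(unsigned long ncommit, unsigned long npages)
{
	return ncommit <= (npages >> 1);
}

/*
 * New test: defer the COMMIT until ncommit reaches the smaller of
 * npages / DIRTY_SCOPE and MIN_WRITEBACK_PAGES + avg_write_bandwidth
 * (assumed here to be pages per second).
 */
static bool defer_commit_new(unsigned long ncommit, unsigned long npages,
			     unsigned long avg_write_bandwidth)
{
	unsigned long bw = MIN_WRITEBACK_PAGES + avg_write_bandwidth;

	return ncommit < min_ul(npages / DIRTY_SCOPE, bw);
}
```

With these placeholder numbers, an inode with npages = 80000 and an
assumed bandwidth of 5000 pages/s gets a commit threshold of
min(10000, 6024) = 6024 pages under the new test, where the old test
would keep deferring until ncommit exceeded 40000.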