Re: [PATCH 29/35] nfs: in-commit pages accounting and wait queue

Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> · Tue, 14 Dec 2010 10:57:25 -0500

On Tue, 2010-12-14 at 23:40 +0800, Wu Fengguang wrote:
> On Tue, Dec 14, 2010 at 05:15:51AM +0800, Trond Myklebust wrote:
> > On Mon, 2010-12-13 at 22:47 +0800, Wu Fengguang wrote:
> > > plain text document attachment (writeback-nfs-in-commit.patch)
> > > When doing 10+ concurrent dd's, I observed very bumpy commits submission
> > > (partly because the dd's are started at the same time, and hence reached
> > > 4MB to-commit pages at the same time). Basically we rely on the server
> > > to complete and return write/commit requests, and want both to progress
> > > smoothly and not consume too many pages. The write request wait queue is
> > > not enough as it's mainly network bounded. So add another commit request
> > > wait queue. Only async writes need to sleep on this queue.
> > > 
> > 
> > I'm not understanding the above reasoning. Why should we serialise
> > commits at the per-filesystem level (and only for non-blocking flushes
> > at that)?
> 
> I did the commit wait queue after seeing this graph, where there is
> very bursty pattern of commit submission and hence completion:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/nfs-commit-1000.png
> 
> leading to big fluctuations, eg. the almost straight up/straight down
> lines below
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/vmstat-dirty-300.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/dirty-pages.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/dirty-pages-200.png
> 
> A commit wait queue will help wipe out the "peaks". The "fixed" graph
> is
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-03-23/vmstat-dirty-300.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-03-23/dirty-pages.png
> 
> Blocking flushes don't need to wait on this queue because they already
> throttle themselves by waiting on the inode commit lock before/after
> the commit.  They actually should not wait on this queue, to prevent
> sync requests being unnecessarily blocked by async ones.

OK, but isn't it better then to just abort the commit, and have the
relevant async process retry it later?

This is a code path which is followed by kswapd, for instance. It seems
dangerous to be throttling that instead of allowing it to proceed (and
perhaps being able to free up memory on some other partition in the mean
time).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html