Re: nfs-backed mmap file results in 1000s of WRITEs per second

"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> · Mon, 9 Sep 2013 17:47:48 +0000

On Mon, 2013-09-09 at 12:32 -0500, Quentin Barnes wrote:
> On Mon, Sep 09, 2013 at 09:04:24AM -0400, Jeff Layton wrote:
> > On Fri, 6 Sep 2013 11:48:45 -0500
> > Quentin Barnes <qbarnes@xxxxxxxxx> wrote:
> > 
> > > Jeff, can you try out my test program in the base note on your
> > > RHEL5.9 or later RHEL5.x kernels?
> > > 
> > > I reverified that running the test on a 2.6.18-348.16.1.el5 x86_64
> > > kernel (latest released RHEL5.9) does not show the problem for me.
> > > Based on what you and Trond have said in this thread though, I'm
> > > really curious why it doesn't have the problem.
> > 
> > I can confirm what you see on RHEL5. One difference is that RHEL5's
> > page_mkwrite handler does not do wait_on_page_writeback. That was added
> > as part of the stable pages work that went in a while back, so that may 
> > be the main difference. Adding that in doesn't seem to materially
> > change things though.
> 
> Good to know you confirmed the behavior I saw on RHEL5 (just so that
> I know it's not some random variable in play I had overlooked).
> 
> > In any case, what I see is that the initial program just ends up with
> > two calls to nfs_vm_page_mkwrite(). They both push out a WRITE and then
> > things settle down (likely because the page is still marked dirty).
> > 
> > Eventually, another write occurs and the dirty page gets pushed out to
> > the server in a small flurry of WRITEs to the same range. Then, things
> > settle down again until there's another small flurry of activity.
> > 
> > My suspicion is that there is a race condition involved here, but I'm
> > unclear on where it is. I'm not 100% convinced this is a bug, but page
> > fault semantics aren't my strong suit.
> 
> As a test on RHEL6, I made a trivial systemtap script for kprobing
> nfs_vm_page_mkwrite() and nfs_flush_incompatible().  I wanted to
> make sure this bug was limited to just the nfs module and was not a
> result of some mm behavior change.
> 
> With the bug unfixed and the test program running, nfs_vm_page_mkwrite()
> and nfs_flush_incompatible() are called repeatedly at a very high rate
> (hence all the WRITEs).
> 
> After Trond's patch, the two functions are called just at the
> program's initialization and then called only every 30 seconds or
> so.
> 
> It looks to me from the code flow that there must be something
> nfs_wb_page() does that resets the need for mm to keep reinvoking
> nfs_vm_page_mkwrite().  I didn't look any deeper than that though
> for now.  Maybe a race in how nfs_wb_page() updates status you're
> thinking of?

In RHEL-5, nfs_wb_page() is just a wrapper to nfs_sync_inode_wait(),
which does _not_ call clear_page_dirty_for_io() (and hence does not call
page_mkclean()).

That would explain it...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com