Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing

Trond Myklebust <trond.myklebust@xxxxxxxxxx> · Fri, 29 May 2009 14:43:35 -0400

On Fri, 2009-05-29 at 13:51 -0400, Peter Staubach wrote:
> Trond Myklebust wrote:
> > On Fri, 2009-05-29 at 13:38 -0400, Brian R Cowan wrote:
> >   
> >>> You may have a misunderstanding about what exactly "async" does.  The 
> >>> "sync" / "async" mount options control only whether the application 
> >>> waits for the data to be flushed to permanent storage.  They have no 
> >>> effect, on any file system I know of, on _how_ specifically the data 
> >>> is moved from the page cache to permanent storage.
> >>>       
> >> The problem is that the client change seems to cause the application to 
> >> stop until this stable write completes... What is interesting is that it's 
> >> not always a write operation that the linker gets stuck on. Our best 
> >> hypothesis -- from correlating times in strace and tcpdump traces -- is 
> >> that the FILE_SYNC'ed write NFS RPCs are in fact triggered by *read()* 
> >> system calls on the output file (that is opened for read/write). We THINK 
> >> the read call triggers a FILE_SYNC write if the page is dirty...and that 
> >> is why the read calls are taking so long. Seeing writes happening when the 
> >> app is waiting for a read is odd to say the least... (In my test, there is 
> >> nothing else running on the Virtual machines, so the only thing that could 
> >> be triggering the filesystem activity is the build test...)
> >>     
> >
> > Yes. If the page is dirty, but not up to date, then it needs to be
> > cleaned before you can overwrite the contents with the results of a
> > fresh read.
> > That means flushing the data to disk... Which again means doing either a
> > > stable write or an unstable write+commit. The former is more efficient
> > > than the latter, 'cos it accomplishes the exact same work in a single
> > RPC call.
> 
> In the normal case, we aren't overwriting the contents with the
> results of a fresh read.  We are going to simply return the
> current contents of the page.  Given this, then why is the normal
> data cache consistency mechanism, based on the attribute cache,
> not sufficient?

It is. You would need to look into why the page was not marked with the
PG_uptodate flag when it was being filled. We generally do try to do
that whenever possible.
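
As a rough illustration of the behaviour discussed above, here is a toy
model of the read path. It is only a sketch under stated assumptions, not
the NFS client code: Page, ToyClient, read_page and the RPC label strings
are all hypothetical names made up for this example, and the "uptodate"
field merely stands in for PG_uptodate.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for a page-cache page (not the kernel's struct page)."""
    data: bytes = b""
    dirty: bool = False      # holds written data not yet sent to the server
    uptodate: bool = False   # stands in for PG_uptodate: whole page is valid


@dataclass
class ToyClient:
    """Hypothetical model that records which RPCs a read() would trigger."""
    rpcs: list = field(default_factory=list)

    def read_page(self, page: Page, server_data: bytes) -> bytes:
        if page.uptodate:
            # Cached contents are valid, dirty or not: no RPC is needed.
            return page.data
        if page.dirty:
            # The page must be cleaned before a fresh read overwrites it.
            # One stable write does the work of unstable write + commit.
            self.rpcs.append("WRITE(FILE_SYNC)")
            page.dirty = False
        # Refill the page from the server and mark it valid.
        self.rpcs.append("READ")
        page.data = server_data
        page.uptodate = True
        return page.data


client = ToyClient()
partial = Page(data=b"new", dirty=True, uptodate=False)
client.read_page(partial, b"new-and-old")
print(client.rpcs)  # ['WRITE(FILE_SYNC)', 'READ']
```

In this model a dirty page that is already marked up to date is served
straight from the cache, which is why the question of when the flag gets
set matters for the read() stalls reported earlier in the thread.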

Trond
