On Fri, 2009-05-29 at 13:51 -0400, Peter Staubach wrote:
> Trond Myklebust wrote:
> > On Fri, 2009-05-29 at 13:38 -0400, Brian R Cowan wrote:
> >
> >>> You may have a misunderstanding about what exactly "async" does. The
> >>> "sync" / "async" mount options control only whether the application
> >>> waits for the data to be flushed to permanent storage. They have no
> >>> effect on any file system I know of _how_ specifically the data is
> >>> moved from the page cache to permanent storage.
> >>>
> >> The problem is that the client change seems to cause the application to
> >> stop until this stable write completes... What is interesting is that it's
> >> not always a write operation that the linker gets stuck on. Our best
> >> hypothesis -- from correlating times in strace and tcpdump traces -- is
> >> that the FILE_SYNC'ed write NFS RPCs are in fact triggered by *read()*
> >> system calls on the output file (that is opened for read/write). We THINK
> >> the read call triggers a FILE_SYNC write if the page is dirty...and that
> >> is why the read calls are taking so long. Seeing writes happening when the
> >> app is waiting for a read is odd to say the least... (In my test, there is
> >> nothing else running on the Virtual machines, so the only thing that could
> >> be triggering the filesystem activity is the build test...)
> >>
> >
> > Yes. If the page is dirty, but not up to date, then it needs to be
> > cleaned before you can overwrite the contents with the results of a
> > fresh read.
> > That means flushing the data to disk... Which again means doing either a
> > stable write or an unstable write+commit. The former is more efficient
> > than the latter, 'cos it accomplishes the exact same work in a single
> > RPC call.
>
> In the normal case, we aren't overwriting the contents with the
> results of a fresh read. We are going to simply return the
> current contents of the page.
> Given this, then why is the normal
> data cache consistency mechanism, based on the attribute cache,
> not sufficient?

It is. You would need to look into why the page was not marked with the
PG_uptodate flag when it was being filled. We generally do try to do
that whenever possible.

Trond