On Fri, 2010-01-08 at 09:17 +0800, Wu Fengguang wrote: > On Thu, Jan 07, 2010 at 11:10:22PM +0800, Trond Myklebust wrote: > > On Thu, 2010-01-07 at 22:56 +0800, Wu Fengguang wrote: > > > On Thu, Jan 07, 2010 at 12:38:02PM +0800, Myklebust, Trond wrote: > > > > > > > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > > > > > > index d171696..910be28 100644 > > > > > > --- a/fs/nfs/write.c > > > > > > +++ b/fs/nfs/write.c > > > > > > @@ -441,7 +441,7 @@ nfs_mark_request_commit(struct nfs_page *req) > > > > > > spin_unlock(&inode->i_lock); > > > > > > inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS); > > > > > > inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE); > > > > > > - __mark_inode_dirty(inode, I_DIRTY_DATASYNC); > > > > > > + mark_inode_unstable_pages(inode); > > > > > > > > > > Then we shall mark I_DIRTY_DATASYNC on other places that extend i_size. > > > > > > > > Why? The NFS client itself shouldn't ever set I_DIRTY_DATASYNC after > > > > this patch is applied. We won't ever need it. > > > > > > > > If the VM or VFS is doing it, then they ought to be fixed: there is no > > > > reason to assume that all filesystems need to sync their inodes on > > > > i_size changes. > > > > > > Sorry, one more question. > > > > > > It seems to me that you are replacing > > > > > > I_DIRTY_DATASYNC => write_inode() > > > with > > > I_UNSTABLE_PAGES => commit_unstable_pages() > > > > > > Is that change for the sake of clarity? Or to fix some problem? > > > (This patch does fix some problems, but do they inherently require > > > the above change?) > > > > As I said previously, the write_inode() call is done _before_ you sync > > the dirty pages to the server, whereas commit_unstable_pages() wants to > > be done _after_ syncing. So the two are not the same, and we cannot > > replace commit_unstable_pages() with write_inode(). > > This is the ordering: > > 0 do_writepages() > 1 if (I_DIRTY_SYNC | I_DIRTY_DATASYNC) > 2 write_inode() > 3 if (wait) > 4 filemap_fdatawait() > 5 if (I_UNSTABLE_PAGES) > 6 commit_unstable_pages() > > The page is synced to NFS server in line 0. > > The only difference is write_inode() is called before filemap_fdatawait(), > while commit_unstable_pages() is called after it. > > Note that filemap_fdatawait() will only be called on WB_SYNC_ALL, so I > still cannot understand the difference.. The difference is precisely that... In the case of WB_SYNC_ALL we want the call to filemap_fdatawait() to occur before we call commit_unstable_pages(), so that we know that all the in-flight write rpc calls are done before we ask that they be committed to stable storage. In the case of WB_SYNC_NONE, there is no wait, and so we are forced to play games with heuristics and/or add the force_commit_unstable flag because we don't wait for the dirty pages to be cleaned. I don't like this, but those are the semantics that we've defined for WB_SYNC_NONE. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html