On Wed 23-12-09 15:21:47, Trond Myklebust wrote: > On Tue, 2009-12-22 at 09:59 +0800, Wu Fengguang wrote: > > 1) normal fs sets I_DIRTY_DATASYNC when extending i_size, however NFS > > will set the flag for any pages written -- why this trick? To > > guarantee the call of nfs_commit_inode()? Which unfortunately turns > > almost every server side NFS write into sync writes.. > > > > writeback_single_inode: > > do_writepages > > nfs_writepages > > nfs_writepage ----[short time later]---> nfs_writeback_release* > > nfs_mark_request_commit > > __mark_inode_dirty(I_DIRTY_DATASYNC); > > > > if (I_DIRTY_SYNC || I_DIRTY_DATASYNC) <---- so this will be true for most time > > write_inode > > nfs_write_inode > > nfs_commit_inode > > > I have been working on a fix for this. We basically do want to ensure > that NFS calls commit (otherwise we're not finished cleaning the dirty > pages), but we want to do it _after_ we've waited for all the writes to > complete. See below... > > Trond > > ------------------------------------------------------------------------------------------------------ > VFS: Add a new inode state: I_UNSTABLE_PAGES > > From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> > > Add a new inode state to enable the vfs to commit the nfs unstable pages to > stable storage once the write back of dirty pages is done. Hmm, does your patch really help? > @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > } > > spin_lock(&inode_lock); > + /* > + * Special state for cleaning NFS unstable pages > + */ > + if (inode->i_state & I_UNSTABLE_PAGES) { > + int err; > + inode->i_state &= ~I_UNSTABLE_PAGES; > + spin_unlock(&inode_lock); > + err = commit_unstable_pages(inode, wait); > + if (ret == 0) > + ret = err; > + spin_lock(&inode_lock); > + } I don't quite understand this chunk: We've called writeback_single_inode because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few lines above your chunk, we've called nfs_write_inode which sent commit to the server. Now here you sometimes send the commit again? What's the purpose? > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c > index faa0918..4f129b3 100644 > --- a/fs/nfs/inode.c > +++ b/fs/nfs/inode.c > @@ -99,17 +99,14 @@ u64 nfs_compat_user_ino64(u64 fileid) > > int nfs_write_inode(struct inode *inode, int sync) > { > + int flags = 0; > int ret; > > - if (sync) { > - ret = filemap_fdatawait(inode->i_mapping); > - if (ret == 0) > - ret = nfs_commit_inode(inode, FLUSH_SYNC); > - } else > - ret = nfs_commit_inode(inode, 0); > - if (ret >= 0) > + if (sync) > + flags = FLUSH_SYNC; > + ret = nfs_commit_inode(inode, flags); > + if (ret > 0) > return 0; > - __mark_inode_dirty(inode, I_DIRTY_DATASYNC); > return ret; > } > Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html