On Thu, 2009-12-24 at 10:52 +0800, Wu Fengguang wrote: > Trond, > > On Thu, Dec 24, 2009 at 03:12:54AM +0800, Trond Myklebust wrote: > > On Wed, 2009-12-23 at 19:05 +0100, Jan Kara wrote: > > > On Wed 23-12-09 15:21:47, Trond Myklebust wrote: > > > > @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > > > > } > > > > > > > > spin_lock(&inode_lock); > > > > + /* > > > > + * Special state for cleaning NFS unstable pages > > > > + */ > > > > + if (inode->i_state & I_UNSTABLE_PAGES) { > > > > + int err; > > > > + inode->i_state &= ~I_UNSTABLE_PAGES; > > > > + spin_unlock(&inode_lock); > > > > + err = commit_unstable_pages(inode, wait); > > > > + if (ret == 0) > > > > + ret = err; > > > > + spin_lock(&inode_lock); > > > > + } > > > I don't quite understand this chunk: We've called writeback_single_inode > > > because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few > > > lines above your chunk, we've called nfs_write_inode which sent commit to > > > the server. Now here you sometimes send the commit again? What's the > > > purpose? > > > > We no longer set I_DIRTY_DATASYNC. We only set I_DIRTY_PAGES (and later > > I_UNSTABLE_PAGES). > > > > The point is that we now do the commit only _after_ we've sent all the > > dirty pages, and waited for writeback to complete, whereas previously we > > did it in the wrong order. > > Sorry I still don't get it. The timing used to be: > > write 4MB ==> WRITE block 0 (ie. first 512KB) > WRITE block 1 > WRITE block 2 > WRITE block 3 ack from server for WRITE block 0 => mark 0 as unstable (inode marked need-commit) > WRITE block 4 ack from server for WRITE block 1 => mark 1 as unstable > WRITE block 5 ack from server for WRITE block 2 => mark 2 as unstable > WRITE block 6 ack from server for WRITE block 3 => mark 3 as unstable > WRITE block 7 ack from server for WRITE block 4 => mark 4 as unstable > ack from server for WRITE block 5 => mark 5 as unstable > write_inode ==> COMMIT blocks 0-5 > ack from server for WRITE block 6 => mark 6 as unstable (inode marked need-commit) > ack from server for WRITE block 7 => mark 7 as unstable > > ack from server for COMMIT blocks 0-5 => mark 0-5 as clean > > write_inode ==> COMMIT blocks 6-7 > > ack from server for COMMIT blocks 6-7 => mark 6-7 as clean > > Note that the first COMMIT is submitted before receiving all ACKs for > the previous writes, hence the second COMMIT is necessary. It seems > that your patch does not improve the timing at all. That would indicate that we're cycling through writeback_single_inode() more than once. Why? Trond -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html