On Mon, Nov 23, 2009 at 01:11:19PM -0500, Trond Myklebust wrote:
> On Mon, 2009-11-23 at 11:44 -0500, J. Bruce Fields wrote:
> > If the side we want to optimize is the modifications, I wonder if we
> > could do all the i_version increments on *read* of i_version?:
> >
> > - writes (and other inode modifications) set an "i_version_dirty"
> >   flag.
> > - reads of i_version clear the i_version_dirty flag, increment
> >   i_version, and return the result.
> >
> > As long as the reader sees the i_version_dirty flag set only after it
> > sees the write that caused it, I think it all works?
>
> That probably won't make much of a difference to performance. Most NFSv4
> clients will have every WRITE followed by a GETATTR operation in the
> same compound, so your i_version_dirty flag will always immediately get
> cleared.

I was only thinking about non-NFS performance.

> The question is, though, why does the jbd2 machinery need to be engaged
> on _every_ write?

Is it?  I thought I remembered a journaling issue from previous
discussions, but Ted seemed concerned just about the overhead of an
additional spinlock, and looking at the code, the only test of I_VERSION
that I can see is indeed in ext4_mark_iloc_dirty(), which indeed just
takes a spinlock and updates the i_version.

--b.

> The NFS clients don't care if we lose an i_version count due to a
> sudden server reboot, since that will trigger a rewrite of the dirty
> data anyway once the server comes back up again.  As long as the
> i_version is guaranteed to be written to stable storage on a
> successful call to fsync(), then the NFS data integrity requirements
> are fully satisfied.
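
To make that concrete, here is a minimal, untested sketch of the lazy
scheme described at the top of this mail.  None of these names exist in
the tree; "i_version_dirty" and the helpers below are made up for
illustration, standing in for whatever we would actually hang off
struct inode (probably just the existing i_lock and i_version plus one
new flag):

#include <linux/spinlock.h>
#include <linux/types.h>

struct lazy_iversion {
	spinlock_t	lock;		/* stands in for inode->i_lock */
	u64		version;	/* stands in for inode->i_version */
	bool		dirty;		/* the proposed i_version_dirty flag */
};

/* Writes (and other inode modifications) only mark the inode dirty. */
static void lazy_iversion_set_dirty(struct lazy_iversion *iv)
{
	spin_lock(&iv->lock);
	iv->dirty = true;
	spin_unlock(&iv->lock);
}

/*
 * Readers of the change attribute (e.g. a GETATTR) do the actual
 * increment: bump only if a write happened since the last read,
 * clear the flag, and return the resulting value.
 */
static u64 lazy_iversion_read(struct lazy_iversion *iv)
{
	u64 ret;

	spin_lock(&iv->lock);
	if (iv->dirty) {
		iv->version++;
		iv->dirty = false;
	}
	ret = iv->version;
	spin_unlock(&iv->lock);
	return ret;
}

The point is that a pure writer never pays for the increment; only a
reader of the change attribute does, which is why, given Trond's
observation about WRITE+GETATTR compounds, this would mostly be a
non-NFS win.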
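
And for reference, the ext4 path I mentioned looks roughly like the
following (paraphrased, not an exact quote of the tree): the I_VERSION
test in ext4_mark_iloc_dirty() just calls the generic helper, which
takes i_lock and bumps the counter, with no extra jbd2 work of its own.

/* include/linux/fs.h, roughly: */
static inline void inode_inc_iversion(struct inode *inode)
{
	spin_lock(&inode->i_lock);
	inode->i_version++;
	spin_unlock(&inode->i_lock);
}

/* fs/ext4/inode.c, in ext4_mark_iloc_dirty(), roughly: */
	if (test_opt(inode->i_sb, I_VERSION))
		inode_inc_iversion(inode);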