On Mon, 2009-11-23 at 13:19 -0500, J. Bruce Fields wrote:
> On Mon, Nov 23, 2009 at 01:11:19PM -0500, Trond Myklebust wrote:
> > On Mon, 2009-11-23 at 11:44 -0500, J. Bruce Fields wrote:
> > > If the side we want to optimize is the modifications, I wonder if we
> > > could do all the i_version increments on *read* of i_version?:
> > >
> > >  - writes (and other inode modifications) set an "i_version_dirty"
> > >    flag.
> > >  - reads of i_version clear the i_version_dirty flag, increment
> > >    i_version, and return the result.
> > >
> > > As long as the reader sees i_version_dirty set only after it sees the
> > > write that caused it, I think it all works?
> >
> > That probably won't make much of a difference to performance. Most NFSv4
> > clients will have every WRITE followed by a GETATTR operation in the
> > same compound, so your i_version_dirty flag will always immediately get
> > cleared.
>
> I was only thinking about non-NFS performance.

I would think that running a high-performance database _and_ an NFS
server on the same machine would tend to be very much a corner case
anyway. In most setups I'm aware of, the database and the NFS server
are completely separate machines.

> > The question is, though, why does the jbd2 machinery need to be engaged
> > on _every_ write?
>
> Is it?

See Ted's email. As I read it, his concern was that if they allow people
to reduce the a/m/c/time resolution, then i_version would still force
them to dirty the inode on every write...

Trond