On Mar 12, 2008 23:06 -0500, Steve French wrote:
> Since utimes of a cached file could reset the file time before a write
> is sent remotely, do NFS and the Linux cluster file systems simply
> turn setattr of any timestamp into an fsync of all cached file data?
> (This seems excessive when the file is cacheable/oplocked.) Is there
> a precedent for caching the write of timestamps when writebehind
> file data is cached (otherwise the performance penalty could be
> horrible)?

Lustre handles this on the server side. The only "reasonable" race to
handle is when the same client is doing both the write and the utimes()
call; if two clients are doing the operations, the order is not
deterministic anyway.

What happens is that each write is tagged with an RPC XID number when it
is sent, as is the setattr RPC. If a setattr RPC that lowers the mtime
arrives at the server, its XID is cached for some period of time, and any
write RPC that arrives with a lower XID has its mtime update dropped
instead of being allowed to move the mtime forward. This avoids the race
(which we hit in some cases) where a write is still being sent over the
network and is processed slightly after the utimes() call, even though
the utimes() was actually done later on the client.

A common case for this is tar, which calls utimes() after extracting the
data. Any later write RPCs will contain the "more up-to-date but lower"
mtime from the client inode and will not advance the mtime on the server.
This also avoids the need to flush all of the data from the client before
sending the setattr.

Note that Lustre uses the timestamps of the clients and not those of the
servers, to avoid the "NFS server setting file times in the future"
issues that confuse make and friends.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
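
P.S. For anyone who wants the shape of the server-side check, here is a
minimal sketch in C. All of the names (struct mtime_guard, handle_setattr(),
handle_write(), GUARD_TTL) are made up for illustration; this is not the
actual Lustre code, just the XID comparison described above.

/*
 * Illustrative sketch only -- not the real Lustre implementation.
 * The server remembers the XID of a setattr that lowered the mtime,
 * and any write RPC carrying a lower XID is applied without being
 * allowed to advance the mtime.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct mtime_guard {
	uint64_t setattr_xid;	/* XID of the last mtime-lowering setattr */
	time_t   expires;	/* guard is only kept for a short period */
};

struct server_inode {
	time_t mtime;
	struct mtime_guard guard;
};

#define GUARD_TTL 60	/* seconds; arbitrary for this sketch */

/* setattr RPC: if it moves the mtime backwards, remember its XID. */
static void handle_setattr(struct server_inode *ino, uint64_t xid,
			   time_t new_mtime, time_t now)
{
	if (new_mtime < ino->mtime) {
		ino->guard.setattr_xid = xid;
		ino->guard.expires = now + GUARD_TTL;
	}
	ino->mtime = new_mtime;
}

/* write RPC: apply the data, but only advance the mtime if the write
 * was not sent before a later (higher-XID) mtime-lowering setattr. */
static void handle_write(struct server_inode *ino, uint64_t xid,
			 time_t client_mtime, time_t now)
{
	/* ... the file data itself would be written here ... */

	if (now < ino->guard.expires && xid < ino->guard.setattr_xid)
		return;	/* stale in-flight write: leave the mtime alone */

	if (client_mtime > ino->mtime)
		ino->mtime = client_mtime;
}

int main(void)
{
	struct server_inode ino = { .mtime = 1000 };
	time_t now = 0;

	/* tar-like sequence: a write is sent with XID 10, then utimes()
	 * (setattr, XID 11) resets the mtime to the archive time, but
	 * the write RPC arrives at the server second. */
	handle_setattr(&ino, 11, 500, now);	/* arrives first */
	handle_write(&ino, 10, 2000, now);	/* in-flight write, late */

	printf("mtime = %ld\n", (long)ino.mtime);
	return 0;
}

Running it prints "mtime = 500": the late write's data is still applied,
but its higher mtime is ignored. The TTL on the guard is what lets the
server forget old setattr XIDs once any in-flight writes have drained.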