On Mar 12, 2008 23:06 -0500, Steve French wrote:
> Since utimes of a cached file could reset the file time before a write
> is sent remotely, do NFS and the Linux cluster file systems simply
> turn setattr of any timestamp into an fsync of all cached file data?
> (This seems excessive when the file is cacheable/oplocked.) Is there
> a precedent for caching the write of timestamps when writebehind
> file data is cached (otherwise the performance penalty could be
> horrible)?

Lustre handles this on the server side. The only "reasonable" race to
handle is when the same client is doing both the write and the utimes()
call; if two clients are doing the operations, the order is not
deterministic anyway.

What happens is that each write is tagged with an RPC XID number when it
is sent, as is the setattr RPC. If a setattr RPC that lowers the mtime
arrives at the server, its XID is cached for some period of time, and any
write RPC that arrives with a lower XID has its mtime update dropped
instead of being allowed to move the mtime forward. This avoids the race
(which we hit in some cases) where a write is still being sent over the
network and is processed slightly after the utimes() call, even though
the utimes() was actually done later on the client.

A common case for this is tar, which calls utimes() after extracting the
data. Any later write RPCs will contain the "more up-to-date but lower"
mtime from the client inode and will not advance the mtime on the server.
This also avoids the need to flush all of the data from the client before
sending the setattr.

Note that Lustre uses the timestamps of the clients and not those of the
servers, to avoid the "NFS server setting file times in the future"
issues that confuse make and friends.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
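
P.S. For anyone who wants the shape of the server-side check, here is a
minimal sketch in C. All of the names (struct mtime_guard, handle_setattr(),
handle_write(), GUARD_TTL) are made up for illustration; this is not the
actual Lustre code, just the XID comparison described above.

/*
 * Illustrative sketch only -- not the real Lustre implementation.
 * The server remembers the XID of a setattr that lowered the mtime,
 * and any write RPC carrying a lower XID is applied without being
 * allowed to advance the mtime.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct mtime_guard {
	uint64_t setattr_xid;	/* XID of the last mtime-lowering setattr */
	time_t   expires;	/* guard is only kept for a short period */
};

struct server_inode {
	time_t mtime;
	struct mtime_guard guard;
};

#define GUARD_TTL 60	/* seconds; arbitrary for this sketch */

/* setattr RPC: if it moves the mtime backwards, remember its XID. */
static void handle_setattr(struct server_inode *ino, uint64_t xid,
			   time_t new_mtime, time_t now)
{
	if (new_mtime < ino->mtime) {
		ino->guard.setattr_xid = xid;
		ino->guard.expires = now + GUARD_TTL;
	}
	ino->mtime = new_mtime;
}

/* write RPC: apply the data, but only advance the mtime if the write
 * was not sent before a later (higher-XID) mtime-lowering setattr. */
static void handle_write(struct server_inode *ino, uint64_t xid,
			 time_t client_mtime, time_t now)
{
	/* ... the file data itself would be written here ... */

	if (now < ino->guard.expires && xid < ino->guard.setattr_xid)
		return;	/* stale in-flight write: leave the mtime alone */

	if (client_mtime > ino->mtime)
		ino->mtime = client_mtime;
}

int main(void)
{
	struct server_inode ino = { .mtime = 1000 };
	time_t now = 0;

	/* tar-like sequence: a write is sent with XID 10, then utimes()
	 * (setattr, XID 11) resets the mtime to the archive time, but
	 * the write RPC arrives at the server second. */
	handle_setattr(&ino, 11, 500, now);	/* arrives first */
	handle_write(&ino, 10, 2000, now);	/* in-flight write, late */

	printf("mtime = %ld\n", (long)ino.mtime);
	return 0;
}

Running it prints "mtime = 500": the late write's data is still applied,
but its higher mtime is ignored. The TTL on the guard is what lets the
server forget old setattr XIDs once any in-flight writes have drained.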