I don't worry about flushing atime (anyone crazy enough to do that would pay a huge performance penalty). Access is usually checked on open right ... so once a file is open even if the file becomes read-only, the writes, even cached writes continue. On Fri, Mar 14, 2008 at 5:40 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > On Thu, 13 Mar 2008 21:32:32 -0500 > > > "Steve French" <smfrench@xxxxxxxxx> wrote: > > > On Thu, Mar 13, 2008 at 6:10 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > On Wed, 12 Mar 2008 22:54:50 -0500 > > > "Steve French" <smfrench@xxxxxxxxx> wrote: > > > > > > > kukks noticed that "cp -p source-file /cifs-mnt" would not set the > > > > last write time to the previous time > > > > > > > > It looks like a straightforward caching sideffect - "cp -p" does > > > > > > > > open > > > > write > > > > set time stamps (utimensat) > > > > set uid/gid (fchown32) > > > > set acl (setxattr(system.posix_acl_access)) > > > > > > > > but the write is cached until close (which does a flush just before > > > > the close) - so the final write resets the time which was set > > > > correctly earlier > > > > > > > Interesting -- this seems to be "mtime" week for me. I've been working > > > on some NFS cache invalidation issues that involve out-of-order mtime > > > updates... > > > > > > Actually, this problem looks like a regression caused by the changes > > > in this patch: > > > > > > commit cea218054ad277d6c126890213afde07b4eb1602 > > > Author: Jeff Layton <jlayton@xxxxxxxxxx> > > > Date: Tue Nov 20 23:19:03 2007 +0000 > > > > > > [CIFS] Fix potential data corruption when writing out cached dirty pages > > > > > > > > > ...before that patch we flushed the cache on every setattr call. We now > > > just do that on size changes, but it sounds like we need to reconsider > > > this. > > > > > > Trying to follow Lustre's model sounds tricky and I'm not sure it would > > > work anyway since the server is going to update the mtime whenever the > > > write comes in. Consider: > > > > > > Write --> server sets mtime to current time > > > Setattr --> server sets mtime for utimes() call > > > Write --> server resets mtime to current time > > > > > > ...I'm not sure that the client has much control over this. If we don't > > > want to flush the cache for a setattr, there are two options: > > > > > > 1) delay the setattr until all of the pending writes have been flushed > > > (tricky and not really what we want) > > > > > > 2) queue up another setattr call after the last write has completed > > > (also tricky, and possibly problematic) > > > > > > There could be other problems lurking here too. What happens to the > > > pending writes when we change the ownership and mode to something where > > > we no longer have permission to write to the file? I suppose they just > > > error out and that's not good either. > > > > > > For the record, NFS seems to flush the cache too on all setattr calls. > > > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > Here is a patch to fix the cp -p problem: > > > > Kukks/Jeff, > > coments? Unless there are objections I want to add this to the tree > > quickly (I also need to add Q's memleak fix - but want to try it > > against more servers to see if there is a better way to do that) > > > > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c > > index af42262..e57e5c4 100644 > > --- a/fs/cifs/inode.c > > +++ b/fs/cifs/inode.c > > @@ -1420,11 +1420,10 @@ int cifs_setattr(struct dentry *direntry, > > struct iattr *attrs) > > } > > cifsInode = CIFS_I(direntry->d_inode); > > > > - /* BB check if we need to refresh inode from server now ? BB */ > > - > > - if (attrs->ia_valid & ATTR_SIZE) { > > + if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) { > > /* > > - Flush data before changing file size on server. If the > > + Flush data before changing file size or changing the last > > + write time of the file on the server. If the > > flush returns error, store it to report later and continue. > > BB: This should be smarter. Why bother flushing pages that > > will be truncated anyway? Also, should we error out here if > > @@ -1435,7 +1434,9 @@ int cifs_setattr(struct dentry *direntry, struct > > iattr *attrs) > > CIFS_I(direntry->d_inode)->write_behind_rc = rc; > > rc = 0; > > } > > + } > > > > + if (attrs->ia_valid & ATTR_SIZE) { > > /* To avoid spurious oplock breaks from server, in the case of > > inodes that we already have open, avoid doing path based > > setting of file size if we can do it by handle. > > > > > > > > That should fix Kukks' problem, but still leaves us vulnerable to the > other problems. For instance: > > Changing UID/GID or mode could cause later writes to fail. > > What happens if we still have outstanding writes, but we've set > ATTR_READONLY on the file on the server? > > What about the atime? Pending writes will clobber it. > > I think the only thing we can really do is flush on every setattr call. > The only situation I can see where we don't need to flush is possibly > where we're *only* changing the ctime, and it's probably not worth it > to make that a special case. > > How about this patch instead? > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c > index af42262..8bcbfff 100644 > > --- a/fs/cifs/inode.c > +++ b/fs/cifs/inode.c > @@ -1420,22 +1420,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs) > > } > cifsInode = CIFS_I(direntry->d_inode); > > - /* BB check if we need to refresh inode from server now ? BB */ > + /* > + Flush data before changing attributes on server. If the > > + flush returns error, store it to report later and continue. > + BB: This should be smarter. Why bother flushing pages that > + will be truncated anyway? Also, what should we do if the > + the flush returns error? > + */ > + rc = filemap_write_and_wait(direntry->d_inode->i_mapping); > + if (rc != 0) { > + CIFS_I(direntry->d_inode)->write_behind_rc = rc; > > + rc = 0; > + } > > if (attrs->ia_valid & ATTR_SIZE) { > - /* > > - Flush data before changing file size on server. If the > - flush returns error, store it to report later and continue. > > - BB: This should be smarter. Why bother flushing pages that > - will be truncated anyway? Also, should we error out here if > - the flush returns error? > - */ > - rc = filemap_write_and_wait(direntry->d_inode->i_mapping); > - if (rc != 0) { > - CIFS_I(direntry->d_inode)->write_behind_rc = rc; > - rc = 0; > - } > > > - > /* To avoid spurious oplock breaks from server, in the case of > inodes that we already have open, avoid doing path based > setting of file size if we can do it by handle. > -- Thanks, Steve -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html