Re: set last write time = fsync ?

"Steve French" <smfrench@xxxxxxxxx> · Fri, 14 Mar 2008 11:16:41 -0500

I don't worry about flushing atime (anyone crazy enough to do that
would pay a huge performance penalty).
Access is usually checked on open right ... so once a file is open
even if the file becomes read-only, the writes, even cached writes
continue.

On Fri, Mar 14, 2008 at 5:40 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Thu, 13 Mar 2008 21:32:32 -0500
>
>
> "Steve French" <smfrench@xxxxxxxxx> wrote:
>
>  > On Thu, Mar 13, 2008 at 6:10 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>  > >
>  > > On Wed, 12 Mar 2008 22:54:50 -0500
>  > >  "Steve French" <smfrench@xxxxxxxxx> wrote:
>  > >
>  > >  > kukks noticed that "cp -p source-file /cifs-mnt" would not set the
>  > >  > last write time to the previous time
>  > >  >
>  > >  > It looks like a straightforward caching sideffect - "cp -p" does
>  > >  >
>  > >  > open
>  > >  > write
>  > >  > set time stamps (utimensat)
>  > >  > set uid/gid (fchown32)
>  > >  > set acl (setxattr(system.posix_acl_access))
>  > >  >
>  > >  > but the write is cached until close (which does a flush just before
>  > >  > the close) - so the final write resets the time which was set
>  > >  > correctly earlier
>  > >  >
>  > >  Interesting -- this seems to be "mtime" week for me. I've been working
>  > >  on some NFS cache invalidation issues that involve out-of-order mtime
>  > >  updates...
>  > >
>  > >  Actually, this problem looks like a regression caused by the changes
>  > >  in this patch:
>  > >
>  > >  commit cea218054ad277d6c126890213afde07b4eb1602
>  > >  Author: Jeff Layton <jlayton@xxxxxxxxxx>
>  > >  Date:   Tue Nov 20 23:19:03 2007 +0000
>  > >
>  > >     [CIFS] Fix potential data corruption when writing out cached dirty pages
>  > >
>  > >
>  > >  ...before that patch we flushed the cache on every setattr call. We now
>  > >  just do that on size changes, but it sounds like we need to reconsider
>  > >  this.
>  > >
>  > >  Trying to follow Lustre's model sounds tricky and I'm not sure it would
>  > >  work anyway since the server is going to update the mtime whenever the
>  > >  write comes in. Consider:
>  > >
>  > >  Write --> server sets mtime to current time
>  > >  Setattr --> server sets mtime for utimes() call
>  > >  Write --> server resets mtime to current time
>  > >
>  > >  ...I'm not sure that the client has much control over this. If we don't
>  > >  want to flush the cache for a setattr, there are two options:
>  > >
>  > >  1) delay the setattr until all of the pending writes have been flushed
>  > >  (tricky and not really what we want)
>  > >
>  > >  2) queue up another setattr call after the last write has completed
>  > >  (also tricky, and possibly problematic)
>  > >
>  > >  There could be other problems lurking here too. What happens to the
>  > >  pending writes when we change the ownership and mode to something where
>  > >  we no longer have permission to write to the file? I suppose they just
>  > >  error out and that's not good either.
>  > >
>  > >  For the record, NFS seems to flush the cache too on all setattr calls.
>  > >
>  > >  Jeff Layton <jlayton@xxxxxxxxxx>
>  >
>  > Here is a patch to fix the cp -p problem:
>  >
>  > Kukks/Jeff,
>  > coments?  Unless there are objections I want to add this to the tree
>  > quickly (I also need to add Q's memleak fix - but want to try it
>  > against more servers to see if there is a better way to do that)
>  >
>  > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
>  > index af42262..e57e5c4 100644
>  > --- a/fs/cifs/inode.c
>  > +++ b/fs/cifs/inode.c
>  > @@ -1420,11 +1420,10 @@ int cifs_setattr(struct dentry *direntry,
>  > struct iattr *attrs)
>  >       }
>  >       cifsInode = CIFS_I(direntry->d_inode);
>  >
>  > -     /* BB check if we need to refresh inode from server now ? BB */
>  > -
>  > -     if (attrs->ia_valid & ATTR_SIZE) {
>  > +     if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) {
>  >               /*
>  > -                Flush data before changing file size on server. If the
>  > +                Flush data before changing file size or changing the last
>  > +                write time of the file on the server. If the
>  >                  flush returns error, store it to report later and continue.
>  >                  BB: This should be smarter. Why bother flushing pages that
>  >                  will be truncated anyway? Also, should we error out here if
>  > @@ -1435,7 +1434,9 @@ int cifs_setattr(struct dentry *direntry, struct
>  > iattr *attrs)
>  >                       CIFS_I(direntry->d_inode)->write_behind_rc = rc;
>  >                       rc = 0;
>  >               }
>  > +     }
>  >
>  > +     if (attrs->ia_valid & ATTR_SIZE) {
>  >               /* To avoid spurious oplock breaks from server, in the case of
>  >                  inodes that we already have open, avoid doing path based
>  >                  setting of file size if we can do it by handle.
>  >
>  >
>  >
>
>  That should fix Kukks' problem, but still leaves us vulnerable to the
>  other problems. For instance:
>
>  Changing UID/GID or mode could cause later writes to fail.
>
>  What happens if we still have outstanding writes, but we've set
>  ATTR_READONLY on the file on the server?
>
>  What about the atime? Pending writes will clobber it.
>
>  I think the only thing we can really do is flush on every setattr call.
>  The only situation I can see where we don't need to flush is possibly
>  where we're *only* changing the ctime, and it's probably not worth it
>  to make that a special case.
>
>  How about this patch instead?
>
>  Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>
>
>  diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
>  index af42262..8bcbfff 100644
>
> --- a/fs/cifs/inode.c
>  +++ b/fs/cifs/inode.c
>  @@ -1420,22 +1420,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs)
>
>         }
>         cifsInode = CIFS_I(direntry->d_inode);
>
>  -       /* BB check if we need to refresh inode from server now ? BB */
>  +       /*
>  +          Flush data before changing attributes on server. If the
>
> +          flush returns error, store it to report later and continue.
>  +          BB: This should be smarter. Why bother flushing pages that
>  +          will be truncated anyway? Also, what should we do if the
>  +          the flush returns error?
>  +        */
>  +       rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
>  +       if (rc != 0) {
>  +               CIFS_I(direntry->d_inode)->write_behind_rc = rc;
>
> +               rc = 0;
>  +       }
>
>         if (attrs->ia_valid & ATTR_SIZE) {
>  -               /*
>
> -                  Flush data before changing file size on server. If the
>  -                  flush returns error, store it to report later and continue.
>
> -                  BB: This should be smarter. Why bother flushing pages that
>  -                  will be truncated anyway? Also, should we error out here if
>  -                  the flush returns error?
>  -                */
>  -               rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
>  -               if (rc != 0) {
>  -                       CIFS_I(direntry->d_inode)->write_behind_rc = rc;
>  -                       rc = 0;
>  -               }
>
>
> -
>                 /* To avoid spurious oplock breaks from server, in the case of
>                    inodes that we already have open, avoid doing path based
>                    setting of file size if we can do it by handle.
>

-- 
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html