On Fri, Sep 30, 2022 at 2:30 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > The c/mtime and i_version currently get updated before the data is > copied (or a DIO write is issued), which is problematic for NFS. > > READ+GETATTR can race with a write (even a local one) in such a way as > to make the client associate the state of the file with the wrong change > attribute. That association can persist indefinitely if the file sees no > further changes. > > Move the setting of times to the bottom of the function in > __generic_file_write_iter and only update it if something was > successfully written. > This solution is wrong for several reasons: 1. There is still file_update_time() in ->page_mkwrite() so you haven't solved the problem completely 2. The other side of the coin is that post crash state is more likely to end up data changes without mtime/ctime change If I read the problem description correctly, then a solution that invalidates the NFS cache before AND after the write would be acceptable. Right? Would an extra i_version bump after the write solve the race? > If the time update fails, log a warning once, but don't fail the write. > All of the existing callers use update_time functions that don't fail, > so we should never trip this. > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > --- > mm/filemap.c | 17 +++++++++++++---- > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index 15800334147b..72c0ceb75176 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -3812,10 +3812,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > if (err) > goto out; > > - err = file_update_time(file); > - if (err) > - goto out; > - > if (iocb->ki_flags & IOCB_DIRECT) { > loff_t pos, endbyte; > > @@ -3868,6 +3864,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > iocb->ki_pos += written; > } > out: > + if (written > 0) { > + err = file_update_time(file); > + /* > + * There isn't much we can do at this point if updating the > + * times fails after a successful write. The times and i_version > + * should still be updated in the inode, and it should still be > + * marked dirty, so hopefully the next inode update will catch it. > + * Log a warning once so we have a record that something untoward > + * has occurred. > + */ > + WARN_ONCE(err, "Failed to update m/ctime after write: %ld\n", err); pr_warn_once() please - this is not a programming assertion. Thanks, Amir.