Re: [PATCH v8 RESEND 2/8] fs: clarify when the i_version counter must be updated

On Wed, 2023-01-25 at 17:06 +0100, Jan Kara wrote:
> On Tue 24-01-23 14:30:19, Jeff Layton wrote:
> > The i_version field in the kernel has had different semantics over
> > the decades, but NFSv4 has certain expectations. Update the comments
> > in iversion.h to describe when the i_version must change.
> > 
> > Cc: Colin Walters <walters@xxxxxxxxxx>
> > Cc: NeilBrown <neilb@xxxxxxx>
> > Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> 
> Looks good to me. But one note below:
> 
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index 6755d8b4f20b..fced8115a5f4 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -9,8 +9,25 @@
> >   * ---------------------------
> >   * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> >   * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> > - * appear different to observers if there was a change to the inode's data or
> > - * metadata since it was last queried.
> > + * appear larger to observers if there was an explicit change to the inode's
> > + * data or metadata since it was last queried.
> > + *
> > + * An explicit change is one that would ordinarily result in a change to the
> > + * inode status change time (aka ctime). i_version must appear to change, even
> > + * if the ctime does not (since the whole point is to avoid missing updates due
> > + * to timestamp granularity). If POSIX or other relevant spec mandates that the
> > + * ctime must change due to an operation, then the i_version counter must be
> > + * incremented as well.
> > + *
> > + * Making the i_version update completely atomic with the operation itself would
> > + * be prohibitively expensive. Traditionally the kernel has updated the times on
> > + * directories after an operation that changes its contents. For regular files,
> > + * the ctime is usually updated before the data is copied into the cache for a
> > + * write. This means that there is a window of time when an observer can
> > + * associate a new timestamp with old file contents. Since the purpose of the
> > + * i_version is to allow for better cache coherency, the i_version must always
> > + * be updated after the results of the operation are visible. Updating it before
> > + * and after a change is also permitted.
> 
> This sounds good but it is not the case for any of the current filesystems, is
> it? Perhaps the documentation should mention this so that people are not
> confused?
> 
> 								Honza

Correct. Currently, all filesystems change the times and version before
a write instead of after. I'm hoping that situation will change soon
though, as I've been working on a patchset to fix this for tmpfs, ext4
and btrfs.

If you still want to see something for this though, what would you
suggest for verbiage?

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



