On Thu, 2023-01-26 at 12:36 +0100, Jan Kara wrote: > On Thu 26-01-23 05:54:16, Jeff Layton wrote: > > On Wed, 2023-01-25 at 17:06 +0100, Jan Kara wrote: > > > On Tue 24-01-23 14:30:19, Jeff Layton wrote: > > > > The i_version field in the kernel has had different semantics over > > > > the decades, but NFSv4 has certain expectations. Update the comments > > > > in iversion.h to describe when the i_version must change. > > > > > > > > Cc: Colin Walters <walters@xxxxxxxxxx> > > > > Cc: NeilBrown <neilb@xxxxxxx> > > > > Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> > > > > Cc: Dave Chinner <david@xxxxxxxxxxxxx> > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > Looks good to me. But one note below: > > > > > > > diff --git a/include/linux/iversion.h b/include/linux/iversion.h > > > > index 6755d8b4f20b..fced8115a5f4 100644 > > > > --- a/include/linux/iversion.h > > > > +++ b/include/linux/iversion.h > > > > @@ -9,8 +9,25 @@ > > > > * --------------------------- > > > > * The change attribute (i_version) is mandated by NFSv4 and is mostly for > > > > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > > > > - * appear different to observers if there was a change to the inode's data or > > > > - * metadata since it was last queried. > > > > + * appear larger to observers if there was an explicit change to the inode's > > > > + * data or metadata since it was last queried. > > > > + * > > > > + * An explicit change is one that would ordinarily result in a change to the > > > > + * inode status change time (aka ctime). i_version must appear to change, even > > > > + * if the ctime does not (since the whole point is to avoid missing updates due > > > > + * to timestamp granularity). If POSIX or other relevant spec mandates that the > > > > + * ctime must change due to an operation, then the i_version counter must be > > > > + * incremented as well. > > > > + * > > > > + * Making the i_version update completely atomic with the operation itself would > > > > + * be prohibitively expensive. Traditionally the kernel has updated the times on > > > > + * directories after an operation that changes its contents. For regular files, > > > > + * the ctime is usually updated before the data is copied into the cache for a > > > > + * write. This means that there is a window of time when an observer can > > > > + * associate a new timestamp with old file contents. Since the purpose of the > > > > + * i_version is to allow for better cache coherency, the i_version must always > > > > + * be updated after the results of the operation are visible. Updating it before > > > > + * and after a change is also permitted. > > > > > > This sounds good but it is not the case for any of the current filesystems, is > > > it? Perhaps the documentation should mention this so that people are not > > > confused? > > > > Correct. Currently, all filesystems change the times and version before > > a write instead of after. I'm hoping that situation will change soon > > though, as I've been working on a patchset to fix this for tmpfs, ext4 > > and btrfs. > > That is good but we'll see how long it takes to get merged. AFAIR it is not > a complete nobrainer ;) > > > If you still want to see something for this though, what would you > > suggest for verbiage? > > Sure: > > ... the i_version must a be updated after the results of the operation are > visible (note that none of the filesystems currently do this, it is a work > in progress to fix this). > > And once your patches are merged, you can also delete this note :). > > Honza Sounds good, I folded something similar to that into the patch and pushed it into the branch I'm feeding into linux-next. I won't bother re-posting for just that though: diff --git a/include/linux/iversion.h b/include/linux/iversion.h index fced8115a5f4..f174ff1b59ee 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -27,7 +27,8 @@ * associate a new timestamp with old file contents. Since the purpose of the * i_version is to allow for better cache coherency, the i_version must always * be updated after the results of the operation are visible. Updating it before - * and after a change is also permitted. + * and after a change is also permitted. (Note that no filesystems currently do + * this. Fixing that is a work-in-progress). * * Observers see the i_version as a 64-bit number that never decreases. If it * remains the same since it was last checked, then nothing has changed in the Thanks! -- Jeff Layton <jlayton@xxxxxxxxxx>