On Fri, Aug 26, 2022 at 05:46:57PM -0400, Jeff Layton wrote:
> The i_version field in the kernel has had different semantics over
> the decades, but we're now proposing to expose it to userland via
> statx. This means that we need a clear, consistent definition of
> what it means and when it should change.
>
> Update the comments in iversion.h to describe how a conformant
> i_version implementation is expected to behave. This definition
> suits the current users of i_version (NFSv4 and IMA), but is
> loose enough to allow for a wide range of possible implementations.
>
> Cc: Colin Walters <walters@xxxxxxxxxx>
> Cc: NeilBrown <neilb@xxxxxxx>
> Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/linux-xfs/166086932784.5425.17134712694961326033@xxxxxxxxxxxxxxxxxxxxx/#t
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
>  include/linux/iversion.h | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> index 3bfebde5a1a6..45e93e1b4edc 100644
> --- a/include/linux/iversion.h
> +++ b/include/linux/iversion.h
> @@ -9,8 +9,19 @@
>   * ---------------------------
>   * The change attribute (i_version) is mandated by NFSv4 and is mostly for
>   * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> - * appear different to observers if there was a change to the inode's data or
> - * metadata since it was last queried.
> + * appear different to observers if there was an explicit change to the inode's
> + * data or metadata since it was last queried.
> + *
> + * An explicit change is one that would ordinarily result in a change to the
> + * inode status change time (aka ctime). The version must appear to change, even
> + * if the ctime does not (since the whole point is to avoid missing updates due
> + * to timestamp granularity). If POSIX mandates that the ctime must change due
> + * to an operation, then the i_version counter must be incremented as well.
> + *
> + * A conformant implementation is allowed to increment the counter in other
> + * cases, but this is not optimal. NFSv4 and IMA both use this value to determine
> + * whether caches are up to date. Spurious increments can cause false cache
> + * invalidations.

"not optimal", but nevertheless allowed - that's "unspecified
behaviour" if I've ever seen it. How is userspace supposed to
know/deal with this?

Indeed, this loophole clause doesn't exist in the man pages that
define what statx.stx_ino_version means. The man pages explicitly
define that stx_ino_version only ever changes when stx_ctime
changes.

IOWs, the behaviour userspace developers are going to expect *does
not include* stx_ino_version changing more often than ctime is
changed. Hence a kernel iversion implementation that bumps the
counter more often than ctime changes *is not conformant with the
statx version counter specification*. IOWs, we can't export such
behaviour to userspace *ever* - it is a non-conformant
implementation.

Hence I think anything that bumps iversion outside the bounds of the
statx definition should be declared as such:

"Non-conformant iversion implementations:
	- MUST NOT be exported by statx() to userspace
	- MUST be -tolerated- by kernel internal applications that
	  use iversion for their own purposes."

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
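
To make the "how is userspace supposed to know" question concrete, here is a
minimal sketch in C of an observer that validates a cache against a change
counter. It is a toy model only: struct inode_view, struct cache_entry and all
of the helper functions are hypothetical names invented for illustration, and
none of them are kernel or statx interfaces. It assumes the counter is the only
thing the cache checks, and that an explicit change can land inside the same
timestamp granule as the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical observer-visible inode state; not a kernel or statx struct. */
struct inode_view {
	int64_t  ctime_ns;	/* status change time as reported, in ns */
	uint64_t version;	/* change counter, standing in for i_version */
};

/* Snapshot a cache keeps from the last time it looked at the inode. */
struct cache_entry {
	int64_t  ctime_ns;
	uint64_t version;
};

static void cache_fill(struct cache_entry *c, const struct inode_view *i)
{
	c->ctime_ns = i->ctime_ns;
	c->version = i->version;
}

/* The cache trusts its contents only while the counter is unchanged. */
static bool cache_is_current(const struct cache_entry *c,
			     const struct inode_view *i)
{
	return c->version == i->version;
}

/*
 * Explicit change: the counter always moves. The reported ctime moves
 * only if now_ns differs from the previous value, i.e. only if the
 * clock has advanced at its reported granularity.
 */
static void explicit_change(struct inode_view *i, int64_t now_ns)
{
	i->ctime_ns = now_ns;
	i->version++;
}

/* Spurious bump: the counter moves although nothing ctime-worthy happened. */
static void spurious_bump(struct inode_view *i)
{
	i->version++;
}

/* Runs both cases and checks that the observer sees the same evidence. */
static void self_check(void)
{
	struct inode_view a = { .ctime_ns = 1000, .version = 1 };
	struct inode_view b = a;
	struct cache_entry ca, cb;

	cache_fill(&ca, &a);
	cache_fill(&cb, &b);

	explicit_change(&a, 1000);	/* real change, same timestamp granule */
	spurious_bump(&b);		/* no real change at all */

	/* Both caches are invalidated ... */
	assert(!cache_is_current(&ca, &a));
	assert(!cache_is_current(&cb, &b));
	/* ... and neither ctime moved, so the two cases look identical. */
	assert(a.ctime_ns == ca.ctime_ns && b.ctime_ns == cb.ctime_ns);
	assert(a.version == b.version);
}
```

Calling self_check() from a main() runs both scenarios. In this model the
observer ends up with the same (ctime, version) pair after a real change inside
one timestamp granule and after a spurious bump, so it cannot tell a false
invalidation from a genuine one.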