The i_version field in the kernel has had different semantics over
the decades, but we're now proposing to expose it to userland via
statx. This means that we need a clear, consistent definition of
what it means and when it should change.

Update the comments in iversion.h to describe how a conformant
i_version implementation is expected to behave. This definition
suits the current users of i_version (NFSv4 and IMA), but is
loose enough to allow for a wide range of possible implementations.

Cc: Colin Walters <walters@xxxxxxxxxx>
Cc: NeilBrown <neilb@xxxxxxx>
Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/linux-xfs/166086932784.5425.17134712694961326033@xxxxxxxxxxxxxxxxxxxxx/#t
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 include/linux/iversion.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 3bfebde5a1a6..45e93e1b4edc 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -9,8 +9,19 @@
  * ---------------------------
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
- * appear different to observers if there was a change to the inode's data or
- * metadata since it was last queried.
+ * appear different to observers if there was an explicit change to the inode's
+ * data or metadata since it was last queried.
+ *
+ * An explicit change is one that would ordinarily result in a change to the
+ * inode status change time (aka ctime). The version must appear to change, even
+ * if the ctime does not (since the whole point is to avoid missing updates due
+ * to timestamp granularity). If POSIX mandates that the ctime must change due
+ * to an operation, then the i_version counter must be incremented as well.
+ *
+ * A conformant implementation is allowed to increment the counter in other
+ * cases, but this is not optimal. NFSv4 and IMA both use this value to determine
+ * whether caches are up to date. Spurious increments can cause false cache
+ * invalidations.
  *
  * Observers see the i_version as a 64-bit number that never decreases. If it
  * remains the same since it was last checked, then nothing has changed in the
@@ -66,6 +77,14 @@
  * Storing the value to disk therefore does not count as a query, so those
  * filesystems should use inode_peek_iversion to grab the value to be stored.
  * There is no need to flag the value as having been queried in that case.
+ *
+ * Notes on atime updates
+ * ----------------------
+ * Access time (atime) updates due to reads or similar activity do not represent
+ * an explicit change to the inode data or metadata. If the only change to the
+ * inode is the atime, then i_version should not be incremented. If an observer
+ * cares about atime updates, it should plan to fetch and store the atime in
+ * conjunction with the i_version.
  */
 
 /*
-- 
2.37.2
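
For illustration only, the rules in the updated comment can be modelled in a few lines of userspace C. This is a rough sketch, not part of the patch and not the kernel's iversion.h API: every type and helper name below (model_inode, model_snapshot, model_explicit_change and so on) is hypothetical, and timestamps are plain integer ticks. It assumes an explicit change always bumps the counter, even when the coarse ctime ends up with the same value, and that an atime-only update leaves the counter alone.

```c
/*
 * Userspace model only. All names here are hypothetical and are NOT the
 * kernel's iversion.h interface.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_inode {
	uint64_t version;	/* models i_version: never decreases */
	int64_t ctime;		/* coarse timestamp, in ticks */
	int64_t atime;		/* access time, in ticks */
};

/* What an observer remembers from its last query. */
struct model_snapshot {
	uint64_t version;
	int64_t atime;
};

/* An explicit change: anything that would ordinarily update the ctime. */
static void model_explicit_change(struct model_inode *ino, int64_t now)
{
	ino->ctime = now;	/* may equal the old value at coarse granularity */
	ino->version++;		/* must still appear different to observers */
}

/* A read or similar access: only the atime moves, the version does not. */
static void model_atime_update(struct model_inode *ino, int64_t now)
{
	ino->atime = now;
}

static struct model_snapshot model_observe(const struct model_inode *ino)
{
	struct model_snapshot snap = { ino->version, ino->atime };
	return snap;
}

/* Cached data and metadata are still valid if the version is unchanged. */
static bool model_cache_valid(const struct model_inode *ino,
			      const struct model_snapshot *snap)
{
	assert(ino->version >= snap->version);	/* counter never decreases */
	return ino->version == snap->version;
}

/* An observer that also cares about atime must compare it separately. */
static bool model_unchanged_with_atime(const struct model_inode *ino,
				       const struct model_snapshot *snap)
{
	return model_cache_valid(ino, snap) && ino->atime == snap->atime;
}
```

In this model, a snapshot taken before a read stays valid according to model_cache_valid() but not according to model_unchanged_with_atime(), while a write that lands in the same ctime tick invalidates both.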