The purpose of this patchset is two-fold: 1/ correct performance issues in the existing i_version counter implementations and (hopefully) to bring them into behavioral alignment. 2/ expose the i_version field to userland via statx. This is useful for testing the various i_version implementations, but may also be useful for userland applications that want a way to tell whether a file might have changed. The i_version field in Linux has been around since 1994, but its meaning and behavior has subtly changed over time [1]. The first patch in this series fleshes out the comments in iversion.h to try and give a clear explanation of what's expected from the filesystem. My first ask is for feedback on this -- does the proposed definition seem reasonable for presenting to userland? There are two main consumers of i_version in the kernel: nfsd and IMA. They both only want to see a change to the i_version iff there was an explicit change to the inode. They can cope with an implementation that does more increments than that, but that measurably harms performance. I'll argue that atime-only updates SHOULD be excluded from i_version bumps since they don't represent a "real" change to the inode. Spurious updates to the i_version have real, measurable performance impacts with NFSv4, and possibly with IMA. There are 3 kernel-managed i_version implementations in the kernel: btrfs, ext4 and xfs. btrfs seems to work as we'd expect. It doesn't bump the i_version on atime-only updates and seems to bump it appropriately for other activity. ext4 currently bumps the i_version even when only the atime is being updated. I have a patch to fix this that Jan and Christian have Reviewed, but I haven't yet heard from Ted or Andreas. xfs has the same issue as ext4 bumping i_version on atime updates. He has NACK'ed the patch I proposed since there are evidently tools that depend on every log transaction being represented in i_version. I've included the xfs patch in this series, but if it's not suitable I'm open to fixing this other ways, but I'll need some feedback as to what the xfs developers would like to do here. Should we add a new on-disk field to the inode? Try to do something clever with "unused" parts of the ctime? What would be best? Lastly, there are patches to allow NFS and Ceph to report this value as well. They should be fairly straightforward once the earlier pile is resolved. Note that I dropped the patch to make AFS report STATX_INO_VERSION since its semantics don't match the proposed definition. [1]: Now, for your entertainment... A BRIEF HISTORY OF THE I_VERSION FIELD ====================================== PRE-GIT-HISTORY ERA: -------------------- The i_version field first appears in v1.1.30 (summer 1994) with more increments added to ext2 over next few v1.1.x versions. There were ioctls to set and fetch the value in ext2. They're still there but they access the i_generation counter now. It was mostly used to catch races in lookup and readdir due to directory changes, and a lot of filesystems still use it that way today. Non-directory inodes would have this value set, but the kernel didn't do much with it. GIT HISTORY ERA: ---------------- Then in 2.6.24, Jean Noel Cordenner from Bull increased the size to 64 bits, added the MS_I_VERSION flag, and started incrementing it in file_update_time: commit 7a224228ed79d587ece2304869000aad1b8e97dd Author: Jean Noel Cordenner <jean-noel.cordenner@xxxxxxxx> Date: Mon Jan 28 23:58:27 2008 -0500 vfs: Add 64 bit i_version support The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: Mingming Cao <cmm@xxxxxxxxxx> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@xxxxxxxx> Signed-off-by: Kalpak Shah <kalpak@xxxxxxxxxxxxx> Then he added support to ext4, plus the mount option to enable it. The problem with the i_version being incremented during atime updates probably dates back to this patch. I imagine it was probably just an oversight, though it could just have been due to unclear definition for the change attr in the NFSv4.0 spec: commit 25ec56b518257a56d2ff41a941d288e4b5ff9488 Author: Jean Noel Cordenner <jean-noel.cordenner@xxxxxxxx> Date: Mon Jan 28 23:58:27 2008 -0500 ext4: Add inode version support in ext4 This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@xxxxxxxxxx> Signed-off-by: Andreas Dilger <adilger@xxxxxxxxxxxxx> Signed-off-by: Kalpak Shah <kalpak@xxxxxxxxxxxxx> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@xxxxxxxx> Bruce then added support for it to nfsd: commit c654b8a9cba6002aad1c01919e4928a79a4a6dcf Author: J. Bruce Fields <bfields@xxxxxxxxxxxxxx> Date: Thu Apr 16 17:33:25 2009 -0400 nfsd: support ext4 i_version ext4 supports a real NFSv4 change attribute, which is bumped whenever the ctime would be updated, including times when two updates arrive within a jiffy of each other. (Note that although ext4 has space for nanosecond-precision ctime, the real resolution is lower: it actually uses jiffies as the time-source.) This ensures clients will invalidate their caches when they need to. There is some fear that keeping the i_version up-to-date could have performance drawbacks, so for now it's turned on only by a mount option. We hope to do something better eventually. Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx> Cc: Theodore Tso <tytso@xxxxxxx> Josef converted btrfs to use it instead of their own internal counter. It looks like the btrfs implementation has probably avoided the issue with atime updates causing i_version bumps. commit 0c4d2d95d06e920e0c61707e62c7fffc9c57f63a Author: Josef Bacik <josef@xxxxxxxxxx> Date: Thu Apr 5 15:03:02 2012 -0400 Btrfs: use i_version instead of our own sequence We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@xxxxxxxxxx> Then, in 2013 Dave added support for xfs with v3 superblocks. There were some later changes of how it was stored, but its behavior has largely been the same on xfs since then. Note that at the time, the stated reason for adding this was to provide NFSv4 semantics: commit dc037ad7d24f3711e431a45c053b5d425995e9e4 Author: Dave Chinner <dchinner@xxxxxxxxxx> Date: Thu Jun 27 16:04:59 2013 +1000 xfs: implement inode change count For CRC enabled filesystems, add support for the monotonic inode version change counter that is needed by protocols like NFSv4 for determining if the inode has changed in any way at all between two unrelated operations on the inode. This bumps the change count the first time an inode is dirtied in a transaction. Since all modifications to the inode are logged, this will catch all changes that are made to the inode, including timestamp updates that occur during data writes. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> Reviewed-by: Mark Tinguely <tinguely@xxxxxxx> Reviewed-by: Chandra Seetharaman <sekharan@xxxxxxxxxx> Signed-off-by: Ben Myers <bpm@xxxxxxx> Jeff Layton (7): iversion: update comments with info about atime updates ext4: fix i_version handling in ext4 ext4: unconditionally enable the i_version counter xfs: don't bump the i_version on an atime update in xfs_vn_update_time vfs: report an inode version in statx for IS_I_VERSION inodes nfs: report the inode version in statx if requested ceph: fill in the change attribute in statx requests fs/ceph/inode.c | 14 +++++++++----- fs/ext4/inode.c | 10 +++++----- fs/ext4/ioctl.c | 4 ++++ fs/ext4/move_extent.c | 6 ++++++ fs/ext4/super.c | 13 ++++--------- fs/ext4/xattr.c | 1 + fs/nfs/inode.c | 7 +++++-- fs/stat.c | 7 +++++++ fs/xfs/libxfs/xfs_log_format.h | 2 +- fs/xfs/libxfs/xfs_trans_inode.c | 2 +- fs/xfs/xfs_iops.c | 11 +++++++++-- include/linux/iversion.h | 23 +++++++++++++++++++++-- include/linux/stat.h | 1 + include/uapi/linux/stat.h | 3 ++- samples/vfs/test-statx.c | 8 ++++++-- 15 files changed, 82 insertions(+), 30 deletions(-) -- 2.37.2