On Wed, 2022-08-17 at 09:37 +1000, Dave Chinner wrote: > On Tue, Aug 16, 2022 at 01:14:55PM -0400, David Wysochanski wrote: > > On Tue, Aug 16, 2022 at 9:19 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > The i_version in xfs_trans_log_inode is bumped for any inode update, > > > including atime-only updates due to reads. We don't want to record those > > > in the i_version, as they don't represent "real" changes. Remove that > > > callsite. > > > > > > In xfs_vn_update_time, if S_VERSION is flagged, then attempt to bump the > > > i_version and turn on XFS_ILOG_CORE if it happens. In > > > xfs_trans_ichgtime, update the i_version if the mtime or ctime are being > > > updated. > > > > > > Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > Cc: Dave Chinner <david@xxxxxxxxxxxxx> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_trans_inode.c | 17 +++-------------- > > > fs/xfs/xfs_iops.c | 4 ++++ > > > 2 files changed, 7 insertions(+), 14 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c > > > index 8b5547073379..78bf7f491462 100644 > > > --- a/fs/xfs/libxfs/xfs_trans_inode.c > > > +++ b/fs/xfs/libxfs/xfs_trans_inode.c > > > @@ -71,6 +71,8 @@ xfs_trans_ichgtime( > > > inode->i_ctime = tv; > > > if (flags & XFS_ICHGTIME_CREATE) > > > ip->i_crtime = tv; > > > + if (flags & (XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG)) > > > + inode_inc_iversion(inode); > > > } > > > > > > /* > > > @@ -116,20 +118,7 @@ xfs_trans_log_inode( > > > spin_unlock(&inode->i_lock); > > > } > > > > > > - /* > > > - * First time we log the inode in a transaction, bump the inode change > > > - * counter if it is configured for this to occur. While we have the > > > - * inode locked exclusively for metadata modification, we can usually > > > - * avoid setting XFS_ILOG_CORE if no one has queried the value since > > > - * the last time it was incremented. If we have XFS_ILOG_CORE already > > > - * set however, then go ahead and bump the i_version counter > > > - * unconditionally. > > > - */ > > > - if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) { > > > - if (IS_I_VERSION(inode) && > > > - inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE)) > > > - iversion_flags = XFS_ILOG_CORE; > > > - } > > > + set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags); > > > > > > /* > > > * If we're updating the inode core or the timestamps and it's possible > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > > > index 45518b8c613c..162e044c7f56 100644 > > > --- a/fs/xfs/xfs_iops.c > > > +++ b/fs/xfs/xfs_iops.c > > > @@ -718,6 +718,7 @@ xfs_setattr_nonsize( > > > } > > > > > > setattr_copy(mnt_userns, inode, iattr); > > > + inode_inc_iversion(inode); > > > xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > > > > > > XFS_STATS_INC(mp, xs_ig_attrchg); > > > @@ -943,6 +944,7 @@ xfs_setattr_size( > > > > > > ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID))); > > > setattr_copy(mnt_userns, inode, iattr); > > > + inode_inc_iversion(inode); > > > xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > > > > > > XFS_STATS_INC(mp, xs_ig_attrchg); > > > @@ -1047,6 +1049,8 @@ xfs_vn_update_time( > > > inode->i_mtime = *now; > > > if (flags & S_ATIME) > > > inode->i_atime = *now; > > > + if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false)) > > > + log_flags |= XFS_ILOG_CORE; > > > > > > xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); > > > xfs_trans_log_inode(tp, ip, log_flags); > > > -- > > > 2.37.2 > > > > > > > I have a test (details below) that shows an open issue with NFSv4.x + > > fscache where an xfs exported filesystem would trigger unnecessary > > over the wire READs after a umount/mount cycle of the NFS mount. I > > previously tracked this down to atime updates, but never followed > > through on any patch. Now that Jeff worked it out and this patch is > > under review, I built 5.19 vanilla, retested, then built 5.19 + this > > patch and verified the problem is fixed. > > And so the question that needs to be answered is "why isn't relatime > working for this workload to avoid unnecessary atime updates"? > > Which then makes me ask "what's changing atime on the server between > client side reads"? > > Which then makes me wonder "what's actually changing iversion on the > server?" because I don't think atime is the issue here. > > I suspect that Jeff's patch is affecting this test case by removing > iversion updates when the data is written back on the server. i.e. > delayed allocation and unwritten extent conversion will no longer > bump iversion when they log the inode metadata changes associated > with extent allocation to store the data being written. There may > be other places where Jeff's patch removes implicit iversion > updates, too, so it may not be writeback that is the issue here. > > How that impacts on the observed behaviour is dependent on things I > don't know, like what cachefiles is doing in the background, > especially across NFS client unmount/mount cycles. However, this all > makes me think the "atime is updated" behaviour is an observed > symptom of something else changing iversion and/or cmtime between > reads from the server... > You may be right here. What I see with both noatime and relatime is that the first read after a write to a file results in the i_version being incremented, but then it doesn't change on subsequent reads. Write to the file again, and the i_version will get incremented again on the next read and then not again until there is a write. -- Jeff Layton <jlayton@xxxxxxxxxx>