Re: [PATCH] xfs: fix i_version handling in xfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 16, 2022 at 01:14:55PM -0400, David Wysochanski wrote:
> On Tue, Aug 16, 2022 at 9:19 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > The i_version in xfs_trans_log_inode is bumped for any inode update,
> > including atime-only updates due to reads. We don't want to record those
> > in the i_version, as they don't represent "real" changes. Remove that
> > callsite.
> >
> > In xfs_vn_update_time, if S_VERSION is flagged, then attempt to bump the
> > i_version and turn on XFS_ILOG_CORE if it happens. In
> > xfs_trans_ichgtime, update the i_version if the mtime or ctime are being
> > updated.
> >
> > Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_trans_inode.c | 17 +++--------------
> >  fs/xfs/xfs_iops.c               |  4 ++++
> >  2 files changed, 7 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > index 8b5547073379..78bf7f491462 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -71,6 +71,8 @@ xfs_trans_ichgtime(
> >                 inode->i_ctime = tv;
> >         if (flags & XFS_ICHGTIME_CREATE)
> >                 ip->i_crtime = tv;
> > +       if (flags & (XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG))
> > +               inode_inc_iversion(inode);
> >  }
> >
> >  /*
> > @@ -116,20 +118,7 @@ xfs_trans_log_inode(
> >                 spin_unlock(&inode->i_lock);
> >         }
> >
> > -       /*
> > -        * First time we log the inode in a transaction, bump the inode change
> > -        * counter if it is configured for this to occur. While we have the
> > -        * inode locked exclusively for metadata modification, we can usually
> > -        * avoid setting XFS_ILOG_CORE if no one has queried the value since
> > -        * the last time it was incremented. If we have XFS_ILOG_CORE already
> > -        * set however, then go ahead and bump the i_version counter
> > -        * unconditionally.
> > -        */
> > -       if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> > -               if (IS_I_VERSION(inode) &&
> > -                   inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> > -                       iversion_flags = XFS_ILOG_CORE;
> > -       }
> > +       set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags);
> >
> >         /*
> >          * If we're updating the inode core or the timestamps and it's possible
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 45518b8c613c..162e044c7f56 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -718,6 +718,7 @@ xfs_setattr_nonsize(
> >         }
> >
> >         setattr_copy(mnt_userns, inode, iattr);
> > +       inode_inc_iversion(inode);
> >         xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >
> >         XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -943,6 +944,7 @@ xfs_setattr_size(
> >
> >         ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
> >         setattr_copy(mnt_userns, inode, iattr);
> > +       inode_inc_iversion(inode);
> >         xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >
> >         XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -1047,6 +1049,8 @@ xfs_vn_update_time(
> >                 inode->i_mtime = *now;
> >         if (flags & S_ATIME)
> >                 inode->i_atime = *now;
> > +       if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> > +               log_flags |= XFS_ILOG_CORE;
> >
> >         xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >         xfs_trans_log_inode(tp, ip, log_flags);
> > --
> > 2.37.2
> >
> 
> I have a test (details below) that shows an open issue with NFSv4.x +
> fscache where an xfs exported filesystem would trigger unnecessary
> over the wire READs after a umount/mount cycle of the NFS mount.  I
> previously tracked this down to atime updates, but never followed
> through on any patch.  Now that Jeff worked it out and this patch is
> under review, I built 5.19 vanilla, retested, then built 5.19 + this
> patch and verified the problem is fixed.

And so the question that needs to be answered is "why isn't relatime
working for this workload to avoid unnecessary atime updates"?

Which then makes me ask "what's changing atime on the server between
client side reads"?

Which then makes me wonder "what's actually changing iversion on the
server?" because I don't think atime is the issue here.

I suspect that Jeff's patch is affecting this test case by removing
iversion updates when the data is written back on the server. i.e.
delayed allocation and unwritten extent conversion will no longer
bump iversion when they log the inode metadata changes associated
with extent allocation to store the data being written.  There may
be other places where Jeff's patch removes implicit iversion
updates, too, so it may not be writeback that is the issue here.

How that impacts on the observed behaviour is dependent on things I
don't know, like what cachefiles is doing in the background,
especially across NFS client unmount/mount cycles. However, this all
makes me think the "atime is updated" behaviour is an observed
symptom of something else changing iversion and/or cmtime between
reads from the server...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux