Re: [PATCH] xfs: fix i_version handling in xfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2022-08-17 at 09:37 +1000, Dave Chinner wrote:
> On Tue, Aug 16, 2022 at 01:14:55PM -0400, David Wysochanski wrote:
> > On Tue, Aug 16, 2022 at 9:19 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > 
> > > The i_version in xfs_trans_log_inode is bumped for any inode update,
> > > including atime-only updates due to reads. We don't want to record those
> > > in the i_version, as they don't represent "real" changes. Remove that
> > > callsite.
> > > 
> > > In xfs_vn_update_time, if S_VERSION is flagged, then attempt to bump the
> > > i_version and turn on XFS_ILOG_CORE if it happens. In
> > > xfs_trans_ichgtime, update the i_version if the mtime or ctime are being
> > > updated.
> > > 
> > > Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/libxfs/xfs_trans_inode.c | 17 +++--------------
> > >  fs/xfs/xfs_iops.c               |  4 ++++
> > >  2 files changed, 7 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > > index 8b5547073379..78bf7f491462 100644
> > > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > > @@ -71,6 +71,8 @@ xfs_trans_ichgtime(
> > >                 inode->i_ctime = tv;
> > >         if (flags & XFS_ICHGTIME_CREATE)
> > >                 ip->i_crtime = tv;
> > > +       if (flags & (XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG))
> > > +               inode_inc_iversion(inode);
> > >  }
> > > 
> > >  /*
> > > @@ -116,20 +118,7 @@ xfs_trans_log_inode(
> > >                 spin_unlock(&inode->i_lock);
> > >         }
> > > 
> > > -       /*
> > > -        * First time we log the inode in a transaction, bump the inode change
> > > -        * counter if it is configured for this to occur. While we have the
> > > -        * inode locked exclusively for metadata modification, we can usually
> > > -        * avoid setting XFS_ILOG_CORE if no one has queried the value since
> > > -        * the last time it was incremented. If we have XFS_ILOG_CORE already
> > > -        * set however, then go ahead and bump the i_version counter
> > > -        * unconditionally.
> > > -        */
> > > -       if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> > > -               if (IS_I_VERSION(inode) &&
> > > -                   inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> > > -                       iversion_flags = XFS_ILOG_CORE;
> > > -       }
> > > +       set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags);
> > > 
> > >         /*
> > >          * If we're updating the inode core or the timestamps and it's possible
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index 45518b8c613c..162e044c7f56 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -718,6 +718,7 @@ xfs_setattr_nonsize(
> > >         }
> > > 
> > >         setattr_copy(mnt_userns, inode, iattr);
> > > +       inode_inc_iversion(inode);
> > >         xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > 
> > >         XFS_STATS_INC(mp, xs_ig_attrchg);
> > > @@ -943,6 +944,7 @@ xfs_setattr_size(
> > > 
> > >         ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
> > >         setattr_copy(mnt_userns, inode, iattr);
> > > +       inode_inc_iversion(inode);
> > >         xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > 
> > >         XFS_STATS_INC(mp, xs_ig_attrchg);
> > > @@ -1047,6 +1049,8 @@ xfs_vn_update_time(
> > >                 inode->i_mtime = *now;
> > >         if (flags & S_ATIME)
> > >                 inode->i_atime = *now;
> > > +       if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> > > +               log_flags |= XFS_ILOG_CORE;
> > > 
> > >         xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > >         xfs_trans_log_inode(tp, ip, log_flags);
> > > --
> > > 2.37.2
> > > 
> > 
> > I have a test (details below) that shows an open issue with NFSv4.x +
> > fscache where an xfs exported filesystem would trigger unnecessary
> > over the wire READs after a umount/mount cycle of the NFS mount.  I
> > previously tracked this down to atime updates, but never followed
> > through on any patch.  Now that Jeff worked it out and this patch is
> > under review, I built 5.19 vanilla, retested, then built 5.19 + this
> > patch and verified the problem is fixed.
> 
> And so the question that needs to be answered is "why isn't relatime
> working for this workload to avoid unnecessary atime updates"?
> 
> Which then makes me ask "what's changing atime on the server between
> client side reads"?
> 
> Which then makes me wonder "what's actually changing iversion on the
> server?" because I don't think atime is the issue here.
> 
> I suspect that Jeff's patch is affecting this test case by removing
> iversion updates when the data is written back on the server. i.e.
> delayed allocation and unwritten extent conversion will no longer
> bump iversion when they log the inode metadata changes associated
> with extent allocation to store the data being written.  There may
> be other places where Jeff's patch removes implicit iversion
> updates, too, so it may not be writeback that is the issue here.
> 
> How that impacts on the observed behaviour is dependent on things I
> don't know, like what cachefiles is doing in the background,
> especially across NFS client unmount/mount cycles. However, this all
> makes me think the "atime is updated" behaviour is an observed
> symptom of something else changing iversion and/or cmtime between
> reads from the server...
> 

You may be right here.

What I see with both noatime and relatime is that the first read after a
write to a file results in the i_version being incremented, but then it
doesn't change on subsequent reads. Write to the file again, and the
i_version will get incremented again on the next read and then not again
until there is a write.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux