Re: [PATCH] xfs: fix i_version handling in xfs

On Tue, 2022-08-16 at 08:43 -0700, Darrick J. Wong wrote:
> On Tue, Aug 16, 2022 at 09:17:36AM -0400, Jeff Layton wrote:
> > The i_version in xfs_trans_log_inode is bumped for any inode update,
> > including atime-only updates due to reads. We don't want to record those
> > in the i_version, as they don't represent "real" changes. Remove that
> > callsite.
> > 
> > In xfs_vn_update_time, if S_VERSION is flagged, then attempt to bump the
> > i_version and turn on XFS_ILOG_CORE if it happens. In
> > xfs_trans_ichgtime, update the i_version if the mtime or ctime are being
> > updated.
> 
> What about operations that don't touch the mtime but change the file
> metadata anyway?  There are a few of those, like the blockgc garbage
> collector, deduperange, and the defrag tool.
> 

Do those change the c/mtime at all?

It's possible we're missing some places that should change the i_version
as well. We may need some more call sites.

> Zooming out a bit -- what does i_version signal, concretely?  I thought
> it was used by nfs (and maybe ceph?) to signal to clients that the file
> on the server has moved on, and the client needs to invalidate its
> caches.  I thought afs had a similar generation counter, though it's
> only used to cache file data, not metadata?  Does an i_version change
> cause all of them to invalidate caches, or is there more behavior I
> don't know about?
> 

For NFS, a change to the change attribute indicates that there has been a
change to the data or metadata for the file. Changes to the atime due to
reads are specifically exempted from this, but we do bump the i_version
if someone changes the atime explicitly, e.g. via utimes().

The NFS client will generally invalidate its caches for the inode when
it notices a change attribute change.
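To make the client side concrete, here is a minimal userspace sketch of change-attribute revalidation. The struct and function names are hypothetical and are not the Linux NFS client's real data structures. The only assumption is that any difference in the change attribute is treated as "something changed" and drops both the cached data and the cached attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side view of one cached file (illustration only). */
struct cached_file {
	uint64_t change_attr;   /* change attribute seen at last revalidation */
	bool     data_valid;    /* cached pages may be used */
	bool     attrs_valid;   /* cached attributes may be used */
};

/*
 * Compare a freshly fetched change attribute with the cached one.
 * The client only checks for inequality; it does not interpret the value.
 * Returns true if the caches were invalidated.
 */
bool revalidate(struct cached_file *f, uint64_t server_change_attr)
{
	assert(f != NULL);
	if (server_change_attr == f->change_attr)
		return false;           /* unchanged: keep using the caches */

	f->data_valid = false;          /* data or metadata changed on the server */
	f->attrs_valid = false;
	f->change_attr = server_change_attr;
	return true;
}
```

This is why spurious bumps (such as atime-only updates from reads) are costly: every one of them forces the client to throw away caches that were still valid.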

FWIW, AFS may not meet this standard since it doesn't generally
increment the counter on metadata changes. It may turn out that we don't
want to expose this to the AFS client due to that (or maybe come up with
some way to indicate this difference).

> Does that mean that we should bump i_version for any file data or
> attribute that could be queried or observed by userspace?  In which case
> I suppose this change is still correct, even if it relaxes i_version
> updates from "any change to the inode whatsoever" to "any change that
> would bump mtime".  Unless FIEMAP is part of "attributes observed by
> userspace".
> 
> (The other downside I can see is that now we have to remember to bump
> timestamps for every new file operation we add, unlike the current code
> which is centrally located in xfs_trans_log_inode.)
> 

The main reason for the change attribute in NFS was that NFSv3 is
plagued with cache-coherency problems due to coarse timestamp
granularity. It was conceived as a way to indicate that the inode had
changed without relying on timestamps.
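A small illustration of the coarse-granularity problem, assuming (purely for illustration) a server that stores mtime with one-second resolution. Two modifications within the same second leave mtime unchanged, so a client comparing timestamps misses the second one, while a counter bumped on every modification does not. The names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical server-side state: a coarse timestamp plus a change counter. */
struct file_state {
	int64_t  mtime_sec;     /* mtime truncated to whole seconds (assumed) */
	uint64_t change;        /* counter bumped on every modification */
};

/*
 * Record a modification at time now_ns (nanoseconds since the epoch).
 * The timestamp loses everything below one second; the counter never
 * loses an update.
 */
void modify(struct file_state *f, int64_t now_ns)
{
	assert(f != NULL && now_ns >= 0);
	f->mtime_sec = now_ns / 1000000000;
	f->change++;
}
```

For example, writes at 5.0s and 5.4s both record `mtime_sec == 5`, but `change` goes from 1 to 2, so only the counter reveals the second write.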

In practice, we want to bump the i_version counter whenever the ctime or
mtime would be changed.
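The rule above, as implemented by the xfs_trans_ichgtime() hunk in the quoted patch, can be modelled in userspace as follows. The flag names are hypothetical stand-ins for XFS_ICHGTIME_MOD, XFS_ICHGTIME_CHG and an atime update, and the model ignores the "only bump if queried" optimization. Note that utimes() also updates the ctime, which is why an explicit atime change still bumps the counter while a read-driven atime update does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are illustrative only. */
enum {
	TS_MTIME = 1u << 0,     /* role of XFS_ICHGTIME_MOD */
	TS_CTIME = 1u << 1,     /* role of XFS_ICHGTIME_CHG */
	TS_ATIME = 1u << 2,     /* atime update */
};

struct demo_inode {
	uint64_t i_version;
};

/*
 * Bump the change counter only when mtime or ctime is being updated.
 * An atime-only update (e.g. from a read) leaves it alone.
 * Returns true if the counter was bumped.
 */
bool update_times(struct demo_inode *ip, unsigned int flags)
{
	assert(ip != NULL);
	if (flags & (TS_MTIME | TS_CTIME)) {
		ip->i_version++;
		return true;
	}
	return false;
}
```

Usage: a read calls `update_times(ip, TS_ATIME)` and leaves i_version alone; utimes() corresponds to `TS_ATIME | TS_CTIME` and bumps it; a write corresponds to `TS_MTIME | TS_CTIME`.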

> --D
> 
> > Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_trans_inode.c | 17 +++--------------
> >  fs/xfs/xfs_iops.c               |  4 ++++
> >  2 files changed, 7 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > index 8b5547073379..78bf7f491462 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -71,6 +71,8 @@ xfs_trans_ichgtime(
> >  		inode->i_ctime = tv;
> >  	if (flags & XFS_ICHGTIME_CREATE)
> >  		ip->i_crtime = tv;
> > +	if (flags & (XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG))
> > +		inode_inc_iversion(inode);
> >  }
> >  
> >  /*
> > @@ -116,20 +118,7 @@ xfs_trans_log_inode(
> >  		spin_unlock(&inode->i_lock);
> >  	}
> >  
> > -	/*
> > -	 * First time we log the inode in a transaction, bump the inode change
> > -	 * counter if it is configured for this to occur. While we have the
> > -	 * inode locked exclusively for metadata modification, we can usually
> > -	 * avoid setting XFS_ILOG_CORE if no one has queried the value since
> > -	 * the last time it was incremented. If we have XFS_ILOG_CORE already
> > -	 * set however, then go ahead and bump the i_version counter
> > -	 * unconditionally.
> > -	 */
> > -	if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> > -		if (IS_I_VERSION(inode) &&
> > -		    inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> > -			iversion_flags = XFS_ILOG_CORE;
> > -	}
> > +	set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags);
> >  
> >  	/*
> >  	 * If we're updating the inode core or the timestamps and it's possible
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 45518b8c613c..162e044c7f56 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -718,6 +718,7 @@ xfs_setattr_nonsize(
> >  	}
> >  
> >  	setattr_copy(mnt_userns, inode, iattr);
> > +	inode_inc_iversion(inode);
> >  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >  
> >  	XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -943,6 +944,7 @@ xfs_setattr_size(
> >  
> >  	ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
> >  	setattr_copy(mnt_userns, inode, iattr);
> > +	inode_inc_iversion(inode);
> >  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >  
> >  	XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -1047,6 +1049,8 @@ xfs_vn_update_time(
> >  		inode->i_mtime = *now;
> >  	if (flags & S_ATIME)
> >  		inode->i_atime = *now;
> > +	if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> > +		log_flags |= XFS_ILOG_CORE;
> >  
> >  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  	xfs_trans_log_inode(tp, ip, log_flags);
> > -- 
> > 2.37.2
> > 
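For reference, the `inode_maybe_inc_iversion(inode, false)` call in xfs_vn_update_time() relies on the "queried" optimization described in the removed xfs_trans_log_inode() comment: the counter only needs to move if someone has looked at it since the last increment. Below is a simplified, single-threaded userspace model of that scheme as found in include/linux/iversion.h, where bit 0 of the raw value is a "queried" flag and the counter lives in the upper bits. The real code updates the value atomically with a compare-and-exchange loop; this model does not, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUERIED_BIT  ((uint64_t)1)        /* bit 0: value seen since last bump */
#define INCREMENT    ((uint64_t)1 << 1)   /* counter starts at bit 1 */

struct iv_inode {
	uint64_t raw;   /* (counter << 1) | queried flag */
};

/*
 * Increment only if the value was queried since the last bump, or if the
 * caller forces it (inode_inc_iversion() is the forced case). Returns true
 * if the value changed, i.e. when the caller must log the inode core.
 */
bool maybe_inc(struct iv_inode *ip, bool force)
{
	assert(ip != NULL);
	if (!force && !(ip->raw & QUERIED_BIT))
		return false;
	ip->raw = (ip->raw & ~QUERIED_BIT) + INCREMENT;
	return true;
}

/* Read the counter (as nfsd would) and mark it queried so the next
 * change is guaranteed to produce a new value. */
uint64_t query(struct iv_inode *ip)
{
	assert(ip != NULL);
	ip->raw |= QUERIED_BIT;
	return ip->raw >> 1;
}
```

The payoff is that repeated timestamp updates with no intervening query cost nothing extra: only the first update after a query has to set XFS_ILOG_CORE.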

-- 
Jeff Layton <jlayton@xxxxxxxxxx>