Re: [PATCH 03/30] xfs: add an inode item lock

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 4 Jun 2020 11:54:56 +1000

On Tue, Jun 02, 2020 at 12:34:44PM -0400, Brian Foster wrote:
> On Tue, Jun 02, 2020 at 07:42:24AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> ...
> > @@ -122,23 +117,30 @@ xfs_trans_log_inode(
> >  	 * set however, then go ahead and bump the i_version counter
> >  	 * unconditionally.
> >  	 */
> > -	if (!test_and_set_bit(XFS_LI_DIRTY, &ip->i_itemp->ili_item.li_flags) &&
> > -	    IS_I_VERSION(VFS_I(ip))) {
> > -		if (inode_maybe_inc_iversion(VFS_I(ip), flags & XFS_ILOG_CORE))
> > -			flags |= XFS_ILOG_CORE;
> > +	if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> > +		if (IS_I_VERSION(inode) &&
> > +		    inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> > +			iversion_flags = XFS_ILOG_CORE;
> >  	}
> >  
> > -	tp->t_flags |= XFS_TRANS_DIRTY;
> > +	/*
> > +	 * Record the specific change for fdatasync optimisation. This allows
> > +	 * fdatasync to skip log forces for inodes that are only timestamp
> > +	 * dirty. We do this before the change count so that the core being
> > +	 * logged in this case does not impact on fdatasync behaviour.
> > +	 */
> 
> We no longer do this before the change count logic so that part of the
> comment is bogus.

Ugh. Another 6 patch conflicts to resolve coming right up....

> > +	spin_lock(&iip->ili_lock);
> > +	iip->ili_fsync_fields |= flags;
> >  
> >  	/*
> > -	 * Always OR in the bits from the ili_last_fields field.
> > -	 * This is to coordinate with the xfs_iflush() and xfs_iflush_done()
> > -	 * routines in the eventual clearing of the ili_fields bits.
> > -	 * See the big comment in xfs_iflush() for an explanation of
> > -	 * this coordination mechanism.
> > +	 * Always OR in the bits from the ili_last_fields field.  This is to
> > +	 * coordinate with the xfs_iflush() and xfs_iflush_done() routines in
> > +	 * the eventual clearing of the ili_fields bits.  See the big comment in
> > +	 * xfs_iflush() for an explanation of this coordination mechanism.
> >  	 */
> > -	flags |= ip->i_itemp->ili_last_fields;
> > -	ip->i_itemp->ili_fields |= flags;
> > +	iip->ili_fields |= (flags | iip->ili_last_fields |
> > +			    iversion_flags);
> > +	spin_unlock(&iip->ili_lock);
> >  }
> >  
> >  int
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 403c90309a8ff..0abf770b77498 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -94,6 +94,7 @@ xfs_file_fsync(
> >  {
> >  	struct inode		*inode = file->f_mapping->host;
> >  	struct xfs_inode	*ip = XFS_I(inode);
> > +	struct xfs_inode_log_item *iip = ip->i_itemp;
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	int			error = 0;
> >  	int			log_flushed = 0;
> > @@ -137,13 +138,15 @@ xfs_file_fsync(
> >  	xfs_ilock(ip, XFS_ILOCK_SHARED);
> >  	if (xfs_ipincount(ip)) {
> >  		if (!datasync ||
> > -		    (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> > -			lsn = ip->i_itemp->ili_last_lsn;
> > +		    (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> > +			lsn = iip->ili_last_lsn;
> 
> I am still a little confused why the lock is elided in other read cases,
> such as this one or perhaps the similar check in xfs_bmbt_to_iomap()..?

They are still all serialised against those field changing the same
way they currently are. i.e. they are all under the ILOCK, so
changes that occur during IO submission will never occur.  Hence the
only thing that we can race with is IO completion clearing the
fields, in which case the subsequent operations if the item is now
clean turn into no-ops.

i.e:
- ILOCK serialises transaction logging vs IO submission.
- iflock serialises IO submission vs IO completion.
- Nothing serialises transaction logging vs IO completion.

The latter is what the ili_lock is intended for; everything else is
still protected by the existing serialisation mechanisms that they
are now. Any races in areas outside xfs_trans_log_inode() vs
xfs_iflush_done/abort() is largely outside the scope of this patch
and this lock...

> Similarly, it looks like we set the ili_[flush|last]_lsn fields outside
> of this lock (though last_lsn looks like it's also covered by ilock),
> yet the update to the inode_log_item struct implies they should be
> protected. What's the intent there?

The lsn fields are updated via xfs_trans_ail_lsn_copy(), which on 32
bit systems takes the AIL lock, and I don't think it's a good idea
to put the AIL lock inside the inode item lock.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx