Re: [PATCH 29/30] xfs: factor xfs_iflush_done

On Thu, Jun 11, 2020 at 10:16:22AM +1000, Dave Chinner wrote:
> On Wed, Jun 10, 2020 at 09:08:33AM -0400, Brian Foster wrote:
> > On Wed, Jun 10, 2020 at 08:14:31AM +1000, Dave Chinner wrote:
> > > On Tue, Jun 09, 2020 at 09:12:49AM -0400, Brian Foster wrote:
> > > > On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
...
> > 
> > I'm referring to the fact that we no longer check the lsn of each
> > (flushed) log item attached to the buffer under the ail lock.
> 
> That whole loop in xfs_iflush_ail_updates() runs under the AIL
> lock, so it does the right thing for anything that is moved to the
> "ail_updates" list.
> 
> If we win the unlocked race (li_lsn does not change) then we move
> the inode to the ail update list and it gets rechecked under the AIL
> lock and does the right thing. If we lose the race (li_lsn changes)
> then the inode has been redirtied and we *don't need to check it
> under the AIL* - all we need to do is leave it attached to the
> buffer.
> 
> This is the same as the old code: win the race, need_ail is
> incremented and we recheck under the AIL lock. Lose the race and
> we don't recheck under the AIL because we don't need to. This
> happened less under the old code, because it typically only happened
> with single dirty inodes on a cluster buffer (think directory inode
> under long running large directory modification operations), but
> that race most definitely existed and the code most definitely
> handled it correctly.
> 
> Keep in mind that this inode redirtying/AIL repositioning race can
> even occur /after/ we've locked and removed items from the AIL but
> before we've run xfs_iflush_finish(). i.e. we can remove it from the
> AIL but by the time xfs_iflush_finish() runs it's back in the AIL.
> 

All of the above would make a nice commit log for an independent patch.
;) Note again that I wasn't suggesting the logic was incorrect...
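
For anyone following along, the pattern being described is roughly the
following (a minimal sketch, not the literal patch code; surrounding
declarations are omitted and xfs_ail_delete_one() stands in for the
bulk AIL removal the real code does):

	/*
	 * Unlocked check: lip->li_lsn can change at any time.
	 *
	 * Win the race: li_lsn still matches the LSN we sampled at
	 * flush time, so the item may need to come off the AIL.
	 * Recheck under the AIL lock before acting on it.
	 *
	 * Lose the race: li_lsn changed, i.e. the inode was redirtied
	 * and repositioned in the AIL. Leave it attached to the
	 * buffer; no AIL processing is needed at all.
	 */
	if (iip->ili_flush_lsn == lip->li_lsn) {
		spin_lock(&ailp->ail_lock);
		if (iip->ili_flush_lsn == lip->li_lsn)	/* recheck */
			xfs_ail_delete_one(ailp, lip);
		spin_unlock(&ailp->ail_lock);
	}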

> > Note that
> > I am not saying it's necessarily wrong, but rather that IMO it's too
> > subtle a change to silently squash into a refactoring patch.
> 
> Except it isn't a change at all. The same subtle issue exists in the
> code before this patch. It's just that this refactoring makes subtle
> race conditions that were previously unknown to reviewers so much
> more obvious they can now see them clearly. That tells me the code
> is much improved by this refactoring, not that there's a problem
> that needs reworking....
> 

This patch elevates a bit of code from being effectively an AIL lock
avoidance optimization to essentially per-item filtering logic, without
any explanation beyond facilitating future modifications. Independent of
whether it's correct, this is not purely a refactoring change IMO.
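
To spell out the distinction (paraphrasing both versions rather than
quoting them; the appended diff below shows the real code for the
"before" side):

	/* Before: the unlocked check is only a lock-avoidance hint. */
	if (iip->ili_flush_lsn == lip->li_lsn ||
	    test_bit(XFS_LI_FAILED, &lip->li_flags))
		need_ail++;
	...
	if (need_ail) {
		/* take the AIL lock once, recheck each item under it */
	}

	/* After: the same unlocked check routes each item to a list. */
	if (iip->ili_flush_lsn == lip->li_lsn ||
	    test_bit(XFS_LI_FAILED, &lip->li_flags))
		list_move_tail(&lip->li_bio_list, &ail_updates);
	else
		list_move_tail(&lip->li_bio_list, &flushed_inodes);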

> > > FWIW, I untangled the function this way because the "track dirty
> > > inodes by ordered buffers" patchset completely removes the AIL stuff
> > > - the ail_updates list and the xfs_iflush_ail_updates() function go
> > > away completely and the rest of the refactoring remains unchanged.
> > > i.e.  as the commit messages says, this change makes follow-on
> > > patches much easier to understand...
> > > 
> > 
> > The general function breakdown seems fine to me. I find the multiple
> > list processing to be a bit overdone, particularly if it doesn't serve a
> > current functional purpose. If the purpose is to support a future patch
> > series, I'd suggest to continue using the existing logic of moving all
> > flushed inodes to a single list and leave the separate list bits to the
> > start of the series where it's useful so it's possible to review with
> > the associated context (or alternatively just defer the entire patch).
> 
> That's how I originally did it, and it was a mess. It didn't
> separate cleanly at all, and didn't make future patches much easier
> at all. Hence I don't think reworking the patch just to look
> different gains us anything at this point...
> 

I find that hard to believe. This patch splits the buffer list into two
lists, processes the first one, immediately combines it with the second,
and then processes the second, which at that point is no different from
the single list the original code constructed. The only reasons I can
see for this kind of churn are either to address some kind of
performance or efficiency issue or to set the lists up for further
changes. The former is not a documented reason, and there's no context
for the latter because it's apparently part of some future series.

TBH, I think this patch should probably be broken down into two or three
independent patches anyway. What's the issue with something like the
appended diff (on top of this patch) in the meantime? If the multiple
list logic is truly necessary, reintroduce it when it's used so it's
actually reviewable...

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 3894d190ea5b..83580e204560 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -718,8 +718,8 @@ xfs_iflush_done(
 	struct xfs_buf		*bp)
 {
 	struct xfs_log_item	*lip, *n;
-	LIST_HEAD(flushed_inodes);
-	LIST_HEAD(ail_updates);
+	int			need_ail = 0;
+	LIST_HEAD(tmp);
 
 	/*
 	 * Pull the attached inodes from the buffer one at a time and take the
@@ -732,25 +732,24 @@ xfs_iflush_done(
 			xfs_iflush_abort(iip->ili_inode);
 			continue;
 		}
+
 		if (!iip->ili_last_fields)
 			continue;
 
-		/* Do an unlocked check for needing the AIL lock. */
+		list_move_tail(&lip->li_bio_list, &tmp);
+
+		/* Do an unlocked check for needing AIL processing */
 		if (iip->ili_flush_lsn == lip->li_lsn ||
 		    test_bit(XFS_LI_FAILED, &lip->li_flags))
-			list_move_tail(&lip->li_bio_list, &ail_updates);
-		else
-			list_move_tail(&lip->li_bio_list, &flushed_inodes);
+			need_ail++;
 	}
 
-	if (!list_empty(&ail_updates)) {
-		xfs_iflush_ail_updates(bp->b_mount->m_ail, &ail_updates);
-		list_splice_tail(&ail_updates, &flushed_inodes);
-	}
+	if (need_ail)
+		xfs_iflush_ail_updates(bp->b_mount->m_ail, &tmp);
 
-	xfs_iflush_finish(bp, &flushed_inodes);
-	if (!list_empty(&flushed_inodes))
-		list_splice_tail(&flushed_inodes, &bp->b_li_list);
+	xfs_iflush_finish(bp, &tmp);
+	if (!list_empty(&tmp))
+		list_splice_tail(&tmp, &bp->b_li_list);
 }
 
 /*



