On Thu, Sep 17, 2020 at 04:11:48PM +1000, Dave Chinner wrote: > On Wed, Sep 16, 2020 at 08:30:00PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > There's a subtle design flaw in the deferred log item code that can lead > > to pinning the log tail. Taking up the defer ops chain examples from > > the previous commit, we can get trapped in sequences like this: > > > > Caller hands us a transaction t0 with D0-D3 attached. The defer ops > > chain will look like the following if the transaction rolls succeed: > > > > t1: D0(t0), D1(t0), D2(t0), D3(t0) > > t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) > > t3: d5(t1), D1(t0), D2(t0), D3(t0) > > ... > > t9: d9(t7), D3(t0) > > t10: D3(t0) > > t11: d10(t10), d11(t10) > > t12: d11(t10) > > > > In transaction 9, we finish d9 and try to roll to t10 while holding onto > > an intent item for D3 that we logged in t0. > > > > The previous commit changed the order in which we place new defer ops in > > the defer ops processing chain to reduce the maximum chain length. Now > > make xfs_defer_finish_noroll capable of relogging the entire chain > > periodically so that we can always move the log tail forward. We do > > this every seven loops, having observed that while most chains never > > exceed seven items in length, the rest go far over that and seem to > > be involved in most of the stall problems. > > > > Callers are now required to ensure that the transaction reservation is > > large enough to handle logging done items and new intent items for the > > maximum possible chain length. Most callers are careful to keep the > > chain lengths low, so the overhead should be minimal. > > > > (Note that in the next patch we'll make it so that we only relog on > > demand, since 7 is an arbitrary number that I used here to get the basic > > mechanics working.) > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --- > > fs/xfs/libxfs/xfs_defer.c | 30 ++++++++++++++++++++++++++++++ > > fs/xfs/xfs_bmap_item.c | 25 +++++++++++++++++++++++++ > > fs/xfs/xfs_extfree_item.c | 29 +++++++++++++++++++++++++++++ > > fs/xfs/xfs_refcount_item.c | 27 +++++++++++++++++++++++++++ > > fs/xfs/xfs_rmap_item.c | 27 +++++++++++++++++++++++++++ > > fs/xfs/xfs_trace.h | 1 + > > fs/xfs/xfs_trans.h | 10 ++++++++++ > > 7 files changed, 149 insertions(+) > > > > > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > > index 84a70edd0da1..7938e4d3af90 100644 > > --- a/fs/xfs/libxfs/xfs_defer.c > > +++ b/fs/xfs/libxfs/xfs_defer.c > > @@ -361,6 +361,28 @@ xfs_defer_cancel_list( > > } > > } > > > > +/* > > + * Prevent a log intent item from pinning the tail of the log by logging a > > + * done item to release the intent item; and then log a new intent item. > > + * The caller should provide a fresh transaction and roll it after we're done. > > + */ > > +static int > > +xfs_defer_relog( > > + struct xfs_trans **tpp, > > + struct list_head *dfops) > > +{ > > + struct xfs_defer_pending *dfp; > > + > > + ASSERT((*tpp)->t_flags & XFS_TRANS_PERM_LOG_RES); > > + > > + list_for_each_entry(dfp, dfops, dfp_list) { > > + trace_xfs_defer_relog_intent((*tpp)->t_mountp, dfp); > > + dfp->dfp_intent = xfs_trans_item_relog(dfp->dfp_intent, *tpp); > > Any reason for xfs_trans_item_relog() when it's a one liner? There aren't log intent items in userspace, so xfs_trans_item_relog becomes a NOP macro in the xfsprogs port. > > + } > > + > > + return xfs_defer_trans_roll(tpp); > > +} > > + > > /* > > * Log an intent-done item for the first pending intent, and finish the work > > * items. > > @@ -422,6 +444,7 @@ xfs_defer_finish_noroll( > > struct xfs_trans **tp) > > { > > struct xfs_defer_pending *dfp; > > + unsigned int nr_rolls = 0; > > int error = 0; > > LIST_HEAD(dop_pending); > > > > @@ -447,6 +470,13 @@ xfs_defer_finish_noroll( > > if (error) > > goto out_shutdown; > > > > + /* Every few rolls we relog all the intent items. */ > > + if (!(++nr_rolls % 7)) { > > + error = xfs_defer_relog(tp, &dop_pending); > > + if (error) > > + goto out_shutdown; > > + } > > Urk. > > I think I've got a better idea: rather than a counter, use something > meaningful as to whether the intent has been committed or not. e.g. > use something like xfs_log_item_in_current_chkpt() to determine if > we need to relog the intent. I'll take a look at that in the morning. > i.e. If the intent is active in the CIL, then we don't need to relog > it. If the intent has been committed to the journal and is no longer > in the CIL list, relog it so the next CIL push will move it forward > in the journal. > > The intent relogging functions look fine, though. Thanks for digging through some of these. :) --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx