On Thu, Jul 16, 2020 at 06:49:41AM -0400, Brian Foster wrote: > xfsaild is racy with respect to transaction abort and shutdown in > that the task can idle or exit with an empty AIL but buffers still > on the delwri queue. This was partly addressed by cancelling the > delwri queue before the task exits to prevent memory leaks, but it's > also possible for xfsaild to empty and idle with buffers on the > delwri queue. For example, a transaction that pins a buffer that > also happens to sit on the AIL delwri queue will explicitly remove > the associated log item from the AIL if the transaction aborts. The > side effect of this is an unmount hang in xfs_wait_buftarg() as the > associated buffers remain held by the delwri queue indefinitely. > This is reproduced on repeated runs of generic/531 with an fs format > (-mrmapbt=1 -bsize=1k) that happens to also reproduce transaction > aborts. > > Update xfsaild to not idle until both the AIL and associated delwri > queue are empty and update the push code to continue delwri queue > submission attempts even when the AIL is empty. This allows the AIL > to eventually release aborted buffers stranded on the delwri queue > when they are unlocked by the associated transaction. This should > have no significant effect on normal runtime behavior because the > xfsaild currently idles only when the AIL is empty and in practice > the AIL is rarely empty with a populated delwri queue. The items > must be AIL resident to land in the queue in the first place and > generally aren't removed until writeback completes. > > Note that the pre-existing delwri queue cancel logic in the exit > path is retained because task stop is external, could technically > come at any point, and xfsaild is still responsible to release its > buffer references before it exits. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > Reviewed-by: Christoph Hellwig <hch@xxxxxx> Looks fine to me... Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> --D > --- > > v2: > - Move the out_done label up a bit. > v1: https://lore.kernel.org/linux-xfs/20200715123835.8690-1-bfoster@xxxxxxxxxx/ > > fs/xfs/xfs_trans_ail.c | 16 ++++++---------- > 1 file changed, 6 insertions(+), 10 deletions(-) > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index c3be6e440134..0c783d339675 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -448,16 +448,10 @@ xfsaild_push( > target = ailp->ail_target; > ailp->ail_target_prev = target; > > + /* we're done if the AIL is empty or our push has reached the end */ > lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn); > - if (!lip) { > - /* > - * If the AIL is empty or our push has reached the end we are > - * done now. > - */ > - xfs_trans_ail_cursor_done(&cur); > - spin_unlock(&ailp->ail_lock); > + if (!lip) > goto out_done; > - } > > XFS_STATS_INC(mp, xs_push_ail); > > @@ -539,6 +533,8 @@ xfsaild_push( > break; > lsn = lip->li_lsn; > } > + > +out_done: > xfs_trans_ail_cursor_done(&cur); > spin_unlock(&ailp->ail_lock); > > @@ -546,7 +542,6 @@ xfsaild_push( > ailp->ail_log_flush++; > > if (!count || XFS_LSN_CMP(lsn, target) >= 0) { > -out_done: > /* > * We reached the target or the AIL is empty, so wait a bit > * longer for I/O to complete and remove pushed items from the > @@ -638,7 +633,8 @@ xfsaild( > */ > smp_rmb(); > if (!xfs_ail_min(ailp) && > - ailp->ail_target == ailp->ail_target_prev) { > + ailp->ail_target == ailp->ail_target_prev && > + list_empty(&ailp->ail_buf_list)) { > spin_unlock(&ailp->ail_lock); > freezable_schedule(); > tout = 0; > -- > 2.21.3 >