Re: [PATCH] xfs: avoid freeing multiple extents from same AG in pending transactions

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 18 May 2023 08:49:26 +1000

On Wed, May 17, 2023 at 07:14:32PM +0000, Wengang Wang wrote:
> > On May 16, 2023, at 6:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Tue, May 16, 2023 at 05:59:13PM -0700, Darrick J. Wong wrote:
> > I was thinking this code changes to:
> > 
> > flags |= XFS_ALLOC_FLAG_TRY_FLUSH;
> > ....
> > <attempt allocation>
> > ....
> > if (busy) {
> > xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> > trace_xfs_alloc_size_busy(args);
> > error = xfs_extent_busy_flush(args->tp, args->pag,
> > busy_gen, flags);
> > if (!error) {
> > flags &= ~XFS_ALLOC_FLAG_TRY_FLUSH;
> 
> What’s the benefits to use XFS_ALLOC_FLAG_TRY_FLUSH?
> If no change happened to pagb_gen, we would get nothing good in the retry
> but waste cycles. Or I missed something?

You missed something: the synchronous log force is always done.

The log force is what allows busy extents to be resolved - busy
extents have to be committed to stable storage before they can be
removed from the busy extent tree.

If online discards are not enabled, busy extents are resolved
directly in journal IO completion - the log force waits for this to
occur. In this case, pag->pagb_gen will have already incremented to
indicate progress has been made, and we should never wait in the
loop after the log force. The only time we do that is when the
current transaction holds busy extents itself, and hence if the
current tx holds busy extents we should not wait beyond the log
force....

If online discards are enabled, then they'll be scheduled by journal
IO completion. i.e. waiting on the log force guarntees pending
discards have been scheduled and they'll start completing soon after
the log force returns. When they complete they'll start incrementing
pag->pagb_gen. This is the case the pag->pagb_gen wait loop exists
for - it waits for a discard to complete and resolve the busy extent
in it's IO compeltion routine. At which point the allocation attempt
can restart.

However, the same caveat about the current tx holding
busy extents still exists - we can't tell the difference between
"discards scheduled but not completed" and "no busy extents to
resolve" in the flush code. Hence regardless of the online discard
feature state, we should not be waiting on busy extent generation
changes if we hold busy extents in the transaction....

IOWs, the TRY_FLUSH code reflects the fact that for most users, the
log force resolves the busy extents, not the wait loop on
pag->pagb_gen changing. The wait loop only really kicks in when
online discard is active, and in that case we really do want to
retry allocation without waiting for (potentially very slow)
discards to complete first. We'll do that "wait for discards" the
second time we fail to find a non-busy extent....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx