Re: [PATCH 2/3] xfs: allow extent free intents to be retried

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 15 Jun 2023 07:41:50 -0700

On Thu, Jun 15, 2023 at 01:57:22PM +1000, Dave Chinner wrote:
> On Wed, Jun 14, 2023 at 08:38:37PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 15, 2023 at 11:42:00AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Extent freeing neeeds to be able to avoid a busy extent deadlock
> > > when the transaction itself holds the only busy extents in the
> > > allocation group. This may occur if we have an EFI that contains
> > > multiple extents to be freed, and the freeing the second intent
> > > requires the space the first extent free released to expand the
> > > AGFL. If we block on the busy extent at this point, we deadlock.
> > > 
> > > We hold a dirty transaction that contains a entire atomic extent
> > > free operations within it, so if we can abort the extent free
> > > operation and commit the progress that we've made, the busy extent
> > > can be resolved by a log force. Hence we can restart the aborted
> > > extent free with a new transaction and continue to make
> > > progress without risking deadlocks.
> > > 
> > > To enable this, we need the EFI processing code to be able to handle
> > > an -EAGAIN error to tell it to commit the current transaction and
> > > retry again. This mechanism is already built into the defer ops
> > > processing (used bythe refcount btree modification intents), so
> > > there's relatively little handling we need to add to the EFI code to
> > > enable this.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_extfree_item.c | 64 +++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 61 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > > index f9e36b810663..3b33d27efdce 100644
> > > --- a/fs/xfs/xfs_extfree_item.c
> > > +++ b/fs/xfs/xfs_extfree_item.c
> > > @@ -336,6 +336,29 @@ xfs_trans_get_efd(
> > >  	return efdp;
> > >  }
> > >  
> > > +/*
> > > + * Fill the EFD with all extents from the EFI when we need to roll the
> > > + * transaction and continue with a new EFI.
> > > + */
> > > +static void
> > > +xfs_efd_from_efi(
> > > +	struct xfs_efd_log_item	*efdp)
> > > +{
> > > +	struct xfs_efi_log_item *efip = efdp->efd_efip;
> > > +	uint                    i;
> > > +
> > > +	ASSERT(efip->efi_format.efi_nextents > 0);
> > > +
> > > +	if (efdp->efd_next_extent == efip->efi_format.efi_nextents)
> > > +		return;
> > > +
> > > +	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
> > > +	       efdp->efd_format.efd_extents[i] =
> > > +		       efip->efi_format.efi_extents[i];
> > > +	}
> > > +	efdp->efd_next_extent = efip->efi_format.efi_nextents;
> > 
> > Odd question -- if we managed to free half the extents mentioned in an
> > EFI before hitting -EAGAIN, then efdp->efd_next_extent should already be
> > half of efip->efi_format.efi_nextents, right?
> 
> Yes, on success we normally update the EFD with the extent we just
> completed and move the EFI/EFD cursors forwards.
> 
> > I suppose it's duplicative (or maybe just careful) to recopy the entire
> > thing... but maybe that doesn't even really matter since no modern xlog
> > code actually pays attention to what's in the EFD aside from the ID
> > number.
> 
> *nod*
> 
> On second thoughts, now that you've questioned this behaviour, I
> need to go back and check my assumptions about what the intent
> creation is doing vs the current EFI vs the XEFI we are processing.
> The new EFI we log shouldn't have the extents we've completed in it,
> just the ones we haven't run, and I need to make sure that is
> actually what is happening here.

That shouldn't be happening -- each of the xfs_free_extent_later calls
below adds new incore EFIs to an xfs_defer_pending.dfp_work list and
each xfs_defer_pending itself gets added to xfs_trans.t_dfops.  The
defer_capture_and_commit function will turn the xfs_defer_pending into a
new EFI log item with the queued dfp_work items attached.

IOWs, as long as you don't call xfs_free_extent_later on any of the
xefi_startblock/blockcount pairs where xfs_trans_free_extent returned 0,
your assumptions are correct.

The code presented in this patch is correct.

--D

> > > @@ -652,9 +694,25 @@ xfs_efi_item_recover(
> > >  		fake.xefi_startblock = extp->ext_start;
> > >  		fake.xefi_blockcount = extp->ext_len;
> > >  
> > > -		xfs_extent_free_get_group(mp, &fake);
> > > -		error = xfs_trans_free_extent(tp, efdp, &fake);
> > > -		xfs_extent_free_put_group(&fake);
> > > +		if (!requeue_only) {
> > > +			xfs_extent_free_get_group(mp, &fake);
> > > +			error = xfs_trans_free_extent(tp, efdp, &fake);
> > > +			xfs_extent_free_put_group(&fake);
> > > +		}
> > > +
> > > +		/*
> > > +		 * If we can't free the extent without potentially deadlocking,
> > > +		 * requeue the rest of the extents to a new so that they get
> > > +		 * run again later with a new transaction context.
> > > +		 */
> > > +		if (error == -EAGAIN || requeue_only) {
> > > +			xfs_free_extent_later(tp, fake.xefi_startblock,
> > > +				fake.xefi_blockcount, &XFS_RMAP_OINFO_ANY_OWNER);
> > 
> > Shouldn't we check the return value of xfs_free_extent_later now?
> > I think we already did that above, but since you just plumbed in the
> > extra checks, we ought to use it. :)
> 
> Oh, right, my cscope tree needs updating, so I was thinking it is
> still a void function.
> 
> > (Also the indenting here isn't the usual two tabs)
> 
> I'll fix that too.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx