On Thu, Jun 15, 2023 at 01:57:22PM +1000, Dave Chinner wrote: > On Wed, Jun 14, 2023 at 08:38:37PM -0700, Darrick J. Wong wrote: > > On Thu, Jun 15, 2023 at 11:42:00AM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Extent freeing neeeds to be able to avoid a busy extent deadlock > > > when the transaction itself holds the only busy extents in the > > > allocation group. This may occur if we have an EFI that contains > > > multiple extents to be freed, and the freeing the second intent > > > requires the space the first extent free released to expand the > > > AGFL. If we block on the busy extent at this point, we deadlock. > > > > > > We hold a dirty transaction that contains a entire atomic extent > > > free operations within it, so if we can abort the extent free > > > operation and commit the progress that we've made, the busy extent > > > can be resolved by a log force. Hence we can restart the aborted > > > extent free with a new transaction and continue to make > > > progress without risking deadlocks. > > > > > > To enable this, we need the EFI processing code to be able to handle > > > an -EAGAIN error to tell it to commit the current transaction and > > > retry again. This mechanism is already built into the defer ops > > > processing (used bythe refcount btree modification intents), so > > > there's relatively little handling we need to add to the EFI code to > > > enable this. > > > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_extfree_item.c | 64 +++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 61 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c > > > index f9e36b810663..3b33d27efdce 100644 > > > --- a/fs/xfs/xfs_extfree_item.c > > > +++ b/fs/xfs/xfs_extfree_item.c > > > @@ -336,6 +336,29 @@ xfs_trans_get_efd( > > > return efdp; > > > } > > > > > > +/* > > > + * Fill the EFD with all extents from the EFI when we need to roll the > > > + * transaction and continue with a new EFI. > > > + */ > > > +static void > > > +xfs_efd_from_efi( > > > + struct xfs_efd_log_item *efdp) > > > +{ > > > + struct xfs_efi_log_item *efip = efdp->efd_efip; > > > + uint i; > > > + > > > + ASSERT(efip->efi_format.efi_nextents > 0); > > > + > > > + if (efdp->efd_next_extent == efip->efi_format.efi_nextents) > > > + return; > > > + > > > + for (i = 0; i < efip->efi_format.efi_nextents; i++) { > > > + efdp->efd_format.efd_extents[i] = > > > + efip->efi_format.efi_extents[i]; > > > + } > > > + efdp->efd_next_extent = efip->efi_format.efi_nextents; > > > > Odd question -- if we managed to free half the extents mentioned in an > > EFI before hitting -EAGAIN, then efdp->efd_next_extent should already be > > half of efip->efi_format.efi_nextents, right? > > Yes, on success we normally update the EFD with the extent we just > completed and move the EFI/EFD cursors forwards. > > > I suppose it's duplicative (or maybe just careful) to recopy the entire > > thing... but maybe that doesn't even really matter since no modern xlog > > code actually pays attention to what's in the EFD aside from the ID > > number. > > *nod* > > On second thoughts, now that you've questioned this behaviour, I > need to go back and check my assumptions about what the intent > creation is doing vs the current EFI vs the XEFI we are processing. > The new EFI we log shouldn't have the extents we've completed in it, > just the ones we haven't run, and I need to make sure that is > actually what is happening here. That shouldn't be happening -- each of the xfs_free_extent_later calls below adds new incore EFIs to an xfs_defer_pending.dfp_work list and each xfs_defer_pending itself gets added to xfs_trans.t_dfops. The defer_capture_and_commit function will turn the xfs_defer_pending into a new EFI log item with the queued dfp_work items attached. IOWs, as long as you don't call xfs_free_extent_later on any of the xefi_startblock/blockcount pairs where xfs_trans_free_extent returned 0, your assumptions are correct. The code presented in this patch is correct. --D > > > @@ -652,9 +694,25 @@ xfs_efi_item_recover( > > > fake.xefi_startblock = extp->ext_start; > > > fake.xefi_blockcount = extp->ext_len; > > > > > > - xfs_extent_free_get_group(mp, &fake); > > > - error = xfs_trans_free_extent(tp, efdp, &fake); > > > - xfs_extent_free_put_group(&fake); > > > + if (!requeue_only) { > > > + xfs_extent_free_get_group(mp, &fake); > > > + error = xfs_trans_free_extent(tp, efdp, &fake); > > > + xfs_extent_free_put_group(&fake); > > > + } > > > + > > > + /* > > > + * If we can't free the extent without potentially deadlocking, > > > + * requeue the rest of the extents to a new so that they get > > > + * run again later with a new transaction context. > > > + */ > > > + if (error == -EAGAIN || requeue_only) { > > > + xfs_free_extent_later(tp, fake.xefi_startblock, > > > + fake.xefi_blockcount, &XFS_RMAP_OINFO_ANY_OWNER); > > > > Shouldn't we check the return value of xfs_free_extent_later now? > > I think we already did that above, but since you just plumbed in the > > extra checks, we ought to use it. :) > > Oh, right, my cscope tree needs updating, so I was thinking it is > still a void function. > > > (Also the indenting here isn't the usual two tabs) > > I'll fix that too. > > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx