Re: [PATCH 3/6] xfs: log EFIs for all btree blocks being used to stage a btree

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 8 Aug 2023 16:11:13 +1000

On Mon, Aug 07, 2023 at 05:54:52PM -0700, Darrick J. Wong wrote:
> On Mon, Aug 07, 2023 at 06:41:39PM +1000, Dave Chinner wrote:
> > On Thu, Jul 27, 2023 at 03:24:32PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > We need to log EFIs for every extent that we allocate for the purpose of
> > > staging a new btree so that if we fail then the blocks will be freed
> > > during log recovery.  Add a function to relog the EFIs, so that repair
> > > can relog them all every time it creates a new btree block, which will
> > > help us to avoid pinning the log tail.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > .....
> > > +/*
> > > + * Set up automatic reaping of the blocks reserved for btree reconstruction in
> > > + * case we crash by logging a deferred free item for each extent we allocate so
> > > + * that we can get all of the space back if we crash before we can commit the
> > > + * new btree.  This function returns a token that can be used to cancel
> > > + * automatic reaping if repair is successful.
> > > + */
> > > +static int
> > > +xrep_newbt_schedule_autoreap(
> > > +	struct xrep_newbt		*xnr,
> > > +	struct xrep_newbt_resv		*resv)
> > > +{
> > > +	struct xfs_extent_free_item	efi_item = {
> > > +		.xefi_blockcount	= resv->len,
> > > +		.xefi_owner		= xnr->oinfo.oi_owner,
> > > +		.xefi_flags		= XFS_EFI_SKIP_DISCARD,
> > > +		.xefi_pag		= resv->pag,
> > > +	};
> > > +	struct xfs_scrub		*sc = xnr->sc;
> > > +	struct xfs_log_item		*lip;
> > > +	LIST_HEAD(items);
> > > +
> > > +	ASSERT(xnr->oinfo.oi_offset == 0);
> > > +
> > > +	efi_item.xefi_startblock = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno,
> > > +			resv->agbno);
> > > +	if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_ATTR_FORK)
> > > +		efi_item.xefi_flags |= XFS_EFI_ATTR_FORK;
> > > +	if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_BMBT_BLOCK)
> > > +		efi_item.xefi_flags |= XFS_EFI_BMBT_BLOCK;
> > > +
> > > +	INIT_LIST_HEAD(&efi_item.xefi_list);
> > > +	list_add(&efi_item.xefi_list, &items);
> > > +
> > > +	xfs_perag_intent_hold(resv->pag);
> > > +	lip = xfs_extent_free_defer_type.create_intent(sc->tp, &items, 1,
> > > +			false);
> > 
> > Hmmmm.
> > 
> > That triggered flashing lights and sirens - I'm not sure I really
> > like the usage of the defer type arrays like this, nor the
> > duplication of the defer mechanisms for relogging, etc.
> 
> Yeah, I don't quite like manually tromping through the defer ops state
> machine here either.  Everywhere /else/ in XFS logs an EFI and finishes
> it to free the space.  Just to make sure we're on the same page, newbt
> will allocate space, log an EFI, and then:
> 
> 1. Use the space and log an EFD for the space to cancel the EFI
> 2. Use some of the space, log an EFD for the space we used, immediately
>    log a new EFI for the unused parts, and finish the new EFI manually
> 3. Don't use any of the space at all, and finish the EFI manually
> 
> Initially, I tried using the regular defer ops mechanism, but this got
> messy on account of having to extern most of xfs_defer.c so that I could
> manually modify the defer ops state.  It's hard to generalize this,
> since there's only *one* place that actually needs manual flow control.

*nod*

But I can't help but think it's a manifestation of a generic
optimisation that could allow us to avoid needing to use unwritten
extents for new data alloations...

> ISTR that was around the time bfoster and I were reworking log intent
> item recovery, and it was easier to do this outside of the defer ops
> code than try to refactor it and keep this exceptional piece working
> too.
> 
> > Not that I have a better idea right now - is this the final form of
> > this code, or is more stuff built on top of it or around it?
> 
> That's the final form of it.  The good news is that it's been stable
> enough despite me tearing into the EFI code again in the rt
> modernization patchset.  Do you have any further suggestions?

Not for the patchset as it stands.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx