On Mon, Aug 07, 2023 at 05:54:52PM -0700, Darrick J. Wong wrote: > On Mon, Aug 07, 2023 at 06:41:39PM +1000, Dave Chinner wrote: > > On Thu, Jul 27, 2023 at 03:24:32PM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > We need to log EFIs for every extent that we allocate for the purpose of > > > staging a new btree so that if we fail then the blocks will be freed > > > during log recovery. Add a function to relog the EFIs, so that repair > > > can relog them all every time it creates a new btree block, which will > > > help us to avoid pinning the log tail. > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > ..... > > > +/* > > > + * Set up automatic reaping of the blocks reserved for btree reconstruction in > > > + * case we crash by logging a deferred free item for each extent we allocate so > > > + * that we can get all of the space back if we crash before we can commit the > > > + * new btree. This function returns a token that can be used to cancel > > > + * automatic reaping if repair is successful. > > > + */ > > > +static int > > > +xrep_newbt_schedule_autoreap( > > > + struct xrep_newbt *xnr, > > > + struct xrep_newbt_resv *resv) > > > +{ > > > + struct xfs_extent_free_item efi_item = { > > > + .xefi_blockcount = resv->len, > > > + .xefi_owner = xnr->oinfo.oi_owner, > > > + .xefi_flags = XFS_EFI_SKIP_DISCARD, > > > + .xefi_pag = resv->pag, > > > + }; > > > + struct xfs_scrub *sc = xnr->sc; > > > + struct xfs_log_item *lip; > > > + LIST_HEAD(items); > > > + > > > + ASSERT(xnr->oinfo.oi_offset == 0); > > > + > > > + efi_item.xefi_startblock = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, > > > + resv->agbno); > > > + if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_ATTR_FORK) > > > + efi_item.xefi_flags |= XFS_EFI_ATTR_FORK; > > > + if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_BMBT_BLOCK) > > > + efi_item.xefi_flags |= XFS_EFI_BMBT_BLOCK; > > > + > > > + INIT_LIST_HEAD(&efi_item.xefi_list); > > > + list_add(&efi_item.xefi_list, &items); > > > + > > > + xfs_perag_intent_hold(resv->pag); > > > + lip = xfs_extent_free_defer_type.create_intent(sc->tp, &items, 1, > > > + false); > > > > Hmmmm. > > > > That triggered flashing lights and sirens - I'm not sure I really > > like the usage of the defer type arrays like this, nor the > > duplication of the defer mechanisms for relogging, etc. > > Yeah, I don't quite like manually tromping through the defer ops state > machine here either. Everywhere /else/ in XFS logs an EFI and finishes > it to free the space. Just to make sure we're on the same page, newbt > will allocate space, log an EFI, and then: > > 1. Use the space and log an EFD for the space to cancel the EFI > 2. Use some of the space, log an EFD for the space we used, immediately > log a new EFI for the unused parts, and finish the new EFI manually > 3. Don't use any of the space at all, and finish the EFI manually > > Initially, I tried using the regular defer ops mechanism, but this got > messy on account of having to extern most of xfs_defer.c so that I could > manually modify the defer ops state. It's hard to generalize this, > since there's only *one* place that actually needs manual flow control. *nod* But I can't help but think it's a manifestation of a generic optimisation that could allow us to avoid needing to use unwritten extents for new data alloations... > ISTR that was around the time bfoster and I were reworking log intent > item recovery, and it was easier to do this outside of the defer ops > code than try to refactor it and keep this exceptional piece working > too. > > > Not that I have a better idea right now - is this the final form of > > this code, or is more stuff built on top of it or around it? > > That's the final form of it. The good news is that it's been stable > enough despite me tearing into the EFI code again in the rt > modernization patchset. Do you have any further suggestions? Not for the patchset as it stands. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx