On Mon, Aug 07, 2023 at 06:41:39PM +1000, Dave Chinner wrote: > On Thu, Jul 27, 2023 at 03:24:32PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > We need to log EFIs for every extent that we allocate for the purpose of > > staging a new btree so that if we fail then the blocks will be freed > > during log recovery. Add a function to relog the EFIs, so that repair > > can relog them all every time it creates a new btree block, which will > > help us to avoid pinning the log tail. > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > ..... > > +/* > > + * Set up automatic reaping of the blocks reserved for btree reconstruction in > > + * case we crash by logging a deferred free item for each extent we allocate so > > + * that we can get all of the space back if we crash before we can commit the > > + * new btree. This function returns a token that can be used to cancel > > + * automatic reaping if repair is successful. > > + */ > > +static int > > +xrep_newbt_schedule_autoreap( > > + struct xrep_newbt *xnr, > > + struct xrep_newbt_resv *resv) > > +{ > > + struct xfs_extent_free_item efi_item = { > > + .xefi_blockcount = resv->len, > > + .xefi_owner = xnr->oinfo.oi_owner, > > + .xefi_flags = XFS_EFI_SKIP_DISCARD, > > + .xefi_pag = resv->pag, > > + }; > > + struct xfs_scrub *sc = xnr->sc; > > + struct xfs_log_item *lip; > > + LIST_HEAD(items); > > + > > + ASSERT(xnr->oinfo.oi_offset == 0); > > + > > + efi_item.xefi_startblock = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, > > + resv->agbno); > > + if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_ATTR_FORK) > > + efi_item.xefi_flags |= XFS_EFI_ATTR_FORK; > > + if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_BMBT_BLOCK) > > + efi_item.xefi_flags |= XFS_EFI_BMBT_BLOCK; > > + > > + INIT_LIST_HEAD(&efi_item.xefi_list); > > + list_add(&efi_item.xefi_list, &items); > > + > > + xfs_perag_intent_hold(resv->pag); > > + lip = xfs_extent_free_defer_type.create_intent(sc->tp, &items, 1, > > + false); > > Hmmmm. > > That triggered flashing lights and sirens - I'm not sure I really > like the usage of the defer type arrays like this, nor the > duplication of the defer mechanisms for relogging, etc. Yeah, I don't quite like manually tromping through the defer ops state machine here either. Everywhere /else/ in XFS logs an EFI and finishes it to free the space. Just to make sure we're on the same page, newbt will allocate space, log an EFI, and then: 1. Use the space and log an EFD for the space to cancel the EFI 2. Use some of the space, log an EFD for the space we used, immediately log a new EFI for the unused parts, and finish the new EFI manually 3. Don't use any of the space at all, and finish the EFI manually Initially, I tried using the regular defer ops mechanism, but this got messy on account of having to extern most of xfs_defer.c so that I could manually modify the defer ops state. It's hard to generalize this, since there's only *one* place that actually needs manual flow control. ISTR that was around the time bfoster and I were reworking log intent item recovery, and it was easier to do this outside of the defer ops code than try to refactor it and keep this exceptional piece working too. > Not that I have a better idea right now - is this the final form of > this code, or is more stuff built on top of it or around it? That's the final form of it. The good news is that it's been stable enough despite me tearing into the EFI code again in the rt modernization patchset. Do you have any further suggestions? --D > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx