On Thu, Oct 05, 2023 at 10:12:51PM -0700, Darrick J. Wong wrote: > On Thu, Oct 05, 2023 at 02:47:50PM +1100, Dave Chinner wrote: > > On Tue, Sep 26, 2023 at 04:32:09PM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > As mentioned in the previous commit, online repair wants to allocate > > > space to write out a new metadata structure, and it also wants to hedge > > > against system crashes during repairs by logging (and later cancelling) > > > EFIs to free the space if we crash before committing the new data > > > structure. > > > > > > Therefore, create a trio of functions to schedule automatic reaping of > > > freshly allocated unwritten space. xfs_alloc_schedule_autoreap creates > > > a paused EFI representing the space we just allocated. Once the > > > allocations are made and the autoreaps scheduled, we can start writing > > > to disk. > > > > > > If the writes succeed, xfs_alloc_cancel_autoreap marks the EFI work > > > items as stale and unpauses the pending deferred work item. Assuming > > > that's done in the same transaction that commits the new structure into > > > the filesystem, we guarantee that either the new object is fully > > > visible, or that all the space gets reclaimed. > > > > > > If the writes succeed but only part of an extent was used, repair must > > > call the same _cancel_autoreap function to kill the first EFI and then > > > log a new EFI to free the unused space. The first EFI is already > > > committed, so it cannot be changed. > > > > > > For full extents that aren't used, xfs_alloc_commit_autoreap will > > > unpause the EFI, which results in the space being freed during the next > > > _defer_finish cycle. > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_alloc.c | 104 +++++++++++++++++++++++++++++++++++++++++++-- > > > fs/xfs/libxfs/xfs_alloc.h | 12 +++++ > > > fs/xfs/xfs_extfree_item.c | 11 +++-- > > > 3 files changed, 120 insertions(+), 7 deletions(-) > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > > index 295d11a27f632..c1ee1862cc1af 100644 > > > --- a/fs/xfs/libxfs/xfs_alloc.c > > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > > @@ -2501,14 +2501,15 @@ xfs_defer_agfl_block( > > > * Add the extent to the list of extents to be free at transaction end. > > > * The list is maintained sorted (by block number). > > > */ > > > -int > > > -xfs_free_extent_later( > > > +static int > > > +__xfs_free_extent_later( > > > struct xfs_trans *tp, > > > xfs_fsblock_t bno, > > > xfs_filblks_t len, > > > const struct xfs_owner_info *oinfo, > > > enum xfs_ag_resv_type type, > > > - bool skip_discard) > > > + bool skip_discard, > > > + struct xfs_defer_pending **dfpp) > > > > I was happy that __xfs_free_extent_later() went away in the last > > patch. > > > > I am sad that it has immediately come back.... > > > > Can we please name this after what it does? Say > > xfs_defer_extent_free()? > > Done. Thank you for a better name! :) > > > > { > > > struct xfs_extent_free_item *xefi; > > > struct xfs_mount *mp = tp->t_mountp; > > > @@ -2556,10 +2557,105 @@ xfs_free_extent_later( > > > XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); > > > > > > xfs_extent_free_get_group(mp, xefi); > > > - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); > > > + *dfpp = xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); > > > return 0; > > > } > > > > > > +int > > > +xfs_free_extent_later( > > > + struct xfs_trans *tp, > > > + xfs_fsblock_t bno, > > > + xfs_filblks_t len, > > > + const struct xfs_owner_info *oinfo, > > > + enum xfs_ag_resv_type type, > > > + bool skip_discard) > > > +{ > > > + struct xfs_defer_pending *dontcare = NULL; > > > + > > > + return __xfs_free_extent_later(tp, bno, len, oinfo, type, skip_discard, > > > + &dontcare); > > > +} > > > + > > > +/* > > > + * Set up automatic freeing of unwritten space in the filesystem. > > > + * > > > + * This function attached a paused deferred extent free item to the > > > + * transaction. Pausing means that the EFI will be logged in the next > > > + * transaction commit, but the pending EFI will not be finished until the > > > + * pending item is unpaused. > > > + * > > > + * If the system goes down after the EFI has been persisted to the log but > > > + * before the pending item is unpaused, log recovery will find the EFI, fail to > > > + * find the EFD, and free the space. > > > + * > > > + * If the pending item is unpaused, the next transaction commit will log an EFD > > > + * without freeing the space. > > > + * > > > + * Caller must ensure that the tp, fsbno, len, oinfo, and resv flags of the > > > + * @args structure are set to the relevant values. > > > + */ > > > +int > > > +xfs_alloc_schedule_autoreap( > > > + const struct xfs_alloc_arg *args, > > > + bool skip_discard, > > > + struct xfs_alloc_autoreap *aarp) > > > +{ > > > + int error; > > > + > > > + error = __xfs_free_extent_later(args->tp, args->fsbno, args->len, > > > + &args->oinfo, args->resv, skip_discard, &aarp->dfp); > > > + if (error) > > > + return error; > > > + > > > + xfs_defer_item_pause(args->tp, aarp->dfp); > > > + return 0; > > > +} > > > > Then this becomes much more readable - xfs_defer_free_extent() > > returns the defer_pending work item for the EFI, which we then > > immediately pause.... > > <nod> > > > > + > > > +/* > > > + * Cancel automatic freeing of unwritten space in the filesystem. > > > + * > > > + * Earlier, we created a paused deferred extent free item and attached it to > > > + * this transaction so that we could automatically roll back a new space > > > + * allocation if the system went down. Now we want to cancel the paused work > > > + * item by marking the EFI stale so we don't actually free the space, unpausing > > > + * the pending item and logging an EFD. > > > + * > > > + * The caller generally should have already mapped the space into the ondisk > > > + * filesystem. If the reserved space was partially used, the caller must call > > > + * xfs_free_extent_later to create a new EFI to free the unused space. > > > + */ > > > +void > > > +xfs_alloc_cancel_autoreap( > > > + struct xfs_trans *tp, > > > + struct xfs_alloc_autoreap *aarp) > > > +{ > > > + struct xfs_defer_pending *dfp = aarp->dfp; > > > + struct xfs_extent_free_item *xefi; > > > + > > > + if (!dfp) > > > + return; > > > + > > > + list_for_each_entry(xefi, &dfp->dfp_work, xefi_list) > > > + xefi->xefi_flags |= XFS_EFI_STALE; > > > + > > > + xfs_defer_item_unpause(tp, dfp); > > > +} > > > > Hmmmm. I see what you are trying to do here, though I'm not sure > > that "stale" is the right word to describe it. The EFI has been > > cancelled, so we want and EFD to be logged without freeing the > > extent. > > > > To me, "stale" means "contents longer valid, do not touch" as per > > XFS_ISTALE, xfs_buf_stale(), etc > > > > Whereas we use "cancelled" to indicate that the pending operations > > on the buffer should not be actioned, such as XFS_BLF_CANCEL buffer > > items in log recovery to indicate the buffer has been freed and > > while it is cancelled we should not replay the changes found in the > > log for that buffer.... > > > > Hence I think this is better named XFS_EFI_CANCELLED, and it also > > matches what the calling function is doing - cancelling the autoreap > > of the extent.... > > Funny story -- CANCELLED was the original name for this flag. I'll put > it back. > > > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c > > > index 9e7b58f3566c0..98c2667d369e8 100644 > > > --- a/fs/xfs/xfs_extfree_item.c > > > +++ b/fs/xfs/xfs_extfree_item.c > > > @@ -392,9 +392,14 @@ xfs_trans_free_extent( > > > trace_xfs_bmap_free_deferred(tp->t_mountp, xefi->xefi_pag->pag_agno, 0, > > > agbno, xefi->xefi_blockcount); > > > > > > - error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, > > > - xefi->xefi_blockcount, &oinfo, xefi->xefi_agresv, > > > - xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); > > > + if (xefi->xefi_flags & XFS_EFI_STALE) { > > > + error = 0; > > > + } else { > > > + error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, > > > + xefi->xefi_blockcount, &oinfo, > > > + xefi->xefi_agresv, > > > + xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); > > > + } > > > > Just init error to 0 when it is declared, then this becomes: > > > > if (!(xefi->xefi_flags & XFS_EFI_CANCELLED)) { > > error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, > > xefi->xefi_blockcount, &oinfo, > > xefi->xefi_agresv, > > xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); > > } > > <nod> Eventually the rt reflink patch will trip over this (since rtdev > vs. datadev selection is also an xefi_flag) but that'll come later. ...only now it won't, since hch and I have been hacking on getting rt modernization reviewed; and that involved splitting out the rt paths so that the the existing functions aren't littered with conditionals. --D > > Otherwise the code looks ok. > > Yay! > > --D > > > -- > > Dave Chinner > > david@xxxxxxxxxxxxx