On Wed, Jun 28, 2023 at 10:46:25AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 28, 2023 at 08:44:06AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Btrees that aren't freespace management trees use the normal extent
> > allocation and freeing routines for their blocks. Hence when a btree
> > block is freed, a direct call to xfs_free_extent() is made and the
> > extent is immediately freed. This puts the entire free space
> > management btrees under this path, so we are stacking btrees on
> > btrees in the call stack. The inobt, finobt and refcount btrees
> > all do this.
> >
> > However, the bmap btree does not do this - it calls
> > xfs_free_extent_later() to defer the extent free operation via an
> > XEFI and hence it gets processed in deferred operation processing
> > during the commit of the primary transaction (i.e. via intent
> > chaining).
> >
> > We need to change xfs_free_extent() to behave in a non-blocking
> > manner so that we can avoid deadlocks with busy extents near ENOSPC
> > in transactions that free multiple extents. Inserting or removing a
> > record from a btree can cause a multi-level tree merge operation and
> > that will free multiple blocks from the btree in a single
> > transaction. i.e. we can call xfs_free_extent() multiple times, and
> > hence the btree manipulation transaction is vulnerable to this busy
> > extent deadlock vector.
> >
> > To fix this, convert all the remaining callers of xfs_free_extent()
> > to use xfs_free_extent_later() to queue XEFIs and hence defer
> > processing of the extent frees to a context that can be safely
> > restarted if a deadlock condition is detected.
> >
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_ag.c             | 2 +-
> >  fs/xfs/libxfs/xfs_alloc.c          | 4 ++++
> >  fs/xfs/libxfs/xfs_alloc.h          | 8 +++++---
> >  fs/xfs/libxfs/xfs_bmap.c           | 8 +++++---
> >  fs/xfs/libxfs/xfs_bmap_btree.c     | 3 ++-
> >  fs/xfs/libxfs/xfs_ialloc.c         | 8 ++++----
> >  fs/xfs/libxfs/xfs_ialloc_btree.c   | 3 +--
> >  fs/xfs/libxfs/xfs_refcount.c       | 9 ++++++---
> >  fs/xfs/libxfs/xfs_refcount_btree.c | 8 +-------
> >  fs/xfs/xfs_extfree_item.c          | 3 ++-
> >  fs/xfs/xfs_reflink.c               | 3 ++-
> >  11 files changed, 33 insertions(+), 26 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index ee84835ebc66..e9cc481b4ddf 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -985,7 +985,7 @@ xfs_ag_shrink_space(
> >  			goto resv_err;
> >
> >  		err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
> > -				true);
> > +				XFS_AG_RESV_NONE, true);
> >  		if (err2)
> >  			goto resv_err;
> >
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index c20fe99405d8..cc3f7b905ea1 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2449,6 +2449,7 @@ xfs_defer_agfl_block(
> >  	xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
> >  	xefi->xefi_blockcount = 1;
> >  	xefi->xefi_owner = oinfo->oi_owner;
> > +	xefi->xefi_type = XFS_AG_RESV_AGFL;
> >  	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
> >  		return -EFSCORRUPTED;
> > @@ -2470,6 +2471,7 @@ __xfs_free_extent_later(
> >  	xfs_fsblock_t			bno,
> >  	xfs_filblks_t			len,
> >  	const struct xfs_owner_info	*oinfo,
> > +	enum xfs_ag_resv_type		type,
> >  	bool				skip_discard)
> >  {
> >  	struct xfs_extent_free_item	*xefi;
> > @@ -2490,6 +2492,7 @@ __xfs_free_extent_later(
> >  	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
> >  #endif
> >  	ASSERT(xfs_extfree_item_cache != NULL);
> > +	ASSERT(type != XFS_AG_RESV_AGFL);
> >
> >  	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
> >  		return -EFSCORRUPTED;
> > @@ -2498,6 +2501,7 @@ __xfs_free_extent_later(
> >  			       GFP_KERNEL | __GFP_NOFAIL);
> >  	xefi->xefi_startblock = bno;
> >  	xefi->xefi_blockcount = (xfs_extlen_t)len;
> > +	xefi->xefi_type = type;
> >  	if (skip_discard)
> >  		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
> >  	if (oinfo) {
> > diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> > index 85ac470be0da..121faf1e11ad 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.h
> > +++ b/fs/xfs/libxfs/xfs_alloc.h
> > @@ -232,7 +232,7 @@ xfs_buf_to_agfl_bno(
> >
> >  int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
> >  		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
> > -		bool skip_discard);
> > +		enum xfs_ag_resv_type type, bool skip_discard);
> >
> >  /*
> >   * List of extents to be free "later".
> > @@ -245,6 +245,7 @@ struct xfs_extent_free_item {
> >  	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
> >  	struct xfs_perag	*xefi_pag;
> >  	unsigned int		xefi_flags;
>
> /me is barely back from vacation, starting to process the ~1100 emails
> by taking care of the obvious bugfixes first...
>
> > +	enum xfs_ag_resv_type	xefi_type;
>
> I got confused by 'xefi_type' until I remembered that
> XFS_DEFER_OPS_TYPE_AGFL_FREE / XFS_DEFER_OPS_TYPE_FREE are stuffed in
> the xfs_defer_pending structure, not the xefi itself.
>
> Could this field be named xefi_agresv instead?

Sure.

> The rest of the logic in this patch looks correct and makes things
> easier for the rt modernization patches, so I'll say
>
> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> and change the name on commit, if that's ok?

That's fine.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx