On Tue, Oct 31, 2023 at 02:24:00PM -0700, Darrick J. Wong wrote: > On Mon, Oct 30, 2023 at 02:00:02PM -0700, Omar Sandoval wrote: > > From: Omar Sandoval <osandov@xxxxxx> > > > > We've been seeing XFS errors like the following: > > > > XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280 > > ... > > Call Trace: > > xfs_corruption_error+0x94/0xa0 > > xfs_btree_insert+0x221/0x280 > > xfs_alloc_fixup_trees+0x104/0x3e0 > > xfs_alloc_ag_vextent_size+0x667/0x820 > > xfs_alloc_fix_freelist+0x5d9/0x750 > > xfs_free_extent_fix_freelist+0x65/0xa0 > > __xfs_free_extent+0x57/0x180 > > ... > > > > This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when > > xfs_btree_insrec() fails. > > > > After converting this into a panic and dissecting the core dump, I found > > that xfs_btree_insrec() is failing because it's trying to split a leaf > > node in the cntbt when the AG free list is empty. In particular, it's > > failing to get a block from the AGFL _while trying to refill the AGFL_. > > > > If a single operation splits every level of the bnobt and the cntbt (and > > the rmapbt if it is enabled) at once, the free list will be empty. Then, > > when the next operation tries to refill the free list, it allocates > > space. If the allocation does not use a full extent, it will need to > > insert records for the remaining space in the bnobt and cntbt. And if > > those new records go in full leaves, the leaves (and potentially more > > nodes up to the old root) need to be split. > > > > Fix it by accounting for the additional splits that may be required to > > refill the free list in the calculation for the minimum free list size. > > > > P.S. As far as I can tell, this bug has existed for a long time -- maybe > > back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros > > ...") in April 1994! It requires a very unlucky sequence of events, and > > in fact we didn't hit it until a particular sparse mmap workload updated > > from 5.12 to 5.19. But this bug existed in 5.12, so it must've been > > exposed by some other change in allocation or writeback patterns. It's > > also much less likely to be hit with the rmapbt enabled, since that > > increases the minimum free list size and is unlikely to split at the > > same time as the bnobt and cntbt. > > > > Signed-off-by: Omar Sandoval <osandov@xxxxxx> > > --- > > Changes since v1 [1]: > > > > - Updated calculation to account for internal nodes that may also need > > to be split. > > - Updated comments and commit message accordingly. > > > > 1: https://lore.kernel.org/linux-xfs/ZTrSiwF7OAq0hJHn@xxxxxxxxxxxxxxxxxxx/T/ > > > > fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++++++++++--- > > 1 file changed, 22 insertions(+), 3 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index 3069194527dd..3949c6ad0cce 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -2275,16 +2275,35 @@ xfs_alloc_min_freelist( > > > > ASSERT(mp->m_alloc_maxlevels > 0); > > > > + /* > > + * For a btree shorter than the maximum height, the worst case is that > > + * every level gets split and a new level is added, then while inserting > > + * another entry to refill the AGFL, every level under the old root gets > > + * split again. This is: > > + * > > + * (current height + 1) + (current height - 1) > > Could you make this comment define this relationship more explicitly? > > * (full height split reservation) + (AGFL refill split height) > * (current height + 1) + (current height - 1) > > With that added, > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> Yup, with that comment change I'm happy with it, too. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx