On Mon, Oct 30, 2023 at 02:00:02PM -0700, Omar Sandoval wrote: > From: Omar Sandoval <osandov@xxxxxx> > > We've been seeing XFS errors like the following: > > XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280 > ... > Call Trace: > xfs_corruption_error+0x94/0xa0 > xfs_btree_insert+0x221/0x280 > xfs_alloc_fixup_trees+0x104/0x3e0 > xfs_alloc_ag_vextent_size+0x667/0x820 > xfs_alloc_fix_freelist+0x5d9/0x750 > xfs_free_extent_fix_freelist+0x65/0xa0 > __xfs_free_extent+0x57/0x180 > ... > > This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when > xfs_btree_insrec() fails. > > After converting this into a panic and dissecting the core dump, I found > that xfs_btree_insrec() is failing because it's trying to split a leaf > node in the cntbt when the AG free list is empty. In particular, it's > failing to get a block from the AGFL _while trying to refill the AGFL_. > > If a single operation splits every level of the bnobt and the cntbt (and > the rmapbt if it is enabled) at once, the free list will be empty. Then, > when the next operation tries to refill the free list, it allocates > space. If the allocation does not use a full extent, it will need to > insert records for the remaining space in the bnobt and cntbt. And if > those new records go in full leaves, the leaves (and potentially more > nodes up to the old root) need to be split. > > Fix it by accounting for the additional splits that may be required to > refill the free list in the calculation for the minimum free list size. > > P.S. As far as I can tell, this bug has existed for a long time -- maybe > back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros > ...") in April 1994! It requires a very unlucky sequence of events, and > in fact we didn't hit it until a particular sparse mmap workload updated > from 5.12 to 5.19. But this bug existed in 5.12, so it must've been > exposed by some other change in allocation or writeback patterns. It's > also much less likely to be hit with the rmapbt enabled, since that > increases the minimum free list size and is unlikely to split at the > same time as the bnobt and cntbt. > > Signed-off-by: Omar Sandoval <osandov@xxxxxx> > --- > Changes since v1 [1]: > > - Updated calculation to account for internal nodes that may also need > to be split. > - Updated comments and commit message accordingly. > > 1: https://lore.kernel.org/linux-xfs/ZTrSiwF7OAq0hJHn@xxxxxxxxxxxxxxxxxxx/T/ > > fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 3069194527dd..3949c6ad0cce 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -2275,16 +2275,35 @@ xfs_alloc_min_freelist( > > ASSERT(mp->m_alloc_maxlevels > 0); > > + /* > + * For a btree shorter than the maximum height, the worst case is that > + * every level gets split and a new level is added, then while inserting > + * another entry to refill the AGFL, every level under the old root gets > + * split again. This is: > + * > + * (current height + 1) + (current height - 1) Could you make this comment define this relationship more explicitly? * (full height split reservation) + (AGFL refill split height) * (current height + 1) + (current height - 1) With that added, Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > + * = (new height) + (new height - 2) > + * = 2 * new height - 2 > + * > + * For a btree of maximum height, the worst case is that every level > + * under the root gets split, then while inserting another entry to > + * refill the AGFL, every level under the root gets split again. This is > + * also: > + * > + * 2 * (new_height - 1) > + * = 2 * new height - 2 > + */ > + > /* space needed by-bno freespace btree */ > min_free = min_t(unsigned int, levels[XFS_BTNUM_BNOi] + 1, > - mp->m_alloc_maxlevels); > + mp->m_alloc_maxlevels) * 2 - 2; > /* space needed by-size freespace btree */ > min_free += min_t(unsigned int, levels[XFS_BTNUM_CNTi] + 1, > - mp->m_alloc_maxlevels); > + mp->m_alloc_maxlevels) * 2 - 2; > /* space needed reverse mapping used space btree */ > if (xfs_has_rmapbt(mp)) > min_free += min_t(unsigned int, levels[XFS_BTNUM_RMAPi] + 1, > - mp->m_rmap_maxlevels); > + mp->m_rmap_maxlevels) * 2 - 2; > > return min_free; > } > -- > 2.41.0 >