Re: [PATCH] xfs: fix internal error from AGFL exhaustion

On Wed, Oct 25, 2023 at 03:35:17PM +1100, Dave Chinner wrote:
> On Tue, Oct 24, 2023 at 04:37:33PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@xxxxxx>
> > 
> > We've been seeing XFS errors like the following:
> > 
> > XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_btree_insert+0x1ec/0x280
> > ...
> > Call Trace:
> >  xfs_corruption_error+0x94/0xa0
> >  xfs_btree_insert+0x221/0x280
> >  xfs_alloc_fixup_trees+0x104/0x3e0
> >  xfs_alloc_ag_vextent_size+0x667/0x820
> >  xfs_alloc_fix_freelist+0x5d9/0x750
> >  xfs_free_extent_fix_freelist+0x65/0xa0
> >  __xfs_free_extent+0x57/0x180
> > ...
> 
> Good find, Omar!
> 
> For completeness, what's the rest of the trace? We've recently
> changed how the extent freeing path works w.r.t. busy extents, so I'm
> curious as to what the path into this code is.

This one is from my fpunch reproducer, but there were two call traces we
actually saw in production. One from writeback:

xfs_corruption_error+0x90/0xa0
xfs_btree_insert+0x1b1/0x1d0
xfs_alloc_fixup_trees+0x28e/0x450
xfs_alloc_ag_vextent_size+0x4e0/0x780
xfs_alloc_ag_vextent+0x11c/0x140
xfs_alloc_fix_freelist+0x3f2/0x510
xfs_alloc_vextent+0x254/0x490
xfs_bmap_btalloc+0x301/0x6e0
xfs_bmapi_allocate+0x206/0x2d0
xfs_bmapi_convert_delalloc+0x2b9/0x450
xfs_map_blocks+0x1cc/0x470
iomap_do_writepage+0x22b/0x860
write_cache_pages+0x1ae/0x4c0
iomap_writepages+0x1c/0x40
xfs_vm_writepages+0x6b/0xa0
do_writepages+0x6f/0x1d0
__writeback_single_inode+0x39/0x2b0
writeback_sb_inodes+0x277/0x510
__writeback_inodes_wb+0x6e/0xe0
wb_writeback+0xda/0x250
wb_workfn+0x1f4/0x430
process_one_work+0x12c/0x490
worker_thread+0x143/0x3e0
kthread+0xa3/0xd0
ret_from_fork+0x1f/0x30

And one from inodegc:

xfs_corruption_error+0x90/0xa0
xfs_btree_insert+0x1b1/0x1d0
xfs_alloc_fixup_trees+0x28e/0x450
xfs_alloc_ag_vextent_size+0x4e0/0x780
xfs_alloc_ag_vextent+0x11c/0x140
xfs_alloc_fix_freelist+0x3f2/0x510
__xfs_free_extent+0xde/0x1f0
xfs_trans_free_extent+0x54/0xd0
xfs_extent_free_finish_item+0x69/0xa0
xfs_defer_finish_noroll+0x163/0x510
xfs_defer_finish+0x10/0x70
xfs_itruncate_extents_flags+0x13a/0x250
xfs_inactive_truncate+0xac/0xe0
xfs_inactive+0x14f/0x160
xfs_inodegc_worker+0x81/0x100
process_one_work+0x12c/0x490
worker_thread+0x143/0x3e0
kthread+0xa3/0xd0
ret_from_fork+0x1f/0x30

These are both from 5.19.

> > (It's guaranteed that
> > none of the internal nodes need to be split because they were just
> > split.)
> 
> Hmmm. That doesn't sound right - btrees don't work that way.
> 
> We don't know what path the tree split along in the previous insert
> operation - a split only guarantees the next insert along that same
> path won't split if the insert hits one of the two leaf blocks from
> the split. An insert into a different leaf can still split interior
> nodes all the way back up the tree to the point where it intersects
> with the previous full height split path.
> 
> e.g. if the previous full height split was down the left side of the
> tree and the next insert is in the right half of the tree, the right
> half can split on insert all the way up to the level below the old
> root.

You're right, I was only thinking about the 2->3 level case; my
assumption only holds there.

> > Fix it by adding an extra block of slack in the freelist for the
> > potential leaf split in each of the bnobt and cntbt.
> 
> Hmmm, yeah - the example given is a 2->3 level split, hence we only
> need to handle a single level leaf split before path intersection
> occurs. Yup, adding an extra block will make the exact problem being
> seen go away.
> 
> Ok, what's the general solution? 4-level, 5-level or larger trees?
> 
> The worst split after a full split goes up to the level below the old
> root block. The root block was just split, so it won't need a split
> again; only its children could split again. OK, so that's (height - 1)
> blocks needed for a max split to refill the AGFL after a full height
> split has occurred.
> 
> Hence it looks like the minimum AGFL space for any given
> allocation/free operation needs to be:
> 
> 	(full height split reservation) + (AGFL refill split height)
> 
> which is:
> 
> 	= (new height) + (new height - 2)
> 	= 2 * new height - 2
> 	= 2 * (current height + 1) - 2
> 	= 2 * current height
> 
> Ok, so we essentially have to double the AGFL minimum size to handle
> the generic case of AGFL refill splitting the free space trees after
> a transaction has exhausted its AGFL reservation.
> 
> Now, did I get that right?

That sounds right, but I'll have to think about it more after some sleep
:)
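
Assuming I've followed the arithmetic, here's what that worst case looks
like as code (a hypothetical helper for illustration, not the actual
xfs_alloc_min_freelist() logic):

	/*
	 * Worst-case AGFL blocks needed for one free space btree: a full
	 * height split can grow the tree by one level, and the AGFL
	 * refill that follows can then split every level below the new
	 * root along a different path.
	 */
	static unsigned int
	agfl_worst_case_blocks(unsigned int cur_height)
	{
		unsigned int new_height = cur_height + 1;

		/* full height split + refill split below the old root */
		return new_height + (new_height - 2); /* = 2 * cur_height */
	}

So a 3-level tree would need 2 * 3 = 6 AGFL blocks per free space btree
under this scheme.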

Assuming that is correct, your LE search suggestion sounds kind of nice
compared to such a drastic change to the AGFL size.

> The rmapbt case will need this change, too, because rmapbt blocks
> are allocated from the free list and so an rmapbt update can exhaust
> the free list completely, too.

Ah, I missed where the rmapbt is updated, since it's a step removed
from the xfs_alloc_fixup_trees() code path I was looking at.

Thanks for the quick review!

Omar


