On Wed, Jan 03, 2024 at 06:01:21PM -0800, Darrick J. Wong wrote: > On Thu, Dec 28, 2023 at 08:46:46PM +0800, Long Li wrote: > > While performing the IO fault injection test, I caught the following data > > corruption report: > > > > XFS (dm-0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x79c/0x1130 > > CPU: 3 PID: 33 Comm: kworker/3:0 Not tainted 6.5.0-rc7-next-20230825-00001-g7f8666926889 #214 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 > > Workqueue: xfs-inodegc/dm-0 xfs_inodegc_worker > > Call Trace: > > <TASK> > > dump_stack_lvl+0x50/0x70 > > xfs_corruption_error+0x134/0x150 > > xfs_free_ag_extent+0x7d3/0x1130 > > __xfs_free_extent+0x201/0x3c0 > > xfs_trans_free_extent+0x29b/0xa10 > > xfs_extent_free_finish_item+0x2a/0xb0 > > xfs_defer_finish_noroll+0x8d1/0x1b40 > > xfs_defer_finish+0x21/0x200 > > xfs_itruncate_extents_flags+0x1cb/0x650 > > xfs_free_eofblocks+0x18f/0x250 > > xfs_inactive+0x485/0x570 > > xfs_inodegc_worker+0x207/0x530 > > process_scheduled_works+0x24a/0xe10 > > worker_thread+0x5ac/0xc60 > > kthread+0x2cd/0x3c0 > > ret_from_fork+0x4a/0x80 > > ret_from_fork_asm+0x11/0x20 > > </TASK> > > XFS (dm-0): Corruption detected. Unmount and run xfs_repair > > > > After analyzing the disk image, it was found that the corruption was > > triggered by the fact that extent was recorded in both the inode and AGF > > btrees. After a long time of reproduction and analysis, we found that the > > root cause of the problem was that the AGF btree block was not recovered. > > > > Consider the following situation, Transaction A and Transaction B are in > > the same record, so Transaction A and Transaction B share the same LSN1. > > If the buf item in Transaction A has been recovered, then the buf item in > > Transaction B cannot be recovered, because log recovery skips items with a > > metadata LSN >= the current LSN of the recovery item. If there is still an > > inode item in transaction B that records the Extent X, the Extent X will > > be recorded in both the inode and the AGF btree block after transaction B > > is recovered. > > > > |------------Record (LSN1)------------------|---Record (LSN2)---| > > |----------Trans A------------|-------------Trans B-------------| > > | Buf Item(Extent X) | Buf Item / Inode item(Extent X) | > > | Extent X is freed | Extent X is allocated | > > > > After commit 12818d24db8a ("xfs: rework log recovery to submit buffers on > > LSN boundaries") was introduced, we submit buffers on lsn boundaries during > > log recovery. The above problem can be avoided under normal paths, but it's > > not guaranteed under abnormal paths. Consider the following process, if an > > error was encountered after recover buf item in transaction A and before > > recover buf item in transaction B, buffers that have been added to > > buffer_list will still be submitted, this violates the submits rule on lsn > > boundaries. So buf item in Transaction B cannot be recovered on the next > > mount due to current lsn of transaction equal to metadata lsn on disk. > > > > xlog_do_recovery_pass > > error = xlog_recover_process > > xlog_recover_process_data > > ... > > xlog_recover_buf_commit_pass2 > > xlog_recover_do_reg_buffer //recover buf item in Trans A > > xfs_buf_delwri_queue(bp, buffer_list) > > ... > > ====> Encountered error and returned > > ... > > xlog_recover_buf_commit_pass2 > > xlog_recover_do_reg_buffer //recover buf item in Trans B > > xfs_buf_delwri_queue(bp, buffer_list) > > if (!list_empty(&buffer_list)) > > xfs_buf_delwri_submit(&buffer_list); //submit regardless of error > > > > In order to make sure that submits buffers on lsn boundaries in the > > abnormal paths, we need to check error status before submit buffers that > > have been added from the last record processed. If error status exist, > > buffers in the bufffer_list should be canceled. > > What was the error, specifically? I would have though that recovery > would abort after "Encountered error and returned". Does the recovery > somehow keep running and then finds the buf item in Trans B? > That was not what I meant. I'm just trying to point out that any error that occurs after recovering buf item in Transaction A and before recovering buf item in Transaction B can trigger the problem. It doesn't matter what the error is, for example the buf read error that occurred during this period. > Or is the problem here that after the error, xfs submits the delwri > buffers? And then the user tried to recover a second time, only this > time the recovery attempt reads Trans B, but then doesn't actually write > anything because the ondisk buffer now has the same LSN as Trans B? > Yes, that's what I want to said. I think I should change the description of the process that triggered the issue in commit message to avoid misunderstandings. Thanks Long Li > <confused> > > --D > > > Canceling the buffers in the buffer_list directly isn't correct, unlike > > any other place where write list was canceled, these buffers has been > > initialized by xfs_buf_item_init() during recovery and held by buf > > item, buf items will not be released in xfs_buf_delwri_cancel(). If > > these buffers are submitted successfully, buf items assocated with > > the buffer will be released in io end process. So releasing buf item > > in write list cacneling process is needed. > > > > Fixes: 50d5c8d8e938 ("xfs: check LSN ordering for v5 superblocks during recovery") > > Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx> > > --- > > fs/xfs/xfs_buf.c | 2 ++ > > fs/xfs/xfs_log_recover.c | 22 +++++++++++++--------- > > 2 files changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > > index 8e5bd50d29fe..6a1b26aaf97e 100644 > > --- a/fs/xfs/xfs_buf.c > > +++ b/fs/xfs/xfs_buf.c > > @@ -2075,6 +2075,8 @@ xfs_buf_delwri_cancel( > > xfs_buf_lock(bp); > > bp->b_flags &= ~_XBF_DELWRI_Q; > > xfs_buf_list_del(bp); > > + if (bp->b_log_item) > > + xfs_buf_item_relse(bp); > > xfs_buf_relse(bp); > > } > > } > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > > index 1251c81e55f9..2cda6c90890d 100644 > > --- a/fs/xfs/xfs_log_recover.c > > +++ b/fs/xfs/xfs_log_recover.c > > @@ -2964,7 +2964,6 @@ xlog_do_recovery_pass( > > char *offset; > > char *hbp, *dbp; > > int error = 0, h_size, h_len; > > - int error2 = 0; > > int bblks, split_bblks; > > int hblks, split_hblks, wrapped_hblks; > > int i; > > @@ -3203,16 +3202,21 @@ xlog_do_recovery_pass( > > bread_err1: > > kmem_free(hbp); > > > > - /* > > - * Submit buffers that have been added from the last record processed, > > - * regardless of error status. > > - */ > > - if (!list_empty(&buffer_list)) > > - error2 = xfs_buf_delwri_submit(&buffer_list); > > - > > if (error && first_bad) > > *first_bad = rhead_blk; > > > > + /* > > + * If there are no error, submit buffers that have been added from the > > + * last record processed, othrewise cancel the write list, to ensure > > + * submit buffers on LSN boundaries. > > + */ > > + if (!list_empty(&buffer_list)) { > > + if (error) > > + xfs_buf_delwri_cancel(&buffer_list); > > + else > > + error = xfs_buf_delwri_submit(&buffer_list); > > + } > > + > > /* > > * Transactions are freed at commit time but transactions without commit > > * records on disk are never committed. Free any that may be left in the > > @@ -3226,7 +3230,7 @@ xlog_do_recovery_pass( > > xlog_recover_free_trans(trans); > > } > > > > - return error ? error : error2; > > + return error; > > } > > > > /* > > -- > > 2.31.1 > > > > >