On Thu, Dec 28, 2023 at 08:46:46PM +0800, Long Li wrote: > In order to make sure that submits buffers on lsn boundaries in the > abnormal paths, we need to check error status before submit buffers that > have been added from the last record processed. If error status exist, > buffers in the bufffer_list should be canceled. > > Canceling the buffers in the buffer_list directly isn't correct, unlike > any other place where write list was canceled, these buffers has been > initialized by xfs_buf_item_init() during recovery and held by buf > item, buf items will not be released in xfs_buf_delwri_cancel(). If > these buffers are submitted successfully, buf items assocated with > the buffer will be released in io end process. So releasing buf item > in write list cacneling process is needed. I still don't think this is correct. > Fixes: 50d5c8d8e938 ("xfs: check LSN ordering for v5 superblocks during recovery") > Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx> > --- > fs/xfs/xfs_buf.c | 2 ++ > fs/xfs/xfs_log_recover.c | 22 +++++++++++++--------- > 2 files changed, 15 insertions(+), 9 deletions(-) > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > index 8e5bd50d29fe..6a1b26aaf97e 100644 > --- a/fs/xfs/xfs_buf.c > +++ b/fs/xfs/xfs_buf.c > @@ -2075,6 +2075,8 @@ xfs_buf_delwri_cancel( > xfs_buf_lock(bp); > bp->b_flags &= ~_XBF_DELWRI_Q; > xfs_buf_list_del(bp); > + if (bp->b_log_item) > + xfs_buf_item_relse(bp); > xfs_buf_relse(bp); I still don't think this is safe. The buffer log item might still be tracked in the AIL when the delwri list is cancelled, so the delwri list cancelling cannot release the BLI without removing the item from the AIL, too. The delwri cancelling walk really shouldn't be screwing with AIL state, which means it can't touch the BLIs here. At minimum, it's a landmine for future users of xfs_buf_delwri_cancel(). A quick look at the quotacheck code indicates that it can cancel delwri lists that have BLIs in the AIL (for newly allocated dquot chunks), so I think this is a real concern. This is one of the reasons for submitting the delwri list on error; the IO completion code does all the correct cleanup of log items including removing them from the AIL because the buffer is now either clean or stale and no longer needs to be tracked by the AIL. If the filesystem has been shut down, then delwri list submission will error out all buffers on the list via IO submission/completion and do all the correct cleanup automatically. I note that write IO errors during log recovery will cause immediate shutdown of the filesytsem via xfs_buf_ioend_handle_error(): /* * We're not going to bother about retrying this during recovery. * One strike! */ if (bp->b_flags & _XBF_LOGRECOVERY) { xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR); return false; } So I'm guessing that the IO error injection error that caused this failure was on a buffer read part way through recovering items. Can you confirm that the failure is only seen after read IO error injection and that write IO error injection causes immediate shutdown and so avoids the problem altogether? If so, then all we need to do to handle instantiation side errors (EIO, ENOMEM, etc) is this: /* * Submit buffers that have been dirtied by the last record recovered. */ if (!list_empty(&buffer_list)) { if (error) { /* * If there has been an item recovery error then we * cannot allow partial checkpoint writeback to * occur. We might have multiple checkpoints with the * same start LSN in this buffer list, and partial * writeback of a checkpoint in this situation can * prevent future recovery of all the changes in the * checkpoints at this start LSN. * * Note: Shutting down the filesystem will result in the * delwri submission marking all the buffers stale, * completing them and cleaning up _XBF_LOGRECOVERY * state without doing any IO. */ xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); } error2 = xfs_buf_delwri_submit(&buffer_list); } Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx