Re: [PATCH] xfs: ensure submit buffers on LSN boundaries in error handlers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 28, 2023 at 08:46:46PM +0800, Long Li wrote:
> In order to make sure that submits buffers on lsn boundaries in the
> abnormal paths, we need to check error status before submit buffers that
> have been added from the last record processed. If error status exist,
> buffers in the bufffer_list should be canceled.
> 
> Canceling the buffers in the buffer_list directly isn't correct, unlike
> any other place where write list was canceled, these buffers has been
> initialized by xfs_buf_item_init() during recovery and held by buf
> item, buf items will not be released in xfs_buf_delwri_cancel(). If
> these buffers are submitted successfully, buf items assocated with
> the buffer will be released in io end process. So releasing buf item
> in write list cacneling process is needed.

I still don't think this is correct.

> Fixes: 50d5c8d8e938 ("xfs: check LSN ordering for v5 superblocks during recovery")
> Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_buf.c         |  2 ++
>  fs/xfs/xfs_log_recover.c | 22 +++++++++++++---------
>  2 files changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 8e5bd50d29fe..6a1b26aaf97e 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2075,6 +2075,8 @@ xfs_buf_delwri_cancel(
>  		xfs_buf_lock(bp);
>  		bp->b_flags &= ~_XBF_DELWRI_Q;
>  		xfs_buf_list_del(bp);
> +		if (bp->b_log_item)
> +			xfs_buf_item_relse(bp);
>  		xfs_buf_relse(bp);

I still don't think this is safe.  The buffer log item might still be
tracked in the AIL when the delwri list is cancelled, so the delwri
list cancelling cannot release the BLI without removing the item
from the AIL, too. The delwri cancelling walk really shouldn't be
screwing with AIL state, which means it can't touch the BLIs here.

At minimum, it's a landmine for future users of
xfs_buf_delwri_cancel().  A quick look at the quotacheck code
indicates that it can cancel delwri lists that have BLIs in the AIL
(for newly allocated dquot chunks), so I think this is a real concern.

This is one of the reasons for submitting the delwri list on error;
the IO completion code does all the correct cleanup of log items
including removing them from the AIL because the buffer is now
either clean or stale and no longer needs to be tracked by the AIL.

If the filesystem has been shut down, then delwri list submission
will error out all buffers on the list via IO submission/completion
and do all the correct cleanup automatically.

I note that write IO errors during log recovery will cause immediate
shutdown of the filesytsem via xfs_buf_ioend_handle_error():

	/*
         * We're not going to bother about retrying this during recovery.
         * One strike!
         */
        if (bp->b_flags & _XBF_LOGRECOVERY) {
                xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
                return false;
        }

So I'm guessing that the IO error injection error that caused this
failure was on a buffer read part way through recovering items.

Can you confirm that the failure is only seen after read IO error
injection and that write IO error injection causes immediate
shutdown and so avoids the problem altogether?

If so, then all we need to do to handle instantiation side errors (EIO, ENOMEM,
etc) is this:

	/*
	 * Submit buffers that have been dirtied by the last record recovered.
	 */
	if (!list_empty(&buffer_list)) {
		if (error) {
			/*
			 * If there has been an item recovery error then we
			 * cannot allow partial checkpoint writeback to
			 * occur.  We might have multiple checkpoints with the
			 * same start LSN in this buffer list, and partial
			 * writeback of a checkpoint in this situation can
			 * prevent future recovery of all the changes in the
			 * checkpoints at this start LSN.
			 *
			 * Note: Shutting down the filesystem will result in the
			 * delwri submission marking all the buffers stale,
			 * completing them and cleaning up _XBF_LOGRECOVERY
			 * state without doing any IO.
			 */
			xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
		}
		error2 = xfs_buf_delwri_submit(&buffer_list);
	}

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux