Re: [PATCH 7/6] xfs: xfs: shutdown during log recovery needs to mark the log shutdown

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 28 Mar 2022 17:37:17 -0700

On Mon, Mar 28, 2022 at 09:55:34AM +1100, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When a checkpoint writeback is run by log recovery, corruption
> propagated from the log can result in writeback verifiers failing
> and calling xfs_force_shutdown() from
> xfs_buf_delwri_submit_buffers().
> 
> This results in the mount being marked as shutdown, but the log does
> not get marked as shut down because:
> 
>         /*
>          * If this happens during log recovery then we aren't using the runtime
>          * log mechanisms yet so there's nothing to shut down.
>          */
>         if (!log || xlog_in_recovery(log))
>                 return false;
> 
> If there are other buffers that then fail (say due to detecting the
> mount shutdown), they will now hang in xfs_do_force_shutdown()
> waiting for the log to shut down like this:
> 
>   __schedule+0x30d/0x9e0
>   schedule+0x55/0xd0
>   xfs_do_force_shutdown+0x1cd/0x200
>   ? init_wait_var_entry+0x50/0x50
>   xfs_buf_ioend+0x47e/0x530
>   __xfs_buf_submit+0xb0/0x240
>   xfs_buf_delwri_submit_buffers+0xfe/0x270
>   xfs_buf_delwri_submit+0x3a/0xc0
>   xlog_do_recovery_pass+0x474/0x7b0
>   ? do_raw_spin_unlock+0x30/0xb0
>   xlog_do_log_recovery+0x91/0x140
>   xlog_do_recover+0x38/0x1e0
>   xlog_recover+0xdd/0x170
>   xfs_log_mount+0x17e/0x2e0
>   xfs_mountfs+0x457/0x930
>   xfs_fs_fill_super+0x476/0x830
> 
> xlog_force_shutdown() always needs to mark the log as shut down,
> regardless of whether recovery is in progress or not, so that
> multiple calls to xfs_force_shutdown() during recovery don't end
> up waiting for the log to be shut down like this.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

That sounds like a pretty straightforward premise.
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> ---
>  fs/xfs/xfs_log.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 6166348ce1d1..5f3f943c34b9 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3729,11 +3729,7 @@ xlog_force_shutdown(
>  {
>  	bool		log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
>  
> -	/*
> -	 * If this happens during log recovery then we aren't using the runtime
> -	 * log mechanisms yet so there's nothing to shut down.
> -	 */
> -	if (!log || xlog_in_recovery(log))
> +	if (!log)
>  		return false;
>  
>  	/*
> @@ -3742,10 +3738,16 @@ xlog_force_shutdown(
>  	 * before the force will prevent the log force from flushing the iclogs
>  	 * to disk.
>  	 *
> -	 * Re-entry due to a log IO error shutdown during the log force is
> -	 * prevented by the atomicity of higher level shutdown code.
> +	 * When we are in recovery, there are no transactions to flush, and
> +	 * we don't want to touch the log because we don't want to perturb the
> +	 * current head/tail for future recovery attempts. Hence we need to
> +	 * avoid a log force in this case.
> +	 *
> +	 * If we are shutting down due to a log IO error, then we must avoid
> +	 * trying to write the log as that may just result in more IO errors and
> +	 * an endless shutdown/force loop.
>  	 */
> -	if (!log_error)
> +	if (!log_error && !xlog_in_recovery(log))
>  		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
>  
>  	/*