On Mon, Mar 28, 2022 at 09:55:34AM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > When a checkpoint writeback is run by log recovery, corruption > propagated from the log can result in writeback verifiers failing > and calling xfs_force_shutdown() from > xfs_buf_delwri_submit_buffers(). > > This results in the mount being marked as shutdown, but the log does > not get marked as shut down because: > > /* > * If this happens during log recovery then we aren't using the runtime > * log mechanisms yet so there's nothing to shut down. > */ > if (!log || xlog_in_recovery(log)) > return false; > > If there are other buffers that then fail (say due to detecting the > mount shutdown), they will now hang in xfs_do_force_shutdown() > waiting for the log to shut down like this: > > __schedule+0x30d/0x9e0 > schedule+0x55/0xd0 > xfs_do_force_shutdown+0x1cd/0x200 > ? init_wait_var_entry+0x50/0x50 > xfs_buf_ioend+0x47e/0x530 > __xfs_buf_submit+0xb0/0x240 > xfs_buf_delwri_submit_buffers+0xfe/0x270 > xfs_buf_delwri_submit+0x3a/0xc0 > xlog_do_recovery_pass+0x474/0x7b0 > ? do_raw_spin_unlock+0x30/0xb0 > xlog_do_log_recovery+0x91/0x140 > xlog_do_recover+0x38/0x1e0 > xlog_recover+0xdd/0x170 > xfs_log_mount+0x17e/0x2e0 > xfs_mountfs+0x457/0x930 > xfs_fs_fill_super+0x476/0x830 > > xlog_force_shutdown() always needs to mark the log as shut down, > regardless of whether recovery is in progress or not, so that > multiple calls to xfs_force_shutdown() during recovery don't end > up waiting for the log to be shut down like this. > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> That sounds like a pretty straightforward premise. Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > --- > fs/xfs/xfs_log.c | 18 ++++++++++-------- > 1 file changed, 10 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index 6166348ce1d1..5f3f943c34b9 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -3729,11 +3729,7 @@ xlog_force_shutdown( > { > bool log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR); > > - /* > - * If this happens during log recovery then we aren't using the runtime > - * log mechanisms yet so there's nothing to shut down. > - */ > - if (!log || xlog_in_recovery(log)) > + if (!log) > return false; > > /* > @@ -3742,10 +3738,16 @@ xlog_force_shutdown( > * before the force will prevent the log force from flushing the iclogs > * to disk. > * > - * Re-entry due to a log IO error shutdown during the log force is > - * prevented by the atomicity of higher level shutdown code. > + * When we are in recovery, there are no transactions to flush, and > + * we don't want to touch the log because we don't want to perturb the > + * current head/tail for future recovery attempts. Hence we need to > + * avoid a log force in this case. > + * > + * If we are shutting down due to a log IO error, then we must avoid > + * trying to write the log as that may just result in more IO errors and > + * an endless shutdown/force loop. > */ > - if (!log_error) > + if (!log_error && !xlog_in_recovery(log)) > xfs_log_force(log->l_mp, XFS_LOG_SYNC); > > /*