On Mon, Oct 23, 2023 at 02:22:21PM -0700, Darrick J. Wong wrote: > On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote: > > We flush the data device cache before we issue external log IO. Since > > 7d839e325af2, we check the return value of the flush, and if the flush > > failed, we shut down the log immediately and return. However, the > > iclog->ic_sema is left in a decremented state so let's add an up(). > > Prior to this patch, xfs/438 would fail consistently when running with > > an external log device: > > > > sync > > -> xfs_log_force > > -> xlog_write_iclog > > -> down(&iclog->ic_sema) > > -> blkdev_issue_flush (fail causes us to intiate shutdown) > > -> xlog_force_shutdown > > -> return > > > > unmount > > -> xfs_log_umount > > -> xlog_wait_iclog_completion > > -> down(&iclog->ic_sema) --------> HANG > > > > There is a second early return / shutdown. Add an up() there as well. > > Ow. Yes, I think it's correct that both of those error returns need to > drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work > to do it for us. > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > Hmm. This bug was introduced in b5d721eaae47e ("xfs: external logs need > to flush data device"), not 7d839. That said, this patch only applies > cleanly to 7d839e325af2. > > b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is > where I would have spat out: > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device") I'm not sure I follow. Before 7d839e325af2, we didn't return if there was an issue with the flush so there wasn't an issue with ic_sema. 7d839e325af2 was a fix for eef983ffeae7 though. Do you try to keep fixes tags associated with the original commit as opposed to fixes of fixes? > > > Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx> > > --- > > > > Notes: > > Tested auto group for xfs/4k and xfs/logdev configs with no regressions > > seen. > > > > fs/xfs/xfs_log.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index 51c100c86177..b4a8105299c2 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -1926,6 +1926,7 @@ xlog_write_iclog( > > */ > > if (log->l_targ != log->l_mp->m_ddev_targp && > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { > > + up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > return; > > } > > @@ -1936,6 +1937,7 @@ xlog_write_iclog( > > iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { > > + up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > I wonder if these two should both become a cleanup clause at the end? Sure, that sounds good, I'll create a new version. Thanks for the review! :) - Leah > > if (log->l_targ != log->l_mp->m_ddev_targp && > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) > goto shutdown; > > ... > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) > goto shutdown; > > ... > > submit_bio(&iclog->ic_bio); > return; > > shutdown: > up(&iclog->ic_sema); > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > } > > Seeing as we've screwed this up twice now and not a whole lot of people > actually use external logs, and somehow I've never seen this on my test > fleet. > > Anyway the code change looks correct so modulo the stylistic thing, > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > --D > > > return; > > } > > -- > > 2.42.0.758.gaed0368e0e-goog > >