On Mon, Oct 23, 2023 at 05:44:14PM -0700, Leah Rumancik wrote: > On Mon, Oct 23, 2023 at 02:22:21PM -0700, Darrick J. Wong wrote: > > On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote: > > > We flush the data device cache before we issue external log IO. Since > > > 7d839e325af2, we check the return value of the flush, and if the flush > > > failed, we shut down the log immediately and return. However, the > > > iclog->ic_sema is left in a decremented state so let's add an up(). > > > Prior to this patch, xfs/438 would fail consistently when running with > > > an external log device: > > > > > > sync > > > -> xfs_log_force > > > -> xlog_write_iclog > > > -> down(&iclog->ic_sema) > > > -> blkdev_issue_flush (fail causes us to intiate shutdown) > > > -> xlog_force_shutdown > > > -> return > > > > > > unmount > > > -> xfs_log_umount > > > -> xlog_wait_iclog_completion > > > -> down(&iclog->ic_sema) --------> HANG > > > > > > There is a second early return / shutdown. Add an up() there as well. > > > > Ow. Yes, I think it's correct that both of those error returns need to > > drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work > > to do it for us. > > > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > > > Hmm. This bug was introduced in b5d721eaae47e ("xfs: external logs need > > to flush data device"), not 7d839. That said, this patch only applies > > cleanly to 7d839e325af2. > > > > b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is > > where I would have spat out: > > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device") > > I'm not sure I follow. Before 7d839e325af2, we didn't return if there > was an issue with the flush so there wasn't an issue with ic_sema. > 7d839e325af2 was a fix for eef983ffeae7 though. Do you try to keep > fixes tags associated with the original commit as opposed to fixes of > fixes? I think we're both wrong about which commits this patch applies to. The error return for blkdev_issue_flush failing to unlock ic_sema was introduced in b5d721 when we added the bailout there. The error return for xlog_map_iclog_data failing to unlock ic_sema was introduced back in 842a42d126b4 ("xfs: shutdown on failure to add page to log bio") back in 5.7. So that's the original bug. That said, I think the stable rules say that the Fixes tag is supposed to help AUTOSEL, so they want the newest commit. Since you all are doing the backports by hand I suppose the AUTOSEL rules might not matter as much. So, uh... let's tag all three? :) > > > > > Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx> > > > --- > > > > > > Notes: > > > Tested auto group for xfs/4k and xfs/logdev configs with no regressions > > > seen. > > > > > > fs/xfs/xfs_log.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > > index 51c100c86177..b4a8105299c2 100644 > > > --- a/fs/xfs/xfs_log.c > > > +++ b/fs/xfs/xfs_log.c > > > @@ -1926,6 +1926,7 @@ xlog_write_iclog( > > > */ > > > if (log->l_targ != log->l_mp->m_ddev_targp && > > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { > > > + up(&iclog->ic_sema); > > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > > return; > > > } > > > @@ -1936,6 +1937,7 @@ xlog_write_iclog( > > > iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > > > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { > > > + up(&iclog->ic_sema); > > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > > > I wonder if these two should both become a cleanup clause at the end? > > Sure, that sounds good, I'll create a new version. > > Thanks for the review! :) NP. --D > - Leah > > > > if (log->l_targ != log->l_mp->m_ddev_targp && > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) > > goto shutdown; > > > > ... > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) > > goto shutdown; > > > > ... > > > > submit_bio(&iclog->ic_bio); > > return; > > > > shutdown: > > up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > } > > > > Seeing as we've screwed this up twice now and not a whole lot of people > > actually use external logs, and somehow I've never seen this on my test > > fleet. > > > > Anyway the code change looks correct so modulo the stylistic thing, > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > --D > > > > > return; > > > } > > > -- > > > 2.42.0.758.gaed0368e0e-goog > > >