Re: [PATCH] xfs: up(ic_sema) if flushing data device fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 23, 2023 at 05:44:14PM -0700, Leah Rumancik wrote:
> On Mon, Oct 23, 2023 at 02:22:21PM -0700, Darrick J. Wong wrote:
> > On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote:
> > > We flush the data device cache before we issue external log IO. Since
> > > 7d839e325af2, we check the return value of the flush, and if the flush
> > > failed, we shut down the log immediately and return. However, the
> > > iclog->ic_sema is left in a decremented state so let's add an up().
> > > Prior to this patch, xfs/438 would fail consistently when running with
> > > an external log device:
> > > 
> > > sync
> > >   -> xfs_log_force
> > >   -> xlog_write_iclog
> > >       -> down(&iclog->ic_sema)
> > >       -> blkdev_issue_flush (fail causes us to intiate shutdown)
> > >           -> xlog_force_shutdown
> > >           -> return
> > > 
> > > unmount
> > >   -> xfs_log_umount
> > >       -> xlog_wait_iclog_completion
> > >           -> down(&iclog->ic_sema) --------> HANG
> > > 
> > > There is a second early return / shutdown. Add an up() there as well.
> > 
> > Ow.  Yes, I think it's correct that both of those error returns need to
> > drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work
> > to do it for us.
> > 
> > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices")
> > 
> > Hmm.  This bug was introduced in b5d721eaae47e ("xfs: external logs need
> > to flush data device"), not 7d839.  That said, this patch only applies
> > cleanly to 7d839e325af2.
> > 
> > b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is
> > where I would have spat out:
> > 
> > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices")
> > Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device")
> 
> I'm not sure I follow. Before 7d839e325af2, we didn't return if there
> was an issue with the flush so there wasn't an issue with ic_sema.
> 7d839e325af2 was a fix for eef983ffeae7 though. Do you try to keep
> fixes tags associated with the original commit as opposed to fixes of
> fixes?

I think we're both wrong about which commits this patch applies to.

The error return for blkdev_issue_flush failing to unlock ic_sema was
introduced in b5d721 when we added the bailout there.

The error return for xlog_map_iclog_data failing to unlock ic_sema was
introduced back in 842a42d126b4 ("xfs: shutdown on failure to add page
to log bio") back in 5.7.  So that's the original bug.

That said, I think the stable rules say that the Fixes tag is supposed
to help AUTOSEL, so they want the newest commit.  Since you all are
doing the backports by hand I suppose the AUTOSEL rules might not matter
as much.

So, uh... let's tag all three? :)

> > 
> > > Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx>
> > > ---
> > > 
> > > Notes:
> > >     Tested auto group for xfs/4k and xfs/logdev configs with no regressions
> > >     seen.
> > > 
> > >  fs/xfs/xfs_log.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index 51c100c86177..b4a8105299c2 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -1926,6 +1926,7 @@ xlog_write_iclog(
> > >  		 */
> > >  		if (log->l_targ != log->l_mp->m_ddev_targp &&
> > >  		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) {
> > > +			up(&iclog->ic_sema);
> > >  			xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> > >  			return;
> > >  		}
> > > @@ -1936,6 +1937,7 @@ xlog_write_iclog(
> > >  	iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
> > >  
> > >  	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) {
> > > +		up(&iclog->ic_sema);
> > >  		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> > 
> > I wonder if these two should both become a cleanup clause at the end?
> 
> Sure, that sounds good, I'll create a new version.
> 
> Thanks for the review! :)

NP.

--D

> - Leah
> > 
> > 		if (log->l_targ != log->l_mp->m_ddev_targp &&
> > 		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev))
> > 			goto shutdown;
> > 
> > ...
> > 
> > 	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count))
> > 		goto shutdown;
> > 
> > ...
> > 
> > 	submit_bio(&iclog->ic_bio);
> > 	return;
> > 
> > shutdown:
> > 	up(&iclog->ic_sema);
> > 	xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> > }
> > 
> > Seeing as we've screwed this up twice now and not a whole lot of people
> > actually use external logs, and somehow I've never seen this on my test
> > fleet.
> > 
> > Anyway the code change looks correct so modulo the stylistic thing,
> > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > --D
> > 
> > >  		return;
> > >  	}
> > > -- 
> > > 2.42.0.758.gaed0368e0e-goog
> > > 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux