Re: [PATCH] xfs: fix deadlock when set label online

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 29 Jun 2023 09:10:55 +1000

On Tue, Jun 27, 2023 at 04:42:41PM +0800, yangerkun wrote:
> 
> 
> 在 2023/6/27 5:45, Dave Chinner 写道:
> > On Mon, Jun 26, 2023 at 09:15:42PM +0800, yangerkun wrote:
> > > From: yangerkun <yangerkun@xxxxxxxxxx>
> > > 
> > > Combine use of xfs_trans_hold and xfs_trans_set_sync in xfs_sync_sb_buf
> > > can trigger a deadlock once shutdown happened concurrently. xlog_ioend_work
> > > will first unpin the sb(which stuck with xfs_buf_lock), then wakeup
> > > xfs_sync_sb_buf. However, xfs_sync_sb_buf never get the chance to unlock
> > > sb until been wakeup by xlog_ioend_work.
> > > 
> > > xfs_sync_sb_buf
> > >    xfs_trans_getsb // lock sb buf
> > >    xfs_trans_bhold // sb buf keep lock until success commit
> > >    xfs_trans_commit
> > >    ...
> > >      xfs_log_force_seq
> > >        xlog_force_lsn
> > >          xlog_wait_on_iclog
> > >            xlog_wait(&iclog->ic_force_wait... // shutdown happened
> > >    xfs_buf_relse // unlock sb buf
> > > 
> > > xlog_ioend_work
> > >    xlog_force_shutdown
> > >      xlog_state_shutdown_callbacks
> > >        xlog_cil_process_committed
> > >          xlog_cil_committed
> > >          ...
> > >          xfs_buf_item_unpin
> > >            xfs_buf_lock // deadlock
> > >        wake_up_all(&iclog->ic_force_wait)
> > > 
> > > xfs_ioc_setlabel use xfs_sync_sb_buf to make sure userspace will see the
> > > change for sb immediately. We can simply call xfs_ail_push_all_sync to
> > > do this and sametime fix the deadlock.
> > 
> > Why is this deadlock specific to the superblock buffer?
> 
> Hi Dave,
> 
> Thanks a lot for your revirew! We find this problem when do some code
> reading(which can help us to fix another growfs bug). And then reproduce it
> easily when we set label online frequently with IO error inject at the
> sametime.

Right, I know how it can be triggered; that's not actually my
concern...

> > Can't any buffer that is held locked over a synchronous transaction
> > commit deadlock during a shutdown like this?
> 
> After check all place use xfs_buf_bhold, it seems xfs_sync_sb_buf is the
> only convict that combine use xfs_trans_hold and xfs_trans_set_sync(I'm not
> familiar with xfs yet, so I may have some problems with my code check)...

Yes, I can also see that. But my concern is that this change only
addresses the symptom, but leaves the underlying deadlock unsolved.

Indeed, this isn't xfs_trans_commit() I'm worried about here; it's
the call to xfs_log_force(mp, XFS_LOG_SYNC) or
xfs_log_force_seq(XFS_LOG_SYNC) with a buffer held locked that I'm
worried about.

i.e. We have a buffer in the CIL (from a previous transaction) that
we currently hold locked while we call xfs_log_force(XFS_LOG_SYNC).
If a shutdown occurs while we are waiting for journal IO completion
to occur, then xlog_ioend_work() will attempt to lock the buffer and
deadlock, right?

e.g. I'm thinking of things like busy extent flushing (hold AGF +
AGFL + AG btree blocks locked when we call xfs_log_force()) could
also be vulnerable to the same deadlock...

If that's true, how do we avoid the shutdown from causing a deadlock
in these situations?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx