Re: [PATCH] xfs: fix deadlock when set label online

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 30 Jun 2023 08:24:19 +1000

On Thu, Jun 29, 2023 at 07:55:10PM +0800, yangerkun wrote:
> 在 2023/6/29 7:10, Dave Chinner 写道:
> > On Tue, Jun 27, 2023 at 04:42:41PM +0800, yangerkun wrote:
> > > 在 2023/6/27 5:45, Dave Chinner 写道:
> > > > On Mon, Jun 26, 2023 at 09:15:42PM +0800, yangerkun wrote:
> > > > > From: yangerkun <yangerkun@xxxxxxxxxx>
> > > > > 
> > > > > Combine use of xfs_trans_hold and xfs_trans_set_sync in xfs_sync_sb_buf
> > > > > can trigger a deadlock once shutdown happened concurrently. xlog_ioend_work
> > > > > will first unpin the sb(which stuck with xfs_buf_lock), then wakeup
> > > > > xfs_sync_sb_buf. However, xfs_sync_sb_buf never get the chance to unlock
> > > > > sb until been wakeup by xlog_ioend_work.
> > > > > 
> > > > > xfs_sync_sb_buf
> > > > >     xfs_trans_getsb // lock sb buf
> > > > >     xfs_trans_bhold // sb buf keep lock until success commit
> > > > >     xfs_trans_commit
> > > > >     ...
> > > > >       xfs_log_force_seq
> > > > >         xlog_force_lsn
> > > > >           xlog_wait_on_iclog
> > > > >             xlog_wait(&iclog->ic_force_wait... // shutdown happened
> > > > >     xfs_buf_relse // unlock sb buf
> > > > > 
> > > > > xlog_ioend_work
> > > > >     xlog_force_shutdown
> > > > >       xlog_state_shutdown_callbacks
> > > > >         xlog_cil_process_committed
> > > > >           xlog_cil_committed
> > > > >           ...
> > > > >           xfs_buf_item_unpin
> > > > >             xfs_buf_lock // deadlock
> > > > >         wake_up_all(&iclog->ic_force_wait)
> > > > > 
> > > > > xfs_ioc_setlabel use xfs_sync_sb_buf to make sure userspace will see the
> > > > > change for sb immediately. We can simply call xfs_ail_push_all_sync to
> > > > > do this and sametime fix the deadlock.
> > > > 
> > > > Why is this deadlock specific to the superblock buffer?
> > > 
> > > Hi Dave,
> > > 
> > > Thanks a lot for your revirew! We find this problem when do some code
> > > reading(which can help us to fix another growfs bug). And then reproduce it
> > > easily when we set label online frequently with IO error inject at the
> > > sametime.
> > 
> > Right, I know how it can be triggered; that's not actually my
> > concern...
> > 
> > > > Can't any buffer that is held locked over a synchronous transaction
> > > > commit deadlock during a shutdown like this?
> > > 
> > > After check all place use xfs_buf_bhold, it seems xfs_sync_sb_buf is the
> > > only convict that combine use xfs_trans_hold and xfs_trans_set_sync(I'm not
> > > familiar with xfs yet, so I may have some problems with my code check)...
> > 
> > Yes, I can also see that. But my concern is that this change only
> > addresses the symptom, but leaves the underlying deadlock unsolved.
> > 
> > Indeed, this isn't xfs_trans_commit() I'm worried about here; it's
> > the call to xfs_log_force(mp, XFS_LOG_SYNC) or
> > xfs_log_force_seq(XFS_LOG_SYNC) with a buffer held locked that I'm
> > worried about.
> > 
> > i.e. We have a buffer in the CIL (from a previous transaction) that
> > we currently hold locked while we call xfs_log_force(XFS_LOG_SYNC).
> > If a shutdown occurs while we are waiting for journal IO completion
> > to occur, then xlog_ioend_work() will attempt to lock the buffer and
> > deadlock, right?
> > 
> > e.g. I'm thinking of things like busy extent flushing (hold AGF +
> > AGFL + AG btree blocks locked when we call xfs_log_force()) could
> > also be vulnerable to the same deadlock...
> 
> You mean something like xfs_allocbt_alloc_block(call xfs_log_force to
> flush busy extent which keep agf locked sametime)?
> 
> We call xfs_log_force(mp, XFS_LOG_SYNC) after lock agf and before
> xfs_trans_commit. It seems ok since xfs_buf_item_unpin will not call
> xfs_buf_lock because bli_refcount still keep active(once we hold locked
> agf, the bli_refcount will inc in _xfs_trans_bjoin, and keep it until
> xfs_trans_commit success(clean agf item) or .iop_unpin(dirty agf item,
> call from xlog_ioend_work) which can be called after xfs_trans_commit
> too)...

Again, I gave an example of the class of issue I'm worried about.
Again, you chased the one example given through, but haven't
mentioned a thing about all the other code paths that lead to
xfs_log_force(SYNC) that might hold buffers locked that I didn't
mention.

I don't want to have to ask every person who proposes a fix about
every possible code path the bug may manifest in -one at a time-.  I
use examples to point you in the right direction for further
analysis of the rest of the code base, not because that's the only
thing I want checked. Please use your initiative to look at all the
callers of xfs_log_force(SYNC) and determine if they are all safe or
whether there are landmines lurked or even more bugs of a similar
sort.

When we learn about a new issue, this is the sort of audit work that
is necessary to determine the scope of the issue. We need to perform
such audits because they direct the scope of the fix necessary. We
are not interested in slapping a band-aid fix over the symptom that
was reported - that only leads to more band-aid fixes as the same
issue appears in other places.

Now we know there is a lock ordering problem in this code, so before
we attempt to fix it we need to know how widespread it is, what the
impact is, how different code paths avoid it, etc. That requires a
code audit to determine, and that requires looking at all the paths
into xfs_log_force(XFS_LOG_SYNC) to determine if they are safe or
not and documenting that.

Yes, it's more work *right now* than slapping a quick band-aid fix
over it, but it's much less work in the long run for us and we don't
have to keep playing whack-a-mole because we fixed it the right way
the first time.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx