Re: xlog_grant_head_wait deadlocks on high-rolling transactions?

On Tue, Mar 12, 2019 at 11:18:25AM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Does anyone /else/ occasionally see fstests hang with a hojillion
> threads stuck in xlog_grant_head_wait?  I periodically see xfs/347 hang
> with a hojillion threads stuck in:
> 
> kworker/0:214   D13120 26117      2 0x80000000
> Workqueue: xfs-conv/sdf xfs_end_io [xfs]
> Call Trace:
>  schedule+0x36/0x90
>  xlog_grant_head_wait+0x66/0x450 [xfs]
>  xlog_grant_head_check+0xf0/0x170 [xfs]
>  xfs_log_reserve+0x166/0x500 [xfs]
>  xfs_trans_reserve+0x1ac/0x2b0 [xfs]
>  xfs_trans_alloc+0xda/0x220 [xfs]
>  xfs_reflink_end_cow_extent+0xda/0x3a0 [xfs]
>  xfs_reflink_end_cow+0x92/0x2a0 [xfs]
>  xfs_end_io+0xd0/0x120 [xfs]
>  process_one_work+0x252/0x600
>  worker_thread+0x3d/0x390
>  kthread+0x11f/0x140
>  ret_from_fork+0x24/0x30
> 
> Which is the end io worker stalled under xfs_trans_alloc trying to
> reserve log space to remap extents from the COW fork to the data fork.
> I also observe one thread stuck here:
> 
> kworker/0:215   D13120 26118      2 0x80000000
> Workqueue: xfs-conv/sdf xfs_end_io [xfs]
> Call Trace:
>  schedule+0x36/0x90
>  xlog_grant_head_wait+0x66/0x450 [xfs]
>  xlog_grant_head_check+0xf0/0x170 [xfs]
>  xfs_log_regrant+0x155/0x3b0 [xfs]
>  xfs_trans_reserve+0xa5/0x2b0 [xfs]
>  xfs_trans_roll+0x9c/0x190 [xfs]
>  xfs_defer_trans_roll+0x16e/0x5b0 [xfs]
>  xfs_defer_finish_noroll+0xf1/0x7e0 [xfs]
>  __xfs_trans_commit+0x1c3/0x630 [xfs]
>  xfs_reflink_end_cow_extent+0x285/0x3a0 [xfs]
>  xfs_reflink_end_cow+0x92/0x2a0 [xfs]
>  xfs_end_io+0xd0/0x120 [xfs]
>  process_one_work+0x252/0x600
>  worker_thread+0x3d/0x390
>  kthread+0x11f/0x140
>  ret_from_fork+0x24/0x30
> 
> This thread is stalled under xfs_trans_roll trying to reserve more log
> space because it rolled more times than tr_write.tr_logcount
> anticipated.  logcount = 8, but (having added a patch to trace log
> tickets that roll more than logcount guessed) we actually roll these
> end_cow transactions 10 times.
> 

I've not seen this behavior myself, FWIW.

> I think the problem was introduced when we added the deferred AGFL log
> item, because the bunmapi of the old data fork extent and the map_extent
> of the new extent can both add separate deferred AGFL log items to the
> defer chain.  It's also possible that I underestimated
> XFS_WRITE_LOG_COUNT_REFLINK way back when.
> 

Sounds plausible, but IIRC a rolling transaction isn't necessarily
required to make its entire reservation up front. The size of the
initial (up-front) reservation is essentially a tradeoff between
minimizing the need to wait on additional log regrants in the common
case and keeping the up-front reservation reasonably small.
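
To make the arithmetic concrete, here's a toy model of the accounting
being described: a permanent reservation grants roughly tr_logres *
tr_logcount up front, and every roll past that count has to regrant
another tr_logres unit, which is where xlog_grant_head_wait enters the
picture. The field names echo struct xfs_trans_res, but the numbers are
invented and the model is deliberately simplified; this is not the
kernel's grant accounting code:

#include <stdio.h>

int main(void)
{
        unsigned int tr_logres = 200000; /* invented bytes per transaction unit */
        int tr_logcount = 8;             /* i.e. XFS_WRITE_LOG_COUNT_REFLINK today */
        int rolls_observed = 10;         /* what the end_cow tracing showed */

        /* A permanent reservation grants logres * logcount up front... */
        printf("up-front grant: %u bytes\n", tr_logres * tr_logcount);

        /* ...and each roll beyond logcount must regrant one more unit,
         * any of which can block in xlog_grant_head_wait. */
        if (rolls_observed > tr_logcount)
                printf("mid-chain regrants that can block: %d x %u bytes\n",
                       rolls_observed - tr_logcount, tr_logres);
        return 0;
}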

Am I following correctly that you are reproducing a hard lockup and not
just stalls or similar performance problems? If so, is the problem
you're seeing that a long-running rolling transaction is essentially
blocked from acquiring more reservation by new transaction reservations
that have come in since the rolling transaction started, and those new
transactions are in turn blocked on the rolling transaction(s) that
cannot complete?

> Either way, the xfs_trans_roll transaction wants (logres) more space,
> and the xfs_trans_alloc transactions want (logres * logcount) space.
> Unfortunately, the alloc transactions got to the grant waiter list
> first, and there's not enough space for them, so the entire list waits.
> There seems to be enough space to grant the rolling transaction its
> smaller amount of space, so at least in theory that transaction could
> finish (and release a lot of space) if it could be bumped to the head of
> the waiter list.
> 

I'll have to read through and page the reservation code back in to make
more sense of this whole issue, but it's an interesting idea to
consider whether it's possible to give an in-progress rolling
transaction (one that has run out of reservation) priority over new
transaction allocations in the queue. That said, I'm still fuzzy on the
underlying problem, so it's not clear whether that's the ideal solution.
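
Just to illustrate the kind of ordering change that would imply, here's
a throwaway model of a grant waiter queue in which regrants jump the
line. The queue, the sizes and the single wake-up check are all
hypothetical; this is not the xlog grant head implementation:

#include <stdio.h>

enum waiter_type { NEW_RESERVATION, REGRANT };

struct waiter {
        enum waiter_type type;
        unsigned int     need;  /* grant space wanted, in made-up bytes */
};

#define MAX_WAITERS 8
static struct waiter queue[MAX_WAITERS];
static int nr_waiters;

/*
 * New reservations queue at the tail; a regrant from an in-progress
 * rolling transaction jumps to the head, on the theory that letting it
 * finish releases far more log space than it is asking for.
 */
static void enqueue(struct waiter w)
{
        int pos = (w.type == REGRANT) ? 0 : nr_waiters;

        for (int i = nr_waiters; i > pos; i--)
                queue[i] = queue[i - 1];
        queue[pos] = w;
        nr_waiters++;
}

int main(void)
{
        unsigned int free_space = 250000;

        /* Two big xfs_trans_alloc-style reservations arrive first... */
        enqueue((struct waiter){ NEW_RESERVATION, 1600000 });
        enqueue((struct waiter){ NEW_RESERVATION, 1600000 });
        /* ...then the rolling transaction that only needs one more unit. */
        enqueue((struct waiter){ REGRANT, 200000 });

        /* Wake-ups happen in queue order: the head waiter must fit
         * before anything behind it can run. */
        printf("head of queue: %s, needs %u, free %u: %s\n",
               queue[0].type == REGRANT ? "regrant" : "new reservation",
               queue[0].need, free_space,
               queue[0].need <= free_space ? "can proceed" : "still stuck");
        return 0;
}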

> Another way to solve this problem of course is to increase tr_logcount
> from 8 to 10, though this could cause some user heartburn for small
> filesystems because the minimum log size would increase.  However, I'm
> not sure about the relative merits of either approach, so I'm kicking
> this to the list for further input (while I go have lunch :P)
> 

We may want to do that regardless, if that's the right reservation, but
if there's a deadlock involved I wouldn't necessarily consider it a fix
for the underlying problem. We'd just be avoiding the case where a
rolling transaction needs to acquire more reservation in these
particular reflink contexts. We'd still be fundamentally susceptible to
deadlocks in less common conditions where blocking regrants can occur,
right?
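
To put rough numbers on the minimum log size concern: the largest
permanent reservation scales with logcount, and the log has to be big
enough to hold it. The per-roll reservation below is invented, since
the real tr_write.tr_logres is derived from filesystem geometry and the
actual minimum log size calculation is considerably more involved:

#include <stdio.h>

int main(void)
{
        /* Invented per-roll reservation; the real tr_write.tr_logres is
         * computed from the filesystem geometry. */
        unsigned int tr_logres = 200000;

        /* Going from 8 (XFS_WRITE_LOG_COUNT_REFLINK today) to 10 grows
         * the largest permanent reservation, and the minimum log size
         * has to grow along with it. */
        for (int logcount = 8; logcount <= 10; logcount++)
                printf("logcount %2d -> ~%u bytes reserved up front\n",
                       logcount, tr_logres * (unsigned int)logcount);
        return 0;
}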

Brian

> The second problem I noticed is that the reflink cancel cow and reflink
> remap functions follow the pattern of allocating one transaction and
> rolling it for every extent it encounters.  This results in /very/ high
> roll counts for the transaction, which (on a very busy system with a
> smallish log) seems like it could land us right back in this deadlock.
> I think the answer is to split those up to run one transaction per
> extent (like I did for reflink end_cow), though I'd have to ensure that
> we can drop the ILOCK safely to get a new transaction.
> 
> Thoughts?
> 
> --D
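
For what it's worth, the two loop shapes contrasted in that last
paragraph look roughly like this. The toy_* helpers are stand-ins for
the real xfs_reflink.c and xfs_trans machinery, and the error handling
is stripped down to the bare shape:

#include <stdbool.h>

struct toy_trans { int rolls; };

/* Placeholders for xfs_trans_alloc/roll/commit and the per-extent work. */
static int toy_trans_alloc(struct toy_trans *tp)  { tp->rolls = 0; return 0; }
static int toy_trans_roll(struct toy_trans *tp)   { tp->rolls++; return 0; }
static int toy_trans_commit(struct toy_trans *tp) { (void)tp; return 0; }
static bool next_extent(int *i, int nr)           { return (*i)++ < nr; }
static int process_extent(struct toy_trans *tp, int i) { (void)tp; (void)i; return 0; }

/* Pattern used by cancel_cow/remap today: one transaction for the whole
 * range, rolled once per extent, so the roll count grows with the
 * number of extents. */
static int whole_range_one_trans(int nr_extents)
{
        struct toy_trans tp;
        int i = 0, error;

        error = toy_trans_alloc(&tp);
        while (!error && next_extent(&i, nr_extents)) {
                error = process_extent(&tp, i);
                if (!error)
                        error = toy_trans_roll(&tp); /* may need a blocking regrant */
        }
        return error ? error : toy_trans_commit(&tp);
}

/* Pattern end_cow was switched to: a fresh, fully reserved transaction
 * per extent, so no single chain outruns its logcount, at the cost of
 * having to drop and retake the ILOCK between extents. */
static int one_trans_per_extent(int nr_extents)
{
        int i = 0, error = 0;

        while (!error && next_extent(&i, nr_extents)) {
                struct toy_trans tp;

                error = toy_trans_alloc(&tp);  /* logres * logcount up front */
                if (!error)
                        error = process_extent(&tp, i);
                if (!error)
                        error = toy_trans_commit(&tp);
        }
        return error;
}

int main(void)
{
        return whole_range_one_trans(100) | one_trans_per_extent(100);
}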


