Re: Deadlocks due to per-process plugging

On Sun, 2012-07-22 at 20:43 +0200, Mike Galbraith wrote: 
> On Sat, 2012-07-21 at 09:47 +0200, Mike Galbraith wrote: 
> > On Wed, 2012-07-18 at 07:30 +0200, Mike Galbraith wrote: 
> > > On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:
> > > 
> > > > The patch in question for missing Cc.  Maybe should be only mutex, but I
> > > > see no reason why IO dependency can only possibly exist for mutexes...
> > > 
> > > Well that was easy, box quickly said "nope, mutex only does NOT cut it".
> > 
> > And I also learned (ouch) that both doesn't cut it either.  Ksoftirqd
> > (or sirq-blk) being nailed by q->lock in blk_done_softirq() is.. not
> > particularly wonderful.  As long as that doesn't happen, IO deadlock
> > doesn't happen, troublesome filesystems just work.  If it does happen
> > though, you've instantly got a problem.
> 
> That problem being slab_lock in practice btw, though I suppose it could
> do the same with any number of others.  In the case encountered, ksoftirqd
> (or sirq-blk) blocks on slab_lock while holding q->queue_lock, while a
> userspace task (dbench) blocks on q->queue_lock while holding slab_lock
> on the same cpu.  Game over.

Hello vacationing rt wizards' mail boxen (and others so bored they're
actually reading about obscure -rt IO troubles;).

ext4 is still alive, which is a positive sign, and box hasn't yet
deadlocked either, another sign.  Now all I have to do is (sigh) grind
filesystems to fine powder for a few days.. again.
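
For anyone who wants the cycle from the quoted text spelled out, here is
a rough userspace sketch.  It is not kernel code: struct task, struct
lock, would_deadlock() and the two model_*() helpers are made-up names
for illustration only.  It just walks "blocked on lock -> owner of that
lock" links the way the description reads, and reports a deadlock if the
walk comes back to where it started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lock;

/* Hypothetical model types, not the kernel's task_struct/rt_mutex. */
struct task {
	const char *name;
	struct lock *blocked_on;	/* lock this task waits for, or NULL */
};

struct lock {
	const char *name;
	struct task *owner;		/* current holder, or NULL */
};

/*
 * Follow waiter -> lock -> owner links starting at 'start'.  Arriving
 * back at 'start' means the wait chain is a cycle.  The hop limit only
 * keeps the walk finite if some other cycle exists further down.
 */
static bool would_deadlock(const struct task *start)
{
	const struct task *t = start;
	int hops = 0;

	assert(start != NULL);
	while (t->blocked_on && t->blocked_on->owner && hops++ < 16) {
		t = t->blocked_on->owner;
		if (t == start)
			return true;
	}
	return false;
}

/* The reported case: each task holds the lock the other one wants. */
static bool model_reported_case(void)
{
	struct task ksoftirqd = { "ksoftirqd", NULL };
	struct task dbench = { "dbench", NULL };
	struct lock queue_lock = { "q->queue_lock", &ksoftirqd };
	struct lock slab_lock = { "slab_lock", &dbench };

	ksoftirqd.blocked_on = &slab_lock;
	dbench.blocked_on = &queue_lock;	/* plug flush while holding slab_lock */
	return would_deadlock(&dbench);
}

/* Same holders, but dbench never goes after q->queue_lock. */
static bool model_without_plug_flush(void)
{
	struct task ksoftirqd = { "ksoftirqd", NULL };
	struct task dbench = { "dbench", NULL };
	struct lock queue_lock = { "q->queue_lock", &ksoftirqd };
	struct lock slab_lock = { "slab_lock", &dbench };

	(void)queue_lock;
	ksoftirqd.blocked_on = &slab_lock;
	return would_deadlock(&dbench);
}
```

In this model, model_reported_case() finds the cycle and
model_without_plug_flush() does not, which is the whole idea of the
patch below: keep the lock holder from reaching for q->queue_lock.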

---
 kernel/rtmutex.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -649,7 +649,14 @@ static inline void rt_spin_lock_fastlock
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
 		rt_mutex_deadlock_account_lock(lock, current);
 	else {
-		if (blk_needs_flush_plug(current))
+		/*
+		 * We can't pull the plug if we're already holding a lock
+		 * else we can deadlock.  eg, if we're holding slab_lock,
+		 * ksoftirqd can block while processing BLOCK_SOFTIRQ after
+		 * having acquired q->queue_lock.  If _we_ then block on
+		 * that q->queue_lock while flushing our plug, deadlock.
+		 */
+		if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current))
 			blk_schedule_flush_plug(current);
 		slowfn(lock);
 	}
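
A small userspace model of the new check, in case the "< 2" looks
arbitrary.  This is a sketch under an assumption the patch itself does
not state: that the migrate-disable count already includes the lock we
are in the middle of acquiring, so 1 means "no other lock held" and 2 or
more means "already holding something".  struct task_model,
may_flush_plug(), contended_lock_path() and flushes_at_depth() are
hypothetical names, and treating a flush as simply clearing a pending
flag is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few bits of task state the check reads. */
struct task_model {
	int migrate_disable;	/* models the value of __migrate_disabled() */
	bool plug_pending;	/* models blk_needs_flush_plug() being true */
	int plug_flushes;	/* how often the model "pulled the plug" */
};

/* Mirrors the patched condition; the depth meaning is an assumption. */
static bool may_flush_plug(const struct task_model *t)
{
	assert(t != NULL);
	return t->migrate_disable < 2 && t->plug_pending;
}

/* Models the contended branch: maybe flush, then go block in slowfn(). */
static void contended_lock_path(struct task_model *t)
{
	if (may_flush_plug(t)) {
		t->plug_flushes++;
		t->plug_pending = false;
	}
	/* the real code calls slowfn(lock) here and may sleep */
}

/* Run the contended path once at a given depth with a plug pending. */
static int flushes_at_depth(int depth)
{
	struct task_model t = { depth, true, 0 };

	contended_lock_path(&t);
	return t.plug_flushes;
}
```

With a pending plug, flushes_at_depth(1) flushes once, while
flushes_at_depth(2) leaves the plug alone and goes straight to blocking.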

