On Sat, 18 Nov 2017, Mike Galbraith wrote:

> On Fri, 2017-11-17 at 15:57 +0100, Sebastian Siewior wrote:
> > On 2017-11-13 12:56:53 [-0500], Mikulas Patocka wrote:
> > > Hi
> > 
> > Hi,
> > 
> > > I'm submitting this patch for the CONFIG_PREEMPT_RT patch. It fixes
> > > deadlocks in device mapper when real time preemption is used.
> > 
> > applied, thank you.
> 
> Below is my 2012 3.0-rt version of that for reference; at that time we
> were using slab, and slab_lock referenced below was a local_lock. The
> comment came from crash analysis of a deadlock I met before adding the
> (yeah, hacky) __migrate_disabled() blocker. At the time, that was not
> an optional hack, you WOULD deadlock if you ground disks to fine powder
> the way SUSE QA did. Pulling the plug before blocking cured the
> xfs/ext[34] IO deadlocks they griped about, but I had to add that hack
> to not trade their nasty IO deadlocks for the more mundane variety. So
> my question is: are we sure that in the here and now flush won't want
> any lock we may be holding? In days of yore, it most definitely would
> turn your box into a doorstop if you tried hard enough.

I think that you shouldn't call blk_schedule_flush_plug when attempting
to take an rt-spinlock at all. An rt-mutex can't depend on an rt-spinlock
(although they use the same internal implementation), so a task waiting
on an rt-spinlock can't block any other task attempting to take an
rt-mutex from making progress.

Is there some specific scenario where you need to call
blk_schedule_flush_plug from rt_spin_lock_fastlock?

Mikulas

> Subject: rt: pull your plug before blocking
> Date: Wed Jul 18 14:43:15 CEST 2012
> From: Mike Galbraith <efault@xxxxxx>
> 
> Queued IO can lead to IO deadlock should a task require wakeup from a task
> which is blocked on that queued IO.
> 
> ext3: dbench1 queues a buffer, blocks on journal mutex, its plug is not
> pulled. dbench2 mutex owner is waiting for kjournald, who is waiting for
> the buffer queued by dbench1. Game over.
> 
> Signed-off-by: Mike Galbraith <efault@xxxxxx>
> ---
>  kernel/locking/rtmutex.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -22,6 +22,7 @@
>  #include <linux/sched/deadline.h>
>  #include <linux/timer.h>
>  #include <linux/ww_mutex.h>
> +#include <linux/blkdev.h>
>  
>  #include "rtmutex_common.h"
>  
> @@ -930,8 +931,18 @@ static inline void rt_spin_lock_fastlock
>  
>  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
>  		rt_mutex_deadlock_account_lock(lock, current);
> -	else
> +	else {
> +		/*
> +		 * We can't pull the plug if we're already holding a lock
> +		 * else we can deadlock. eg, if we're holding slab_lock,
> +		 * ksoftirqd can block while processing BLOCK_SOFTIRQ after
> +		 * having acquired q->queue_lock. If _we_ then block on
> +		 * that q->queue_lock while flushing our plug, deadlock.
> +		 */
> +		if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current))
> +			blk_schedule_flush_plug(current);
>  		slowfn(lock);
> +	}
>  }
>  
>  static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
> @@ -1892,9 +1903,12 @@ rt_mutex_fastlock(struct rt_mutex *lock,
>  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) {
>  		rt_mutex_deadlock_account_lock(lock, current);
>  		return 0;
> -	} else
> +	} else {
> +		if (blk_needs_flush_plug(current))
> +			blk_schedule_flush_plug(current);
>  		return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK,
>  			      ww_ctx);
> +	}
>  }
>  
>  static inline int
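
For illustration only, here is a small userspace analogy (plain pthreads, not kernel code) of the ext3 scenario in the quoted changelog: dbench1 queues a buffer in a private per-thread plug, then blocks on the journal mutex held by dbench2; dbench2 waits for kjournald; kjournald waits for the buffer still sitting in dbench1's unflushed plug. The thread names and the flush_plug() helper below are made up to mirror that description, not taken from the patch.

/*
 * Hypothetical sketch of the "pull your plug before blocking" idea.
 * Build with: cc -pthread plug-demo.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Per-thread "plug": IO queued locally, invisible to the worker until flushed. */
static __thread bool plugged_buffer;

static pthread_mutex_t journal_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool buffer_submitted;	/* set when a plug is flushed	  */
static bool journal_committed;	/* set by the worker ("kjournald") */

static void flush_plug(void)
{
	pthread_mutex_lock(&state_lock);
	if (plugged_buffer) {
		plugged_buffer = false;
		buffer_submitted = true;	/* the worker can now see the buffer */
		pthread_cond_broadcast(&state_cond);
	}
	pthread_mutex_unlock(&state_lock);
}

/* "dbench1": queues a buffer in its plug, then blocks on the journal mutex. */
static void *dbench1(void *arg)
{
	bool pull_plug_before_blocking = *(bool *)arg;

	plugged_buffer = true;
	if (pull_plug_before_blocking)
		flush_plug();		/* submit queued IO before blocking */
	pthread_mutex_lock(&journal_mutex);
	pthread_mutex_unlock(&journal_mutex);
	return NULL;
}

/* "dbench2": owns the journal mutex and waits for the worker to commit. */
static void *dbench2(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&journal_mutex);
	pthread_mutex_lock(&state_lock);
	while (!journal_committed)
		pthread_cond_wait(&state_cond, &state_lock);
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&journal_mutex);
	return NULL;
}

/* "kjournald": waits for dbench1's buffer, then commits the journal. */
static void *kjournald(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	while (!buffer_submitted)
		pthread_cond_wait(&state_cond, &state_lock);
	journal_committed = true;
	pthread_cond_broadcast(&state_cond);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

int main(void)
{
	bool pull_plug = true;	/* flip to false to reproduce the hang */
	pthread_t t1, t2, t3;

	pthread_create(&t2, NULL, dbench2, NULL);
	usleep(10000);		/* let dbench2 take the journal mutex first */
	pthread_create(&t3, NULL, kjournald, NULL);
	pthread_create(&t1, NULL, dbench1, &pull_plug);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);
	printf("all threads finished: no deadlock\n");
	return 0;
}

With pull_plug set to false the three threads wait on each other forever, which is the same dependency cycle the quoted patch breaks by flushing the plug before entering the slow path.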