* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
>
> > This didn't make sense to me at first too, and I'll be happy to be
> > proven wrong, but we can reproduce this with rbd very reliably under
> > higher than usual load, and the revert makes it go away. What we are
> > seeing in the rbd scenario is the following.
>
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.
>
> > Suppose foo needs mutexes A and B, bar needs mutex B. foo acquires
> > A and then wants to acquire B, but B is held by bar. foo spins
> > a little and ends up calling schedule_preempt_disabled() on line 484
> > above, but that call never returns, even though a hundred usecs later
> > bar releases B. foo ends up stuck in mutex_lock() indefinitely, but
> > still holds A and everybody else who needs A gets behind A. Given that
> > this A happens to be a central libceph mutex all rbd activity halts.
> > Deadlock may not be the best term for this, but never returning from
> > mutex_lock(&B) even though B has been unlocked is *a* problem.
> >
> > This obviously doesn't happen every time schedule_preempt_disabled() on
> > line 484 is called, so there must be some sort of race here. I'll send
> > along the actual rbd stack traces shortly.
>
> Smells like maybe current->state != TASK_RUNNING, does the below
> trigger?
>
> If so, you've wrecked something in whatever...
>
> ---
>  kernel/locking/mutex.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index ae712b25e492..3d726fdaa764 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	 * reschedule now, before we try-lock the mutex. This avoids getting
>  	 * scheduled out right after we obtained the mutex.
>  	 */
> -	if (need_resched())
> +	if (need_resched()) {
> +		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> +			__set_current_state(TASK_RUNNING);
> +
>  		schedule_preempt_disabled();
> +	}

Might make sense to add that debug check under mutex debugging or so,
with a sensible kernel message printed.

Thanks,

	Ingo
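
For illustration only, here is a userspace C sketch of the kind of debug
check suggested above: warn once, with a readable message, when a task
reaches the voluntary reschedule point in a state other than running, then
repair the state. It is a toy model, not kernel code. Every name in it
(toy_task, toy_state, TOY_DEBUG_MUTEXES, toy_check_state_before_schedule)
is hypothetical; an in-kernel version would presumably sit behind an
existing mutex debugging option and use current->state, WARN_ON_ONCE() and
__set_current_state() as in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's task states. */
enum toy_state { TOY_RUNNING = 0, TOY_UNINTERRUPTIBLE = 2 };

/* Hypothetical stand-in for struct task_struct; only the state matters. */
struct toy_task {
	const char *comm;	/* task name, used in the message */
	enum toy_state state;
};

/* Define as 0 to compile the check out, as a debug-only option would. */
#ifndef TOY_DEBUG_MUTEXES
#define TOY_DEBUG_MUTEXES 1
#endif

/* Set after the first warning so the message is printed only once. */
static bool toy_warned;

/*
 * Meant to be called right before a voluntary reschedule on the mutex
 * slow path. In the kernel, a task that calls schedule() while its state
 * is not TASK_RUNNING is taken off the run queue until something wakes
 * it, which would match the "never returns" symptom described above.
 *
 * Returns true if the state had to be repaired.
 */
static bool toy_check_state_before_schedule(struct toy_task *t)
{
	assert(t != NULL);
#if TOY_DEBUG_MUTEXES
	if (t->state != TOY_RUNNING) {
		if (!toy_warned) {
			toy_warned = true;
			fprintf(stderr,
				"mutex debug: task %s reached the resched point "
				"with state %d, expected running\n",
				t->comm, (int)t->state);
		}
		t->state = TOY_RUNNING;	/* repair so the task stays runnable */
		return true;
	}
#endif
	return false;
}
```

A caller would invoke toy_check_state_before_schedule(&task) where the
patch has the WARN_ON_ONCE() test; a true return value means the caller
arrived with a stale task state that was set somewhere before mutex_lock().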