On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>
> This commit can lead to deadlocks by way of what at a high level
> appears to look like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels. In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads with the evidence
> actually pointing to the bowels of mutex_lock().
>
> kernel/locking/mutex.c, __mutex_lock_common():
> 476         osq_unlock(&lock->osq);
> 477 slowpath:
> 478         /*
> 479          * If we fell out of the spin path because of need_resched(),
> 480          * reschedule now, before we try-lock the mutex. This avoids getting
> 481          * scheduled out right after we obtained the mutex.
> 482          */
> 483         if (need_resched())
> 484                 schedule_preempt_disabled(); <-- never returns
> 485 #endif
> 486         spin_lock_mutex(&lock->wait_lock, flags);
>
> We started bumping into deadlocks in QA the day our branch has been
> rebased onto 3.15 (the release this commit went in) but then as part of
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
> which is why it hasn't been looked into until now. Revert makes the
> problem go away, confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.