On 3/9/18 7:40 AM, Sitsofe Wheeler wrote:
> On 8 March 2018 at 16:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> Does the below patch make a difference?
>>
>>
>> diff --git a/mutex.c b/mutex.c
>> index 63229eda09d6..acc88dc33b98 100644
>> --- a/mutex.c
>> +++ b/mutex.c
>> @@ -240,10 +240,11 @@ void fio_mutex_up(struct fio_mutex *mutex)
>>  	if (!mutex->value && mutex->waiters)
>>  		do_wake = 1;
>>  	mutex->value++;
>> -	pthread_mutex_unlock(&mutex->lock);
>>
>>  	if (do_wake)
>>  		pthread_cond_signal(&mutex->cond);
>> +
>> +	pthread_mutex_unlock(&mutex->lock);
>>  }
>>
>>  void fio_rwlock_write(struct fio_rwlock *lock)
>
> It pains me to say this (because POSIX says such rejigging just
> changes the scheduling order) but yes your patch makes a difference.
> The following job would trigger the deadlock problem within 10 minutes
> for me:

In some implementations it's actually mandated to have the wakeup
within the lock, which seems to be the case here. It's a shame, since
it's clearly suboptimal (from a scalability point of view) to have
to hold the lock while issuing a wakeup for a process that's going
to grab the same lock.

I'll commit the patch. Thanks a lot for all your hard work on this
one, let's hope that was it...

--
Jens Axboe
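
For readers following the thread, here is a minimal sketch of the "signal while holding the lock" ordering the patch moves to. It is not fio's code: the `sketch_sem` type and all `sketch_*` functions are hypothetical names, the fields only mirror those visible in the quoted diff, and the wake-up check is simplified to "signal whenever a waiter is registered". POSIX permits calling pthread_cond_signal() with or without the mutex held, but recommends holding it when predictable scheduling behavior is required.

```c
#include <pthread.h>

/* Hypothetical counting semaphore; field names mirror the quoted diff,
 * but this is an illustration, not fio's struct fio_mutex. */
struct sketch_sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void sketch_sem_init(struct sketch_sem *s, int value)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->value = value;
	s->waiters = 0;
}

/* Block until value is non-zero, then take one unit. */
static void sketch_sem_down(struct sketch_sem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->value == 0) {
		s->waiters++;
		pthread_cond_wait(&s->cond, &s->lock);
		s->waiters--;
	}
	s->value--;
	pthread_mutex_unlock(&s->lock);
}

/* Release one unit. The signal is issued before the unlock, so the
 * waiter cannot observe the state change without the wake-up. */
static void sketch_sem_up(struct sketch_sem *s)
{
	pthread_mutex_lock(&s->lock);
	s->value++;
	if (s->waiters)		/* simplified check, an assumption */
		pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

static void *sketch_waiter(void *arg)
{
	sketch_sem_down(arg);
	return NULL;
}

/* Start one waiter, post once, join; returns the final value. */
static int sketch_demo(void)
{
	struct sketch_sem s;
	pthread_t t;
	int v;

	sketch_sem_init(&s, 0);
	if (pthread_create(&t, NULL, sketch_waiter, &s))
		return -1;
	sketch_sem_up(&s);
	pthread_join(t, NULL);
	v = s.value;
	pthread_cond_destroy(&s.cond);
	pthread_mutex_destroy(&s.lock);
	return v;
}
```

Calling sketch_demo() from a main() built with -pthread should leave the value at 0 whichever thread runs first: either the waiter finds the value already raised, or it is woken by the signal sent under the lock.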