On 8 March 2018 at 16:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
> Does the below patch make a difference?
>
>
> diff --git a/mutex.c b/mutex.c
> index 63229eda09d6..acc88dc33b98 100644
> --- a/mutex.c
> +++ b/mutex.c
> @@ -240,10 +240,11 @@ void fio_mutex_up(struct fio_mutex *mutex)
>         if (!mutex->value && mutex->waiters)
>                 do_wake = 1;
>         mutex->value++;
> -       pthread_mutex_unlock(&mutex->lock);
>
>         if (do_wake)
>                 pthread_cond_signal(&mutex->cond);
> +
> +       pthread_mutex_unlock(&mutex->lock);
>  }
>
>  void fio_rwlock_write(struct fio_rwlock *lock)

It pains me to say this (because POSIX says such rejigging just changes the scheduling order) but yes, your patch makes a difference.

The following job would trigger the deadlock problem within 10 minutes for me:

[global]
thread
ioengine=windowsaio
direct=1
iodepth=1
readwrite=read
time_based=1
filesize=1M
runtime=4
time_based
numjobs=500
group_reporting

[job1]
filename=file1:file2:file3:file4:file5:file6:file7:file8:file9:file10:file11:file12:file13:file14:file15

when run in a loop like this:

for i in {1..150}; do echo "Loop $i; $(date)"; ./fio --minimal hang.fio; if [[ $? -ne 0 ]]; then echo "failure"; break; fi; done

The backtrace is always as before - most threads are waiting to lock the mutex, a few are still doing I/O completion port work in windowsaio but just keep timing out and never get events, a couple are waiting in pthread_cond_wait(), and the thread that is trying to send pthread_cond_signal() is deadlocked. Not using direct=1 seems to make the problem go away for some reason, and having fewer threads also makes the problem harder to hit.

With your patch the above seemed to be able to go to 150 loops without issue, but I haven't tested it in anger.

--
Sitsofe | http://sucs.org/~sits/
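
[As an aside for anyone reading along: below is a minimal standalone sketch - my own illustration, not fio code - of the ordering the patch switches to, i.e. calling pthread_cond_signal() while still holding the mutex that the waiter passed to pthread_cond_wait(). POSIX permits signalling either before or after the unlock, so this only shows the pattern; it is not an explanation of why the Windows build deadlocks with the other ordering.]

/* sketch.c - illustrative only: signal while still holding the mutex,
 * mirroring what fio_mutex_up() does after the quoted patch. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int value;

static void *waiter(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!value)                  /* guard against spurious wakeups */
                pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
        printf("waiter woke up\n");
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, waiter, NULL);

        pthread_mutex_lock(&lock);
        value = 1;                      /* change the predicate ... */
        pthread_cond_signal(&cond);     /* ... signal before unlocking ... */
        pthread_mutex_unlock(&lock);    /* ... then release the lock */

        pthread_join(t, NULL);
        return 0;
}

(Builds with something like "cc sketch.c -lpthread".)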