On 8 March 2018 at 16:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
> Does the below patch make a difference?
>
>
> diff --git a/mutex.c b/mutex.c
> index 63229eda09d6..acc88dc33b98 100644
> --- a/mutex.c
> +++ b/mutex.c
> @@ -240,10 +240,11 @@ void fio_mutex_up(struct fio_mutex *mutex)
>         if (!mutex->value && mutex->waiters)
>                 do_wake = 1;
>         mutex->value++;
> -       pthread_mutex_unlock(&mutex->lock);
>
>         if (do_wake)
>                 pthread_cond_signal(&mutex->cond);
> +
> +       pthread_mutex_unlock(&mutex->lock);
>  }
>
>  void fio_rwlock_write(struct fio_rwlock *lock)

It pains me to say this (because POSIX says such rejigging just changes the scheduling order) but yes, your patch makes a difference.

The following job would trigger the deadlock problem within 10 minutes for me:

[global]
thread
ioengine=windowsaio
direct=1
iodepth=1
readwrite=read
time_based=1
filesize=1M
runtime=4
time_based
numjobs=500
group_reporting

[job1]
filename=file1:file2:file3:file4:file5:file6:file7:file8:file9:file10:file11:file12:file13:file14:file15

when run in a loop like this:

for i in {1..150}; do echo "Loop $i; $(date)"; ./fio --minimal hang.fio; if [[ $? -ne 0 ]]; then echo "failure"; break; fi; done

The backtrace is always as before - most threads are waiting to lock the mutex, a few are still doing I/O completion port work in windowsaio but just keep timing out and never get events, a couple are waiting in pthread_cond_wait(), and the thread that is trying to send pthread_cond_signal() is deadlocked. Not using direct=1 seems to make the problem go away for some reason, and having fewer threads also makes the problem harder to hit.

With your patch the above seemed to be able to go to 150 loops without issue, but I haven't tested it in anger.

--
Sitsofe | http://sucs.org/~sits/
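
[As an aside for anyone reading along: below is a minimal standalone sketch - my own illustration, not fio code - of the ordering the patch switches to, i.e. calling pthread_cond_signal() while still holding the mutex that the waiter passed to pthread_cond_wait(). POSIX permits signalling either before or after the unlock, so this only shows the pattern; it is not an explanation of why the Windows build deadlocks with the other ordering.]

/* sketch.c - illustrative only: signal while still holding the mutex,
 * mirroring what fio_mutex_up() does after the quoted patch. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int value;

static void *waiter(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!value)                  /* guard against spurious wakeups */
                pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
        printf("waiter woke up\n");
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, waiter, NULL);

        pthread_mutex_lock(&lock);
        value = 1;                      /* change the predicate ... */
        pthread_cond_signal(&cond);     /* ... signal before unlocking ... */
        pthread_mutex_unlock(&lock);    /* ... then release the lock */

        pthread_join(t, NULL);
        return 0;
}

(Builds with something like "cc sketch.c -lpthread".)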