Re: Windows: FIO randomly hangs using attached script

Jens Axboe <axboe@xxxxxxxxx> · Fri, 9 Mar 2018 07:55:32 -0700

On 3/9/18 7:40 AM, Sitsofe Wheeler wrote:
> On 8 March 2018 at 16:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> Does the below patch make a difference?
>>
>>
>> diff --git a/mutex.c b/mutex.c
>> index 63229eda09d6..acc88dc33b98 100644
>> --- a/mutex.c
>> +++ b/mutex.c
>> @@ -240,10 +240,11 @@ void fio_mutex_up(struct fio_mutex *mutex)
>>         if (!mutex->value && mutex->waiters)
>>                 do_wake = 1;
>>         mutex->value++;
>> -       pthread_mutex_unlock(&mutex->lock);
>>
>>         if (do_wake)
>>                 pthread_cond_signal(&mutex->cond);
>> +
>> +       pthread_mutex_unlock(&mutex->lock);
>>  }
>>
>>  void fio_rwlock_write(struct fio_rwlock *lock)
> 
> It pains me to say this (because POSIX says such rejigging just
> changes the scheduling order) but yes your patch makes a difference.
> The following job would trigger the deadlock problem within 10 minutes
> for me:

Some implementations actually require the wakeup to be issued
while the lock is held, which seems to be the case here. It's a
shame, since it's clearly suboptimal (from a scalability point of
view) to have to hold the lock while issuing a wakeup for a
process that's going to grab that same lock.
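
For reference, the signal-under-lock ordering from the patch can be
sketched as a small standalone counting semaphore. This is only an
illustration, not fio's mutex.c: the sketch_* names are hypothetical,
only the fields visible in the quoted diff (lock, cond, value,
waiters) are modelled, and the down side is my assumption about what
a matching waiter looks like.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for struct fio_mutex; models only the
 * fields that appear in the quoted diff. */
struct sketch_sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void sketch_sem_init(struct sketch_sem *s, int value)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->value = value;
	s->waiters = 0;
}

/* Assumed waiter side: block until value is non-zero, then take one. */
static void sketch_sem_down(struct sketch_sem *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->value) {
		s->waiters++;
		pthread_cond_wait(&s->cond, &s->lock);
		s->waiters--;
	}
	s->value--;
	pthread_mutex_unlock(&s->lock);
}

/* Post side, in the patched order: signal first, unlock last. */
static void sketch_sem_up(struct sketch_sem *s)
{
	int do_wake = 0;

	pthread_mutex_lock(&s->lock);
	if (!s->value && s->waiters)
		do_wake = 1;
	s->value++;

	if (do_wake)
		pthread_cond_signal(&s->cond);

	pthread_mutex_unlock(&s->lock);
}

static void *sketch_waiter(void *arg)
{
	sketch_sem_down(arg);
	return NULL;
}

/* Start one waiter, post once, and return the value left over after
 * the waiter has consumed its unit and exited. */
static int sketch_demo(void)
{
	struct sketch_sem s;
	pthread_t t;
	int rc;

	sketch_sem_init(&s, 0);
	rc = pthread_create(&t, NULL, sketch_waiter, &s);
	assert(rc == 0);
	sketch_sem_up(&s);
	pthread_join(t, NULL);
	return s.value;
}
```

Because the waiter rechecks value in a loop under the lock, the
sketch behaves the same whether the post happens before or after the
waiter has gone to sleep.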

I'll commit the patch. Thanks a lot for all your hard work on this
one. Let's hope that was it...

-- 
Jens Axboe
