Re: fio hangs with --status-interval

Hi Jens,

You'll be surprised, but it did not help :( I used the latest code from
git (fio-2.1.11-10-gae7e, commit ae7e050) and still see the same picture.

I don't know if it helps, but I see this behavior on a machine with
96GB of RAM. So, after the buffered writes finish, fio waits for a long
time until all dirty buffers hit the disk. But even after there is no
more disk activity, fio stays stuck until I kill it.
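
For what it's worth, the long quiet period itself is expected: a
buffered write() returns as soon as the data is in the page cache, and
something like fsync() (or background writeback) is what actually waits
for the dirty pages to reach the disk. A tiny standalone demo I put
together to double-check this (my own toy program, not fio code;
compile with gcc -o writeback_demo writeback_demo.c):

/* writeback_demo.c - toy demo (not fio code): a buffered write()
 * returns once the data is in the page cache; fsync() is what
 * actually waits for the dirty pages to reach the disk. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
        size_t len = 256 * 1024 * 1024;         /* 256MB of dirty data */
        char *buf = calloc(1, len);
        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0 || !buf)
                return 1;

        double t0 = now_sec();
        if (write(fd, buf, len) != (ssize_t)len)
                return 1;
        printf("write() took %.2fs (page cache only)\n", now_sec() - t0);

        t0 = now_sec();
        fsync(fd);                              /* wait for writeback */
        printf("fsync() took %.2fs (actual disk)\n", now_sec() - t0);

        close(fd);
        free(buf);
        return 0;
}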

Regarding the number of threads: I do understand where 3 threads
can come from:

1) Backend thread (sort of a manager)
2) Worker thread(s)
3) Disk stats thread

In my case I defined only one job instance, so I suppose there should
always be only one worker thread. I don't understand how the total
number of threads grows so much in the end (the snippet below shows 15
LWPs); judging by the start times, a new thread appears roughly every
10 seconds, i.e. once per --status-interval tick (see the toy sketch
after the snippet).

<snip starts>
$ ps -eLf | grep fio
root      4427  4135  4427  0   15 07:44 pts/1    00:00:02 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4636  0   15 07:56 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4637  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4638  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4647  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4650  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4651  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4652  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4653  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4654  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4663  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4664  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4666  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4668  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4669  0   15 07:59 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
<snip ends>
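
Purely as a guess on my side: the start times above would be consistent
with each periodic stats pass spawning a thread that then blocks on a
mutex nobody ever releases, so the count grows by one per tick. A toy
pthreads model that reproduces exactly this ps -eLf pattern (my
speculation about the mechanism, not fio code; compile with
gcc -pthread):

/* toy_pileup.c - toy model (not fio code) of a periodic reporter
 * thread piling up: each tick spawns a thread that blocks on a mutex
 * nobody releases, so the LWP count grows by one per interval. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;

static void *report_stats(void *arg)
{
        (void)arg;
        /* blocks forever: main() holds stat_lock and never releases it */
        pthread_mutex_lock(&stat_lock);
        pthread_mutex_unlock(&stat_lock);
        return NULL;
}

int main(void)
{
        /* simulate a lock left held, e.g. by an already-exited worker */
        pthread_mutex_lock(&stat_lock);

        for (int tick = 0; tick < 5; tick++) {
                pthread_t t;

                /* one "reporter" per status interval; each gets stuck */
                if (pthread_create(&t, NULL, report_stats, NULL) == 0)
                        pthread_detach(t);
                sleep(10);
                /* ps -eLf now shows one more LWP than last tick */
        }
        pause();        /* keep the pile-up visible until killed */
        return 0;
}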
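
And if I understand your suggestion below correctly, the shape of the
fix would be roughly the following. This is only my paraphrase in
generic pthreads, based on your mention of a non-static
__show_run_stats(); it is not the actual patch:

/* Paraphrase of the suggested fix shape (not the actual fio patch):
 * interim stats keep taking the mutex, but the final pass at exit
 * calls the unlocked helper directly, since the threads that could
 * race with it are already gone by then. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t stat_mutex = PTHREAD_MUTEX_INITIALIZER;

/* made non-static in the real fix, so the exit path can call it */
void __show_run_stats(void)
{
        printf("stats...\n");   /* stand-in for the real reporting */
}

/* periodic/interim path: still serialized on the stat mutex */
void show_run_stats(void)
{
        pthread_mutex_lock(&stat_mutex);
        __show_run_stats();
        pthread_mutex_unlock(&stat_mutex);
}

int main(void)
{
        show_run_stats();       /* what the --status-interval ticks do */
        __show_run_stats();     /* final output: skip the possibly-stuck
                                   mutex instead of deadlocking on it */
        return 0;
}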

Thanks,
Vasily

On Fri, Jul 25, 2014 at 3:56 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 2014-07-25 09:43, Jens Axboe wrote:
>>
>> On 2014-07-21 22:25, Vasily Tarasov wrote:
>>>
>>> Hi Jens,
>>>
>>> I tried your patch, but it didn't help. Interestingly, the number of
>>> threads changes in the end. At first, during the run:
>>>
>>> # ps -eLf | grep fio
>>> root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio
>>>
>>> Later, when the threads are stuck:
>>>
>>> # ps -eLf | grep fio
>>> root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio
>>>
>>> Is the number of threads supposed to change?..
>>
>>
>> Never answered this one... Yes, it'll change: when you run the job,
>> you'll typically have one backend process, a number of IO workers, and
>> one disk util thread. When you get stuck, it's the backend that is
>> left waiting for that mutex.
>>
>> In any case, I haven't been able to figure this one out yet. But it
>> should be safe enough to just ignore the stat mutex for the final
>> output, since the threads otherwise accessing it are gone. Can you see
>> if this one makes the issue go away?
>
>
> The patch was not compiled and was missing the non-static
> __show_run_stats(). Just pull current -git; I have committed a variant
> that does compile :-)
>
> --
> Jens Axboe
>



