Re: fio server errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 14 June 2017 at 23:20, Jeff Furlong <jeff.furlong@xxxxxxx> wrote:
> On issue one (inflate error), I've gone through some of the code and I think nr_samples should be declared as uint64_t, not uint32_t, at https://github.com/axboe/fio/blob/master/client.c#L1451  I've changed it on my machine and reran the prior test but after 25 hours the same issue occurred again.  It's still difficult to say what type of job is causing the issue.

I see there is another uint32_t nr_samples over in
https://github.com/axboe/fio/blob/master/iolog.c#L801 - do you think
that would need changing too? Is nr_samples slowly growing over time
and have you managed to print its value at the point of the inflate
error?

> On issue two (SEND_ETA), I can reproduce with a slightly smaller job file:
>
> #fio --client=server1 test.job --client=server2 test.job
>
> # cat test.job
> [test_job]
> ioengine=libaio
> direct=1
> rw=randread
> iodepth=256
> size=100%
> numjobs=1
> bs=4k
> filename=/dev/nvme0n1
> group_reporting
> write_lat_log=test_job
> runtime=1m
>
> If I remove write_lat_log=test_job then there is no SEND_ETA issue.  Only when logging (the latency) does this issue occur.  And, if I enable write_lat_log=test_job but reduce runtime=30s, then the issue does not occur.  I must have write_lat_log=test_job and runtime=1m to produce the issue.  I imagine some count must be exceeded or the pause in moving the lat data from memory to file at the end of the job is causing a timing conflict (SEND_ETA msg vs "I've quit" msg).

I can reproduce the problem here using this job:
[sendeta]
ioengine=libaio
direct=1
rw=read
bs=512
filename=/dev/nullb0
write_lat_log=test_job
disable_clat=1
disable_slat=1
iodepth=512
number_ios=16000000

and running
fio --debug=net --client=localhost sendeta.fio --client=127.0.0.1 sendeta.fio

Using number_ios seems to make the job run for as long as needed to
show the problem regardless of what the iodepth is in my case. From
what I can see the client is forced to spend time on the IOLOG op
which in turn makes it unresponsive to other actions:

net      1709  client: got cmd op ETA from 127.0.0.1 (pdu=161)
net      1709  client: got eta tag 0x2442d90, 2
net      1709  client: handle localhost
net      1709  client: got cmd op ETA from localhost (pdu=161)
net      1709  client: got eta tag 0x2442d90, 1
net      1709  client: request eta (2)k][r=423k,w=0 IOPS][eta 27m:09s]
net      1709  client: requested eta tag 0x2442d90
net      1709  client: handle localhost
net      1709  client: got cmd op ETA from localhost (pdu=161)
net      1709  client: got eta tag 0x2442d90, 2
net      1709  client: handle 127.0.0.1
net      1709  client: got cmd op IOLOG from 127.0.0.1 (pdu=290)
net      1709  client: handle 127.0.0.1
net      1709  client: got cmd op IOLOG from 127.0.0.1 (pdu=290)
net      1709  client: handle 127.0.0.1
net      1709  client: got cmd op ETA from 127.0.0.1 (pdu=161)
net      1709  client: got eta tag 0x2442d90, 1
net      1709  client: handle localhost0k][r=387k,w=0 IOPS][eta 00m:00s]
net      1709  client: got cmd op IOLOG from localhost (pdu=11297502)
net      1709  client: request eta (2)
net      1709  client: requested eta tag 0x2442d90
net      1709  client: handle 127.0.0.1
net      1709  client: got cmd op IOLOG from 127.0.0.1 (pdu=11297330)
net      1709  client: handle localhost
net      1709  client: got cmd op IOLOG from localhost (pdu=290)
net      1709  client: request eta (2)
net      1709  client: requested eta tag 0x2449df0
client <127.0.0.1>: timeout on SEND_ETA
client <localhost>: timeout on SEND_ETA
net      1709  client: handle 127.0.0.1
net      1709  client: got cmd op ETA from 127.0.0.1 (pdu=161)
fio: client: unable to find matching tag (2442c80)

-- 
Sitsofe | http://sucs.org/~sits/
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux