fio server/client disconnect bug

Hi All,
With a nearly current fio build (fio-3.3-51-gf2cd) in server/client mode, I occasionally see one client get disconnected early, so that client's IO summary is never reported at the end of the job. The issue seems to appear after somewhere between 100 and 300 iterations of a job file. Worse, whether the early disconnect happens may depend on the complexity of the job file (for example, numjobs or the IOPS of the device). Even worse, with debug output enabled the failure becomes rarer, showing up only about once every 1000 iterations.
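Since the failure only shows up after hundreds of iterations, one way to catch it is to rerun the same client command in a loop and stop at the first bad run. Below is a minimal Python sketch under stated assumptions: it treats the `client: host=... disconnected` message from the debug log in this report as the failure marker, it keeps each run's output in a hypothetical `fio-runs/` directory, and the script name used below is illustrative. That message has no `net` prefix in the log, which suggests it is printed even without `--debug`; this is an assumption worth checking.

```python
from __future__ import annotations

import re
import subprocess
import sys
from pathlib import Path

# Client command from this report. Debug output is off by default because
# the failure appeared to become rarer with --debug enabled.
FIO_CMD = ["fio", "--client=host1", "test_job_a", "--client=host2", "test_job_b"]
USE_DEBUG = False  # set True to append --debug=process,net

# Assumption: an early disconnect is signalled by this client message,
# as seen in the debug log in this report.
DISCONNECT_RE = re.compile(r"client: host=(\S+) disconnected")


def find_disconnects(output: str) -> list[str]:
    """Return the host names reported as disconnected in one run's output."""
    return DISCONNECT_RE.findall(output)


def run_once() -> str:
    """Run fio once and return stdout and stderr combined as text."""
    cmd = FIO_CMD + (["--debug=process,net"] if USE_DEBUG else [])
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )
    return result.stdout


def main(max_iters: int, log_dir: str = "fio-runs") -> int:
    out_dir = Path(log_dir)
    out_dir.mkdir(exist_ok=True)
    for i in range(1, max_iters + 1):
        output = run_once()
        # Keep every run's output so a failing run can be diffed against a good one.
        (out_dir / f"run-{i:04d}.log").write_text(output, encoding="utf-8")
        hosts = find_disconnects(output)
        if hosts:
            print(f"iteration {i}: disconnected: {', '.join(hosts)}")
            return 1
    print(f"no early disconnect in {max_iters} iterations")
    return 0


# Only loops when an iteration count is given, e.g. "python3 repro_disconnect.py 1000".
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(int(sys.argv[1])))
```

Keeping every run's output, rather than only the failing one, makes it possible to compare the last messages of a good run against those of a run where a host dropped.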

When running with

fio --client=host1 test_job_a --client=host2 test_job_b --debug=process,net

...
net      5027  client: handle host2
net      5027  client: got cmd op IOLOG from host2 (pdu=446)
net      5027  client: handle host2
client: host=host2 disconnected
net      5027  client: removed <host2>
net      5027  client: request eta (1)
net      5027  client: requested eta tag 0x1b52c20
net      5027  client: handle host1
net      5027  client: got cmd op TEXT from host1 (pdu=85)
<host1> net      5028  server: got op [SEND_ETA], pdu=0, tag=1b4a970
net      5027  client: handle host1
net      5027  client: got cmd op TEXT from host1 (pdu=61)
<host1> net      5028  server sending status
...

The final output summary then shows statistics only for host1. The host2 throughput, latency, etc. are never displayed. However, the iops, bw, and lat logs all appear to have been updated properly. Sometimes host2 is the one disconnected early; sometimes it is host1.
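To see which host dropped and what it last sent, the client-side lines of a saved `--debug=net` log can be tallied per host. A minimal Python sketch: it only matches the two client message formats visible in the log above, and the `TS` op name used to check whether the per-job statistics arrived is an assumption to verify against the op names in your own log.

```python
from __future__ import annotations

import re
import sys

# Two client-side messages that appear in the --debug=net log in this report.
OP_RE = re.compile(r"client: got cmd op (\S+) from (\S+)")
DISC_RE = re.compile(r"client: host=(\S+) disconnected")

# Assumption: per-job statistics arrive as an op logged as "TS"; check this
# against the op names in your own log before relying on it.
SUMMARY_OP = "TS"


def triage(lines):
    """Return (ops received per host, last op seen per disconnected host)."""
    ops: dict[str, list[str]] = {}
    last_before_disc: dict[str, str | None] = {}
    for line in lines:
        m = OP_RE.search(line)
        if m:
            op, host = m.groups()
            ops.setdefault(host, []).append(op)
            continue
        m = DISC_RE.search(line)
        if m:
            host = m.group(1)
            seen = ops.get(host)
            last_before_disc[host] = seen[-1] if seen else None
    return ops, last_before_disc


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        ops, disc = triage(f)
    for host, last_op in disc.items():
        got_summary = SUMMARY_OP in ops.get(host, [])
        print(f"{host}: disconnected after op {last_op}, "
              f"{SUMMARY_OP} received: {got_summary}")
```

Applied to the excerpt above, it would report that host2 disconnected immediately after an IOLOG op.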

Why might host2 be disconnected? I see the disconnects both through a switch with one hop and with host1 connected directly to host2, so dropped network packets seem an unlikely cause. Could the ETA update be inaccurate? Or could host2 finish its job faster than host1 and close the connection too early?

If host2's job file finishes early, should its IO traffic still be summarized in the output? How is that case handled?

If you have suggestions for other debug options, I would appreciate them. Thanks.

Regards,
Jeff





