RE: fio server/client disconnect bug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patch applied to latest git.  Failed after 26 iterations.  The host1 did not print output to the summary file.

client: got cmd op PROBE from host2 (pdu=112)
client: got cmd op PROBE from host1 (pdu=112)
client: got cmd op START from host2 (pdu=8)
client: got cmd op START from host1 (pdu=8)
client: got cmd op ADD_JOB from host2 (pdu=23204)
client: got cmd op TEXT from host2 (pdu=70)
client: got cmd op TEXT from host2 (pdu=90)
client: got cmd op TEXT from host2 (pdu=51)
client: got cmd op ADD_JOB from host1 (pdu=23204)
client: got cmd op START from host2 (pdu=8)
client: got cmd op TEXT from host1 (pdu=70)
client: got cmd op TEXT from host1 (pdu=90)
client: got cmd op TEXT from host1 (pdu=51)
client: got cmd op START from host1 (pdu=8)
client: got cmd op TEXT from host2 (pdu=33)
client: got cmd op TEXT from host2 (pdu=33)
client: got cmd op TEXT from host2 (pdu=25)
client: got cmd op TEXT from host1 (pdu=33)
client: got cmd op TEXT from host1 (pdu=33)
client: got cmd op TEXT from host1 (pdu=25)
client: got cmd op SERVER_START from host2 (pdu=0)
client: got cmd op SERVER_START from host1 (pdu=0)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)=41.6k,w=0 IOPS][eta 00m:02s]
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)=41.6k,w=0 IOPS][eta 00m:01s]
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op IOLOG from host2 (pdu=338)41.6k,w=0 IOPS][eta 00m:00s]
client: got cmd op IOLOG from host2 (pdu=327)
client: got cmd op IOLOG from host2 (pdu=338)
client: got cmd op IOLOG from host1 (pdu=337)
client: got cmd op IOLOG from host1 (pdu=328)
client: got cmd op IOLOG from host1 (pdu=336)
client: got cmd op IOLOG from host2 (pdu=330)
client: got cmd op IOLOG from host2 (pdu=327)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op TS from host2 (pdu=94888)
client: got cmd op GS from host2 (pdu=164)
client: got cmd op IOLOG from host1 (pdu=324)
client: got cmd op DISK_UTIL from host2 (pdu=256)
client: got cmd op IOLOG from host1 (pdu=320)
client: got cmd op TEXT from host2 (pdu=25)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op STOP from host2 (pdu=8)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op QUIT from host2 (pdu=0)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op STOP from host1 (pdu=8)
client: got cmd op QUIT from host1 (pdu=0)

Here is a "good" run, but one thing I note is that the host2 prints to the summary file first, then host1 prints, then the "all clients" print.  But I list the clients in the order of host1 then host2 when running the fio cmd.

client: got cmd op PROBE from host2 (pdu=112)
client: got cmd op PROBE from host1 (pdu=112)
client: got cmd op START from host2 (pdu=8)
client: got cmd op START from host1 (pdu=8)
client: got cmd op ADD_JOB from host2 (pdu=23204)
client: got cmd op TEXT from host2 (pdu=70)
client: got cmd op TEXT from host2 (pdu=90)
client: got cmd op TEXT from host2 (pdu=51)
client: got cmd op START from host2 (pdu=8)
client: got cmd op ADD_JOB from host1 (pdu=23204)
client: got cmd op TEXT from host1 (pdu=70)
client: got cmd op TEXT from host1 (pdu=90)
client: got cmd op TEXT from host1 (pdu=51)
client: got cmd op START from host1 (pdu=8)
client: got cmd op TEXT from host2 (pdu=33)
client: got cmd op TEXT from host2 (pdu=33)
client: got cmd op TEXT from host2 (pdu=25)
client: got cmd op SERVER_START from host2 (pdu=0)
client: got cmd op TEXT from host1 (pdu=33)
client: got cmd op TEXT from host1 (pdu=33)
client: got cmd op TEXT from host1 (pdu=25)
client: got cmd op SERVER_START from host1 (pdu=0)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)=41.6k,w=0 IOPS][eta 00m:02s]
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op ETA from host2 (pdu=165)=41.8k,w=0 IOPS][eta 00m:01s]
client: got cmd op ETA from host1 (pdu=165)
client: got cmd op IOLOG from host2 (pdu=338)41.8k,w=0 IOPS][eta 00m:00s]
client: got cmd op IOLOG from host2 (pdu=328)
client: got cmd op IOLOG from host2 (pdu=338)
client: got cmd op IOLOG from host1 (pdu=337)
client: got cmd op IOLOG from host1 (pdu=328)
client: got cmd op IOLOG from host1 (pdu=337)
client: got cmd op IOLOG from host2 (pdu=323)
client: got cmd op IOLOG from host2 (pdu=321)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op JOB_OPT from host2 (pdu=200)
client: got cmd op TS from host2 (pdu=94888)
client: got cmd op GS from host2 (pdu=164)
client: got cmd op DISK_UTIL from host2 (pdu=256)
client: got cmd op TEXT from host2 (pdu=25)
client: got cmd op STOP from host2 (pdu=8)
client: got cmd op QUIT from host2 (pdu=0)
client: got cmd op IOLOG from host1 (pdu=323)
client: got cmd op IOLOG from host1 (pdu=321)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op JOB_OPT from host1 (pdu=200)
client: got cmd op TS from host1 (pdu=94888)
client: got cmd op GS from host1 (pdu=164)
client: got cmd op DISK_UTIL from host1 (pdu=256)
client: got cmd op TEXT from host1 (pdu=25)
client: got cmd op STOP from host1 (pdu=8)
client: got cmd op QUIT from host1 (pdu=0)

Regards,
Jeff

-----Original Message-----
From: Jens Axboe [mailto:axboe@xxxxxxxxx] 
Sent: Wednesday, March 21, 2018 9:39 AM
To: Jeff Furlong <jeff.furlong@xxxxxxx>; fio@xxxxxxxxxxxxxxx
Subject: Re: fio server/client disconnect bug

On 3/21/18 10:35 AM, Jeff Furlong wrote:
> No client ends in error.  In fact, I get a full set of iops/lat/bw logs from both clients.  And inspection of those logs looks good; I see the last 1000ms timestamp is valid for the duration of runtime.  But only one client prints the summary info to the output file.

Can you try with this debug patch and see if you can reproduce? Comparing a good and bad run would likely be key in figuring out wtf is going on here.

diff --git a/client.c b/client.c
index bff0adc0d972..76c89a496964 100644
--- a/client.c
+++ b/client.c
@@ -1664,7 +1664,7 @@ int fio_handle_client(struct fio_client *client)
 	if (!cmd)
 		return 0;
 
-	dprint(FD_NET, "client: got cmd op %s from %s (pdu=%u)\n",
+	printf("client: got cmd op %s from %s (pdu=%u)\n",
 		fio_server_op(cmd->opcode), client->hostname, cmd->pdu_len);
 
 	client->last_cmd = cmd->opcode;

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux