Here's the failing case where host1 does not print: back=1, groupid=0, nr_ts=1 client: got cmd op PROBE from host2 (pdu=112) client host2 removed 2 client: got cmd op PROBE from host1 (pdu=112) client host1 removed 2 client: got cmd op START from host2 (pdu=8) client host2 removed 2 client: got cmd op START from host1 (pdu=8) client host1 removed 2 client: got cmd op ADD_JOB from host2 (pdu=23204) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=70) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=90) client host2 removed 2 client: got cmd op ADD_JOB from host1 (pdu=23204) client host1 removed 2 client: got cmd op TEXT from host2 (pdu=51) client host2 removed 2 client: got cmd op TEXT from host1 (pdu=70) client host1 removed 2 client: got cmd op START from host2 (pdu=8) client host2 removed 2 client: got cmd op TEXT from host1 (pdu=90) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=51) client host1 removed 2 client: got cmd op START from host1 (pdu=8) client host1 removed 2 client: got cmd op TEXT from host2 (pdu=33) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=33) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=25) client host2 removed 2 client: got cmd op TEXT from host1 (pdu=33) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=33) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=25) client host1 removed 2 client: got cmd op SERVER_START from host2 (pdu=0) client host2 removed 2 client: got cmd op SERVER_START from host1 (pdu=0) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 25.0%][r=2667M,w=0k][r=41.7k,w=0 IOPS][eta 00m:02s] client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 28.9%][r=2661M,w=0k][r=41.6k,w=0 IOPS][eta 00m:01s] client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 200.0%][r=2669M,w=0k][r=41.7k,w=0 IOPS][eta 00m:00s] client: got cmd op IOLOG from host2 (pdu=338) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=327) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=337) client host2 removed 2 client: got cmd op IOLOG from host1 (pdu=338) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=325) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=337) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=323) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=320) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op STOP from host1 (pdu=8) client host1 removed 2 client: got cmd op QUIT from host1 (pdu=0) client host1 removed 2 client host1 removed 1 client: got cmd op IOLOG from host2 (pdu=331) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=327) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op TS from host2 (pdu=94888) client host2 removed 2 client: got cmd op GS from host2 (pdu=164) client host2 removed 2 client: got cmd op DISK_UTIL from host2 (pdu=256) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=25) client host2 removed 2 client: got cmd op STOP from host2 (pdu=8) client host2 removed 2 client: got cmd op QUIT from host2 (pdu=0) client host2 removed 2 client host2 removed 1 And here's a good case: client: got cmd op PROBE from host2 (pdu=112) client host2 removed 2 client: got cmd op PROBE from host1 (pdu=112) client host1 removed 2 client: got cmd op START from host2 (pdu=8) client host2 removed 2 client: got cmd op START from host1 (pdu=8) client host1 removed 2 client: got cmd op ADD_JOB from host2 (pdu=23204) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=70) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=90) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=51) client host2 removed 2 client: got cmd op START from host2 (pdu=8) client host2 removed 2 client: got cmd op ADD_JOB from host1 (pdu=23204) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=70) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=90) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=51) client host1 removed 2 client: got cmd op START from host1 (pdu=8) client host1 removed 2 client: got cmd op TEXT from host2 (pdu=33) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=33) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=25) client host2 removed 2 client: got cmd op SERVER_START from host2 (pdu=0) client host2 removed 2 client: got cmd op TEXT from host1 (pdu=33) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=33) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=25) client host1 removed 2 client: got cmd op SERVER_START from host1 (pdu=0) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 2 client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 25.0%][r=2667M,w=0k][r=41.7k,w=0 IOPS][eta 00m:02s] client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 28.9%][r=2669M,w=0k][r=41.7k,w=0 IOPS][eta 00m:01s] client: got cmd op ETA from host2 (pdu=165) client host2 removed 2 client: got cmd op ETA from host1 (pdu=165) client host1 removed 200.0%][r=2675M,w=0k][r=41.8k,w=0 IOPS][eta 00m:00s] client: got cmd op IOLOG from host2 (pdu=336) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=327) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=337) client host2 removed 2 client: got cmd op IOLOG from host1 (pdu=338) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=327) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=338) client host1 removed 2 client: got cmd op IOLOG from host2 (pdu=330) client host2 removed 2 client: got cmd op IOLOG from host2 (pdu=327) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op JOB_OPT from host2 (pdu=200) client host2 removed 2 client: got cmd op TS from host2 (pdu=94888) client host2 removed 2 client: got cmd op GS from host2 (pdu=164) client host2 removed 2 client: got cmd op DISK_UTIL from host2 (pdu=256) client host2 removed 2 client: got cmd op TEXT from host2 (pdu=25) client host2 removed 2 client: got cmd op STOP from host2 (pdu=8) client host2 removed 2 client: got cmd op QUIT from host2 (pdu=0) client host2 removed 2 client host2 removed 1 client: got cmd op IOLOG from host1 (pdu=330) client host1 removed 2 client: got cmd op IOLOG from host1 (pdu=327) back=1, groupid=0, nr_ts=1 client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op JOB_OPT from host1 (pdu=200) client host1 removed 2 client: got cmd op TS from host1 (pdu=94888) client host1 removed 2 client: got cmd op GS from host1 (pdu=164) client host1 removed 2 client: got cmd op DISK_UTIL from host1 (pdu=256) client host1 removed 2 client: got cmd op TEXT from host1 (pdu=25) client host1 removed 2 client: got cmd op STOP from host1 (pdu=8) client host1 removed 2 client: got cmd op QUIT from host1 (pdu=0) client host1 removed 2 client host1 removed 1 Regards, Jeff -----Original Message----- From: Jens Axboe [mailto:axboe@xxxxxxxxx] Sent: Wednesday, March 21, 2018 10:12 AM To: Jeff Furlong <jeff.furlong@xxxxxxx>; fio@xxxxxxxxxxxxxxx Subject: Re: fio server/client disconnect bug On 3/21/18 10:58 AM, Jeff Furlong wrote: > Patch applied to latest git. Failed after 26 iterations. The host1 did not print output to the summary file. > > client: got cmd op PROBE from host2 (pdu=112) > client: got cmd op PROBE from host1 (pdu=112) > client: got cmd op START from host2 (pdu=8) > client: got cmd op START from host1 (pdu=8) > client: got cmd op ADD_JOB from host2 (pdu=23204) > client: got cmd op TEXT from host2 (pdu=70) > client: got cmd op TEXT from host2 (pdu=90) > client: got cmd op TEXT from host2 (pdu=51) > client: got cmd op ADD_JOB from host1 (pdu=23204) > client: got cmd op START from host2 (pdu=8) > client: got cmd op TEXT from host1 (pdu=70) > client: got cmd op TEXT from host1 (pdu=90) > client: got cmd op TEXT from host1 (pdu=51) > client: got cmd op START from host1 (pdu=8) > client: got cmd op TEXT from host2 (pdu=33) > client: got cmd op TEXT from host2 (pdu=33) > client: got cmd op TEXT from host2 (pdu=25) > client: got cmd op TEXT from host1 (pdu=33) > client: got cmd op TEXT from host1 (pdu=33) > client: got cmd op TEXT from host1 (pdu=25) > client: got cmd op SERVER_START from host2 (pdu=0) > client: got cmd op SERVER_START from host1 (pdu=0) > client: got cmd op ETA from host2 (pdu=165) > client: got cmd op ETA from host1 (pdu=165) > client: got cmd op ETA from host2 (pdu=165) > client: got cmd op ETA from host1 (pdu=165) > client: got cmd op ETA from host2 (pdu=165) > client: got cmd op ETA from host1 (pdu=165) > client: got cmd op ETA from host2 (pdu=165)=41.6k,w=0 IOPS][eta > 00m:02s] > client: got cmd op ETA from host1 (pdu=165) > client: got cmd op ETA from host2 (pdu=165)=41.6k,w=0 IOPS][eta > 00m:01s] > client: got cmd op ETA from host1 (pdu=165) > client: got cmd op IOLOG from host2 (pdu=338)41.6k,w=0 IOPS][eta > 00m:00s] > client: got cmd op IOLOG from host2 (pdu=327) > client: got cmd op IOLOG from host2 (pdu=338) > client: got cmd op IOLOG from host1 (pdu=337) > client: got cmd op IOLOG from host1 (pdu=328) > client: got cmd op IOLOG from host1 (pdu=336) > client: got cmd op IOLOG from host2 (pdu=330) > client: got cmd op IOLOG from host2 (pdu=327) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op JOB_OPT from host2 (pdu=200) > client: got cmd op TS from host2 (pdu=94888) > client: got cmd op GS from host2 (pdu=164) > client: got cmd op IOLOG from host1 (pdu=324) > client: got cmd op DISK_UTIL from host2 (pdu=256) > client: got cmd op IOLOG from host1 (pdu=320) > client: got cmd op TEXT from host2 (pdu=25) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op STOP from host2 (pdu=8) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op QUIT from host2 (pdu=0) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op JOB_OPT from host1 (pdu=200) > client: got cmd op STOP from host1 (pdu=8) > client: got cmd op QUIT from host1 (pdu=0) So this is weird, after the JOB_OPT it should have send the gs/ts and so on summary. Which is what triggers the stat display. Can you remove the previous patch, and add this debug patch with some more printf goodness in it? diff --git a/client.c b/client.c index bff0adc0d972..0f740b5f3057 100644 --- a/client.c +++ b/client.c @@ -179,6 +179,8 @@ static struct fio_client *find_client_by_fd(int fd) void fio_put_client(struct fio_client *client) { + printf("client %s removed %d\n", client->hostname, client->refs); + if (--client->refs) return; @@ -1664,7 +1666,7 @@ int fio_handle_client(struct fio_client *client) if (!cmd) return 0; - dprint(FD_NET, "client: got cmd op %s from %s (pdu=%u)\n", + printf("client: got cmd op %s from %s (pdu=%u)\n", fio_server_op(cmd->opcode), client->hostname, cmd->pdu_len); client->last_cmd = cmd->opcode; diff --git a/stat.c b/stat.c index 98ab63893900..c23a3c9ddabc 100644 --- a/stat.c +++ b/stat.c @@ -1882,6 +1882,8 @@ void __show_run_stats(void) json_object_add_value_array(root, "jobs", array); } + printf("back=%d, groupid=%d, nr_ts=%d\n", is_backend, groupid, nr_ts); + if (is_backend) fio_server_send_job_options(&get_global_options()->opt_list, -1U); -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html