Re: bug in logging code?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/02/2014 01:55 PM, Jens Axboe wrote:
On 04/02/2014 12:24 PM, Mark Nelson wrote:
On 04/02/2014 12:55 PM, Jens Axboe wrote:
On 04/02/2014 11:53 AM, Mark Nelson wrote:
On 04/02/2014 12:38 PM, Jens Axboe wrote:
On 04/02/2014 09:23 AM, Mark Nelson wrote:
I've been playing around with usng the various write_*_log options in
fio to record run statistics.  What I'm finding is that some of the
logs
have malformed lines in them.  The following snippet is from the bw
log:

811820, 12263, 0, 4194304
812619, 35929, 0, 4194304
818437, 8448, 0, 419430, 4194304
1256299, 160627, 0, 4194304
1256831, 154274, 0, 4194304

Notice that the malformed line appears to be a partial write that
gets
truncated.  The fio command being used to run the tests is
autogenerated, so job files aren't used.  here's the one that created
the above log snippet:

/home/nhm/bin/fio
--rw=read
--ioengine=libaio
--runtime=7200
--ramp_time=0
--numjobs=1
--direct=1
--bs=4194304B
--iodepth=16
--size=9216M
--write_iops_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output






--write_bw_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output






--write_lat_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output






--time_based
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-0
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-1
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-2
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-3
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-4
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-5
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-6
--name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-7

The --numjobs=1 line should be removed, but I believe it gets
overridden
anyway with multiple name parameters.  I wonder if having multiple
name
parameters (ie multiple fio processes) is causing those processes to
all
try to write to the same log files, potentially at the same time?
Any
thoughts would be much appreicated!

What version of fio are you running?


f42557f (ie master from yesterday) with a slightly different fix for
the
basename issues that just got resolved.

I can verify now that running fio with a single --name flag appears to
generate logs without malformated lines.

OK, I'll take a look, We should already be serializing the writing to
the same log files, but your example definitely looks like we are not.

BTW, the numjobs option applies to each --name section. Just to clear
that up from your original email. So you really get numjobs *
name_sections running.

Aha, thanks for clearing that up!

Is github the right place to create new issues?  If so, want me to
create one for this?

Sure, you can go ahead and do that.

Actually, I'm going to wait after hearing your response below!


I'm puzzled by this report. The writeout of the logs is fully
serialized, there should be no interleaving. Looking at your output,
it's definitely either the output from two jobs, or mangled output from
just one job. The corrupted line is at time 818437, and the next line is
much further ahead. Normally you would expect to see:

500, some-bw, 0, blocksize        <- job X
1000, some-bw, 0, blocksize
[...]
end-time, some-bw, 0, blocksize        <- last entry from job X
500, some-bw, 0, blocksize        <- job Y now starts

Note that the log writing always appends. Did you possibly have this
half-done files in that location from a previous run?

I think this is possible. My tool is supposed to clean out old tmp directories but it was broken (and unnoticed because all other output files were getting overwritten each time). Guess I should prioritize a long-standing plan to use UUIDs for all tests/directories to make this kind of bug more obvious. :)

Mark



--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux