Hi Kris,

From what you're saying it seems to make sense to leave the interpreter line at 2.7 for now. If you can repost your patch with that and/or send a github pull request to Jens' Github repo, the rest of the changes sound fine to me.

Vincent: any comment on these changes?

On 21 February 2018 at 14:41, Kris Davis <Kris.Davis@xxxxxxx> wrote:
> After doing some more research, my best guess is that python3.4 being much slower than python2.7 is due to the difference in integer handling, so pypy 3 didn't do any better than cpython 3.4. I don't really have anything to verify that.
>
> So, given no further feedback, is there any reason not to commit the suggested changes? Is there anything else I need to do?
>
> Thanks
> Kris Davis
>
> -----Original Message-----
> From: Kris Davis
> Sent: Wednesday, February 14, 2018 11:51 AM
> To: 'Sitsofe Wheeler' <sitsofe@xxxxxxxxx>
> Cc: fio@xxxxxxxxxxxxxxx; Jens Axboe <axboe@xxxxxxxxx>; Vincent Fu <vincentfu@xxxxxxxxx>
> Subject: RE: fiologparser_hist.py script patch and enhancements?
>
> I wasn't familiar with pypy. I tried it out, but needed to build locally on my Centos 7.3 machine. However, I found something interesting...
> I ran fiologparser_hist.py in multiple trials against 9 log files to be combined.
> * with pypy it consistently took about 51-52 seconds.
> * with python3.4.3, it also consistently took about 51-52 seconds.
> * with python2.7.5 it consistently took 10-11 seconds.
>
> I also built python 3.4.3 from the distribution locally some months earlier. Sure is suspicious.
> But, this is off topic, so I'll investigate further without further updates. Thanks for the reference Sitsofe.
>
> Kris Davis
>
> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
> Sent: Tuesday, February 13, 2018 1:27 AM
> To: Kris Davis <Kris.Davis@xxxxxxx>
> Cc: fio@xxxxxxxxxxxxxxx; Jens Axboe <axboe@xxxxxxxxx>; Vincent Fu <vincentfu@xxxxxxxxx>
> Subject: Re: fiologparser_hist.py script patch and enhancements?
>
> (CC'ing Vincent)
>
> On 12 February 2018 at 21:38, Kris Davis <Kris.Davis@xxxxxxx> wrote:
>> In light of some related commits, I am reposting my enhancements to fiologparser_hist.py, and a suggested addition of fiologparser_hist.nw.py. I've included a patch at bottom.
>
> It might be an idea to post this up on github too (it might make it easier for others to pull down).
>
>> Reasons for the changes:
>>
>> 1) The fiologparser_hist script didn't support the new nanosecond bin values. So I changed the operation to assume nanosecond histogram bins, and added a new "--usbin" option so the user can override this and the same script can still process older-version histogram logs.
>>
>> 2) The script appeared hardcoded to only return 50% (median), 90%, 95%, and 99% values (along with min and max).
>> I added a "--percentiles" option to allow a request for more values ('median' is always printed, even if a duplicate 50% column is requested, for backward compatibility).
>
> These sound good.
>
>> 3) A recent commit made some changes to support python3.
>> I added a check to make sure the python version is at least 2.7, and changed the "shebang" to only call out "python" rather than "python2.7".
>
> Sadly the switch to python2.7 was done on purpose:
> https://github.com/axboe/fio/commit/60023ade47e7817db1c18d9b7e511839de5c2c99
> - Linux distros are clamping down on python and macOS doesn't have python2. The whole python interpreter line business is a mess and there's simply no common agreement - if you look you can find conflicting PEPs and I'm starting to think packagers will just have to include a function to rename lines to their preferred style. My hope is one day all the scripts are converted to be both python2 and python3 compatible, all OSes finally get around to shipping python3 by default and then the interpreter line can be switched.
>
>> 4) The process can be slow for large log files or when combining many log files.
>> I have some automation which will generically process many log files, and found I cut the process time in half if I loaded the script as a module rather than calling it as a command. So I changed it so I can load it as a module and call main directly, but needed to slightly alter the end of "guess_max_from_bins" to throw an exception on error rather than exit, when called as a module.
>> Someone might know of a better, more conventional pythonic design pattern to use, but it works.
>
> I've no strong feeling on this.
>
>> 5) The script appears to assume that the log is never missing samples. That is, it weights samples to the requested intervals, I think with the assumption that the log samples are at longer intervals than, or at least the same interval length as, the "--interval" value. If the workload actually contains "thinktime" intervals (with missing samples when there is zero data), the script cannot know this, and assumes the logged operations should still be spread across all the intervals.
>>
>> In my case, I'm mostly interested in results at the same interval as gathered during logging, so I tweaked it into an alternate version I named 'fiologparser_hist_nw.py', which doesn't perform any weighting of samples. It has an added advantage of much quicker performance. For example, fiologparser_hist took about 1/2 hr to combine about 350 logs, but fiologparser_hist_nw took 45 seconds, way better for my automation.
>
> Just out of interest does using pypy help you at all?
>
>> Of course, percentiles with a larger number of 9's would have additional inaccuracies when there are not enough operations in a sample period, but that is just user beware.
>>
>> diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
>> index 62a4eb4..c77bb11 100755
>> --- a/tools/hist/fiologparser_hist.py
>> +++ b/tools/hist/fiologparser_hist.py
>
> <snip>

-- 
Sitsofe | http://sucs.org/~sits/
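
For reference, the "load as a module and call main directly" idea from point 4 of the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `HistLogError`, `parse_percentiles`, the colon-separated value format and the default percentiles are all hypothetical names and choices made up for the example. The point is that errors are raised as exceptions, so an importing script can catch them instead of being terminated by an exit call.

```python
import argparse
import sys


class HistLogError(Exception):
    """Hypothetical error type, raised instead of exiting the interpreter."""


def parse_percentiles(text):
    """Turn a string such as "90:95:99.9" into a sorted list of floats.

    The colon-separated format is an assumption made for this sketch.
    """
    values = sorted({float(p) for p in text.split(":") if p})
    for value in values:
        if not 0.0 < value < 100.0:
            raise HistLogError("percentile out of range: %r" % value)
    return values


def main(argv=None):
    """Entry point usable from the shell or from an importing script.

    Passing argv explicitly lets a caller reuse the option parsing
    without spawning a new interpreter for every log file.
    """
    if sys.version_info < (2, 7):
        raise HistLogError("Python 2.7 or later is required")
    parser = argparse.ArgumentParser(description="histogram log sketch")
    parser.add_argument("--percentiles", default="90:95:99")
    args = parser.parse_args(argv)
    return parse_percentiles(args.percentiles)
```

An automation script would then do something like `import` the module once and call `main(["--percentiles", "99.9:50:90"])` per log set, catching `HistLogError` itself. A command-line wrapper would call `main()` under the usual `if __name__ == "__main__":` guard and turn a caught `HistLogError` into `sys.exit()` with a message.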