Re: fiologparser.py

Ben England <bengland@xxxxxxxxxx> · Tue, 24 May 2016 16:47:22 -0400 (EDT)

Mark, I hadn't noticed the sample weighting code before.  Weighting samples might work for averaging, but it doesn't work for the percentiles, min, or max provided by the -A option.  For min this generally won't be an issue, since min-latency samples will probably fall entirely within a single time interval.  But for max or the higher percentiles it will *definitely* be an issue.  For example, one really high-latency sample could be the max for a whole range of time intervals.
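
Here is a rough sketch of that last point.  The (start_ms, end_ms, latency) tuple layout and the function names are assumptions for illustration only, not the actual classes in fiologparser.py:

```python
# Hypothetical sample layout: (start_ms, end_ms, latency) tuples.
def overlaps(sample, lo, hi):
    """True if the sample's [start, end) span intersects the interval [lo, hi)."""
    start, end, _ = sample
    return start < hi and end > lo

def interval_max(samples, lo, hi):
    """Max latency among samples overlapping [lo, hi), or None if there are none."""
    hits = [s[2] for s in samples if overlaps(s, lo, hi)]
    return max(hits) if hits else None

samples = [
    (100, 200, 100),
    (500, 3600, 3100),   # one slow I/O spanning several 1000 ms intervals
    (1200, 1300, 100),
    (2500, 2650, 150),
]

# The slow sample is the max in every interval it touches.
for lo in range(0, 4000, 1000):
    print(lo, interval_max(samples, lo, lo + 1000))
```

If only samples that lie entirely inside an interval were counted, the slow sample would not show up in any of these intervals.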

To compute percentiles, we can sort (by response time) the samples that *overlap the time interval* and then index into the Python list, something like this (ignoring boundary conditions):

 def get_percentile(sample_list, percentile):
   # sample_list must already be sorted by response time
   return sample_list[len(sample_list) * percentile // 100]

min would be the first element of the sorted sample_list,
max would be the last element.
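
A slightly fuller sketch with the boundary conditions handled might look like the following.  This is only one way to do it, under my own assumptions: the clamping of the index and the empty-list check are not taken from the existing script, and the latency values are made up.

```python
def get_percentile(sorted_samples, percentile):
    """Index-based percentile lookup into a list already sorted ascending."""
    if not sorted_samples:
        raise ValueError("no samples overlap this interval")
    # int() keeps the index an integer even for fractional percentiles;
    # min() clamps it so percentile=100 returns the last element.
    idx = min(int(len(sorted_samples) * percentile / 100),
              len(sorted_samples) - 1)
    return sorted_samples[idx]

latencies = [120, 95, 3100, 150, 110]   # made-up response times
latencies.sort()                        # in-place sort, no copy

print("min   ", latencies[0])
print("median", get_percentile(latencies, 50))
print("p95   ", get_percentile(latencies, 95))
print("max   ", latencies[-1])
```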

And I'll definitely try using .sort instead of sorted(), thx Jeff.

make sense?

-ben

----- Original Message -----
> From: "Mark Nelson" <mark.a.nelson@xxxxxxxxx>
> To: "Ben England" <bengland@xxxxxxxxxx>, "Jens Axboe" <axboe@xxxxxxxxx>
> Cc: "Martin Steigerwald" <ms@xxxxxxxxx>, fio@xxxxxxxxxxxxxxx, "Mark Nelson" <mnelson@xxxxxxxxxx>
> Sent: Tuesday, May 24, 2016 12:20:19 PM
> Subject: Re: fiologparser.py
> 
> I've got a version that removes the dependency and appears to return the
> same values:
> 
> https://github.com/axboe/fio/pull/181
> 
> Going through the code though, it looks like the -A values are computed
> differently than in the other original functions.  In the original
> get_contribution function, all samples within the bounds are counted,
> along with samples that are only partially within the bounds.  Each
> sample is weighted based on the duration it overlapped with the sample
> period:
> 
> https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L195-L198
> 
> for -A, only the samples that are totally within the bounds are counted,
> and are weighted equally despite how much of the period was spent in
> that sample:
> 
> https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L173
> 
> Thus if you look at say the average from -a:
> 
> fiologparser.py -a *clat*
> 
> 1000, 11582.770
> 2000, 14033.844
> 3000, 17087.446
> 4000, 17946.245
> 5000, 14554.196
> 6000, 14407.804
> 7000, 15218.106
> 8000, 15157.951
> 
> the results are quite a bit different from -A:
> 
> fiologparser.py -A *clat* | tr -s "," " " | cut -f1,4 -d" "
> 
> 0.000000 11902.719298
> 1000.000000 13247.750000
> 2000.000000 14270.549020
> 3000.000000 15092.192308
> 4000.000000 14127.472727
> 5000.000000 12880.137931
> 6000.000000 15296.735849
> 7000.000000 14857.306122
> 8000.000000 14854.766667
> 
> Mark
> 
> 
> On 05/24/2016 10:35 AM, Ben England wrote:
> > OK we'll remove the dependencies, I still want to have the -A option
> > supported.
> > -ben
> >
> > ----- Original Message -----
> >> From: "Jens Axboe" <axboe@xxxxxxxxx>
> >> To: "Ben England" <bengland@xxxxxxxxxx>, "Mark Nelson"
> >> <mark.a.nelson@xxxxxxxxx>
> >> Cc: "Martin Steigerwald" <ms@xxxxxxxxx>, fio@xxxxxxxxxxxxxxx, "Mark
> >> Nelson" <mnelson@xxxxxxxxxx>
> >> Sent: Tuesday, May 24, 2016 11:28:39 AM
> >> Subject: Re: fiologparser.py
> >>
> >> On 05/24/2016 09:22 AM, Ben England wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Mark Nelson" <mark.a.nelson@xxxxxxxxx>
> >>>> To: "Ben England" <bengland@xxxxxxxxxx>, "Martin Steigerwald"
> >>>> <ms@xxxxxxxxx>
> >>>> Cc: fio@xxxxxxxxxxxxxxx, "Mark Nelson" <mnelson@xxxxxxxxxx>, "Jens
> >>>> Axboe"
> >>>> <axboe@xxxxxxxxx>
> >>>> Sent: Tuesday, May 24, 2016 10:04:14 AM
> >>>> Subject: Re: fiologparser.py
> >>>>
> >>>> Let's see if we can remove the numpy and scipy dependencies.  It looks
> >>>> like we are just using it for min/average/median/max/percentile
> >>>> calculations.  It would be nice if users didn't need anything other than
> >>>> argparse.
> >>>>
> >>>
> >>> Just curious, why is scipy a problem?  Is it because CBT isn't a
> >>> package so you don't get dependencies handled when you install it?  You
> >>> are correct, it's easy to remove the dependencies, I just didn't know it
> >>> was causing problems for people.  You can get percentiles from just
> >>> sorting the sample values and indexing into the array at the appropriate
> >>> offset, I was just trying to re-use existing classes.
> >>
> >> It's not necessarily a problem, but the less dependencies you have, the
> >> easier it is for people to use. I do the same for fio, try to have as
> >> few external dependencies as possible. Remember, not everybody is
> >> running on Linux...
> >>
> >> --
> >> Jens Axboe
> >>
> >>
> 