fiologparser_hist.py script patch and enhancements?

Greetings all, 

While working with the clat histogram log data, I ran into a few difficulties with the "fiologparser_hist.py" script.
I've created a patch to address these, but it of course needs some discussion and review.

The issues: 

1) The fiologparser_hist script didn't support the new nanosecond bin values.  So I changed it to assume nanosecond
histogram bins, and added a new "--usbin" option that lets the user override this, so the same script can still process histogram logs from older fio versions.

2) The script appeared to be hardcoded to only report the 50% (median), 90%, 95%, and 99% values (along with min and max).
I added a "--percentiles" option to allow requesting additional values ('median' is always printed, even if a duplicate 50% column is requested,
for backward compatibility).  A small sketch of the column handling is included after this list of issues.

3) I use python 3.4 and newer almost exclusively in my environment, and it was relatively simple to alter the script so it runs under either python 2.7 or 3.x.
I added a check that the python version is at least 2.7, but I have only actually tested with python 2.7.9 and 3.4.4.
I understand there are some minor python differences between 3.0 and 3.4 that might be an issue, but I haven't tried those versions.

4) The process can be slow for large log files, or when combining many of them.  I have some automation which generically processes many log files, and I found
I could cut the processing time in half by loading the script as a module rather than invoking it as a command.  So I changed it so main() can be called directly from a script, which required
slightly altering the end of "guess_max_from_bins" to raise an exception on error, rather than exit, when called as a module.
Someone might know of a better, more conventional pythonic design pattern to use, but it works.  (A rough sketch of the module-style call is also included after this list.)

5) The script appears to assume that the log is never missing samples.  That is, it weights samples across the requested intervals, I think with the
assumption that the log samples are at longer intervals, or at least at the same interval length as the "--interval" value.  If the workload actually
contains "thinktime" gaps (so samples are missing when there is no data), the script cannot know this, and assumes the logged operations should still be spread across
all the intervals.

In my case, I'm mostly interested in results at the same interval as gathered during logging, so I reworked it into an alternate version named 'fiologparser_hist_nw.py',
which doesn't perform any weighting of samples (the idea is sketched below).  It has the added advantage of much better performance: for example, fiologparser_hist
took about half an hour to combine about 350 logs, whereas fiologparser_hist_nw took 45 seconds, which is far better for my automation.

Of course, the higher "number of nines" percentiles will have additional inaccuracy when there are not enough operations in a sample period,
but that is just a case of user beware.
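To make item 2 above concrete, here is a rough, self-contained sketch of the percentile-string handling.  It uses the same splitting logic as the gen_output_columns() helper in the patch below, just returning the lists instead of setting globals; the percentile string is only an example:

import re

def gen_output_columns(percentiles):
    # Split the --percentiles argument on ',' or ':', always keep a
    # 'median' (50%) column, then add one output column per request.
    strpercs = re.split('[,:]', percentiles)
    percs = [50.0] + [float(p) for p in strpercs]
    columns = (["end-time", "samples", "min", "avg", "median"]
               + [p + '%' for p in strpercs] + ["max"])
    return percs, columns

percs, columns = gen_output_columns("90,95,99,99.9")
print(columns)
# ['end-time', 'samples', 'min', 'avg', 'median', '90%', '95%', '99%', '99.9%', 'max']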
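For item 4, a rough sketch of how I drive the parser as a module from my automation rather than shelling out.  This is illustrative only: the module and log file names are hypothetical, and the ctx namespace has to carry every attribute the script's argparse parser would normally supply; I only show the options visible in the patch, with example values:

import sys
import argparse
import fiologparser_hist as flph   # the patched script on sys.path (hypothetical name)

# Build a ctx equivalent to what argparse would produce on the command line.
# Any option the real parser defines but is missing here must be added as well.
ctx = argparse.Namespace(
    FILE=['job_clat_hist.1.log'],                    # hypothetical histogram log
    job_file=None, interval=None, divisor=1,
    buff_size=10000, max_latency=20, group_nr=19,    # example values only
    percentiles="90,95,99,99.9", usbin=False)

try:
    flph.main(ctx)
except RuntimeError as e:
    # guess_max_from_bins() now raises instead of calling exit() when imported
    sys.stderr.write("histogram parse failed: %s\n" % e)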
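And for item 5, a rough sketch of the "no weighting" idea (this is not the actual fiologparser_hist_nw.py code): take each logged histogram sample as-is and read percentiles straight off its cumulative bin counts, instead of spreading counts across output intervals.  The bin values and counts below are made up:

import numpy as np

def hist_percentiles(bin_vals, counts, percs):
    # Percentile latencies read directly from one histogram sample's bins.
    counts = np.asarray(counts, dtype=float)
    cdf = np.cumsum(counts) / counts.sum()
    return [bin_vals[np.searchsorted(cdf, p / 100.0)] for p in percs]

bins = np.array([100., 200., 400., 800.])   # bin latency values (ns), made up
cnts = np.array([10, 70, 15, 5])            # operations counted in each bin
print(hist_percentiles(bins, cnts, [50, 90, 99]))   # -> [200.0, 400.0, 800.0]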

I've listed a patch below for fiologparser_hist.py.  I'm not sure whether I can actually attach files here, or I would have included zipped copies of both fiologparser_hist.py and fiologparser_hist_nw.py.  But I could include them in a github issue.

Thanks

Kris



diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 2e05b92..fe9a951
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/python
 """ 
     Utility for converting *_clat_hist* files generated by fio into latency statistics.
     
@@ -16,7 +16,15 @@
 import os
 import sys
 import pandas
+import re
 import numpy as np
+
+runascmd = False
+
+if (sys.version_info < (2, 7)):
+    sys.stderr.write("ERROR: Python version = %s; version 2.7 or greater is required.\n" % sys.version)
+    exit(1)
+
 
 err = sys.stderr.write
 
@@ -64,8 +72,20 @@
 def weighted_average(vs, ws):
     return np.sum(vs * ws) / np.sum(ws)
 
-columns = ["end-time", "samples", "min", "avg", "median", "90%", "95%", "99%", "max"]
-percs   = [50, 90, 95, 99]
+
+percs = None
+columns = None
+
+def gen_output_columns(percentiles):
+    global percs,columns
+    strpercs = re.split('[,:]', percentiles)
+    percs = [50.0]  # always print 50% in 'median' column
+    percs.extend(list(map(float,strpercs)))
+    columns = ["end-time", "samples", "min", "avg", "median"]
+    columns.extend(list(map(lambda x: x+'%', strpercs)))
+    columns.append("max")
+        
+
 
 def fmt_float_list(ctx, num=1):
   """ Return a comma separated list of float formatters to the required number
@@ -118,6 +138,7 @@
 
     # Initial histograms from disk:
     arrs = {fp: read_chunk(rdr, sz) for fp,rdr in rdrs.items()}
+    
     while True:
 
         try:
@@ -177,8 +198,12 @@
 
     avg = weighted_average(vs, ws)
     values = [mn, avg] + list(ps) + [mx]
-    row = [end, ss_cnt] + map(lambda x: float(x) / ctx.divisor, values)
-    fmt = "%d, %d, %d, " + fmt_float_list(ctx, 5) + ", %d"
+    row = [end, ss_cnt] + list(map(lambda x: float(x) / ctx.divisor, values))
+    if ctx.divisor > 1:
+        fmt = "%d, %d, " + fmt_float_list(ctx, len(percs)+3)
+    else:
+        # max and min are decimal values if no divisor
+        fmt = "%d, %d, %d, " + fmt_float_list(ctx, len(percs)+1) + ", %d"
     print (fmt % tuple(row))
 
 def update_extreme(val, fncn, new_val):
@@ -207,7 +232,7 @@
             
         # Only look at bins of the current histogram sample which
         # started before the end of the current time interval [start,end]
-        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / 1000.0
+        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / ctx.time_divisor
         idx = np.where(start_times < iEnd)
         s_ts, l_bvs, u_bvs, hs = start_times[idx], lower_bin_vals[idx], upper_bin_vals[idx], hist[idx]
 
@@ -241,7 +266,7 @@
     idx = np.where(arr == hist_cols)
     if len(idx[1]) == 0:
         table = repr(arr.astype(int)).replace('-10', 'N/A').replace('array','     ')
-        err("Unable to determine bin values from input clat_hist files. Namely \n"
+        errmsg = ("Unable to determine bin values from input clat_hist files. Namely \n"
             "the first line of file '%s' " % ctx.FILE[0] + "has %d \n" % (__TOTAL_COLUMNS,) +
             "columns of which we assume %d " % (hist_cols,) + "correspond to histogram bins. \n"
             "This number needs to be equal to one of the following numbers:\n\n"
@@ -250,7 +275,12 @@
             "  - Input file(s) does not contain histograms.\n"
             "  - You recompiled fio with a different GROUP_NR. If so please specify this\n"
             "    new GROUP_NR on the command line with --group_nr\n")
-        exit(1)
+        if runascmd:
+            err(errmsg)
+            exit(1)
+        else:
+            raise RuntimeError(errmsg) 
+        
     return bins[idx[1][0]]
 
 def main(ctx):
@@ -274,9 +304,18 @@
                         ctx.interval = int(hist_msec)
                 except NoOptionError:
                     pass
+    
+    if not hasattr(ctx, 'percentiles'):
+        ctx.percentiles = "90,95,99"
+    gen_output_columns(ctx.percentiles)
 
     if ctx.interval is None:
         ctx.interval = 1000
+
+    if ctx.usbin:
+        ctx.time_divisor = 1000.0        # bins are in us
+    else:
+        ctx.time_divisor = 1000000.0     # bins are in ns
 
     # Automatically detect how many columns are in the input files,
     # calculate the corresponding 'coarseness' parameter used to generate
@@ -288,9 +327,9 @@
 
         max_cols = guess_max_from_bins(ctx, __HIST_COLUMNS)
         coarseness = int(np.log2(float(max_cols) / __HIST_COLUMNS))
-        bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness), np.arange(__HIST_COLUMNS)), dtype=float)
-        lower_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 0.0), np.arange(__HIST_COLUMNS)), dtype=float)
-        upper_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 1.0), np.arange(__HIST_COLUMNS)), dtype=float)
+        bin_vals = np.array(list(map(lambda x: plat_idx_to_val_coarse(x, coarseness), np.arange(__HIST_COLUMNS))), dtype=float)
+        lower_bin_vals = np.array(list(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 0.0), np.arange(__HIST_COLUMNS))), dtype=float)
+        upper_bin_vals = np.array(list(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 1.0), np.arange(__HIST_COLUMNS))), dtype=float)
 
     fps = [open(f, 'r') for f in ctx.FILE]
     gen = histogram_generator(ctx, fps, ctx.buff_size)
@@ -304,7 +343,7 @@
         while more_data or len(arr) > 0:
             
             # Read up to ctx.max_latency (default 20 seconds) of data from end of current interval.
-            while len(arr) == 0 or arr[-1][0] < ctx.max_latency * 1000 + end:
+            while len(arr) == 0 or arr[-1][0] < ctx.max_latency + end:
                 try:
                     new_arr = next(gen)
                 except StopIteration:
@@ -338,6 +377,7 @@
 
 if __name__ == '__main__':
     import argparse
+    runascmd = True
     p = argparse.ArgumentParser()
     arg = p.add_argument
     arg("FILE", help='space separated list of latency log filenames', nargs='+')
@@ -384,5 +424,18 @@
              'given histogram files. Useful for auto-detecting --log_hist_msec and '
              '--log_unix_epoch (in fio) values.')
 
+    arg('--percentiles',
+        default="90,95,99",
+        type=str,
+        help='Optional argument of comma or colon separated percentiles to print. '
+             'The default is "90.0,95.0,99.0".  min, median(50%%) and max percentiles are always printed')
+    
+    arg('--usbin',
+        default=False,
+        action='store_true',
+        help='histogram bin latencies are in us (fio versions < 2.99; fio uses ns for versions >= 2.99)')
+    
+    
+
     main(p.parse_args())
