Hello again. I've been working on getting FIO running on a FLEX drive, and I've been accumulating a laundry list of features that would be nice to have, plus one that is necessary. I also found one bug while experimenting with various FIO options, which I'll include in the list. Here are the changes I'd appreciate seeing, starting with the most desired.

1. --ignore_device_size

The current FLEX protocol maps storage to sectors/bytes beyond the reported capacity of the drive. I'd like to run FIO on these out-of-bounds sectors, but right now I can't because I get the error "you need to specify valid offset=". Would it be possible to add a flag that lets users run IO outside of the reported device capacity? Without this I believe that FIO cannot run on the SMR portions of any FLEX drive.

2. --readwrite=write_pointer_randwrite:offset,io_size:offset,io_size...

In write pointer zones, all writes must land at the write pointer, so truly random IO is not possible. However, it would be useful to run random-like workloads in which randomly chosen zones are written sequentially. This would be very similar to the random_distribution=zoned_abs argument, except that instead of writing randomly within a zone, FIO would write to a specific offset, increment that offset by the amount written, and stop writing to that zone entirely once the incrementing offset reached the end of the zone. So if I had three zones I wanted to write to, each 128 MiB long, I could specify something like --readwrite=write_pointer_randwrite:0,128m:128m,128m:256m,128m. It might also make sense to add distribution percentages like in the zoned_abs argument, although I'm not entirely sure what you would do with those percentages once a zone was fully written and thus could not be picked anymore.

3. --readwrite=write_pointer_randrw:offset,io_size:offset,io_size...

Additionally, in write pointer zones all reads must fall below the write pointer, so random read IO is restricted. This is why I requested the random_distribution=zoned_abs argument: it works quite well for issuing random reads to write pointer zones. However, I would like to read and write "randomly" to write pointer zones so that I can more easily control the read/write ratio, and so that I can read data that was written during the same FIO run. (Currently I can use random_distribution=zoned_abs to read randomly from the beginning of the zone up to where the write pointer was at the start of the FIO run, but I cannot read further after FIO advances the write pointer.) This workload would write randomly as described above, and read between a zone's starting offset and its incremented offset. So before any writes had gone to a zone, no random reads could be issued to it. A sketch of the selection logic I have in mind for these two modes appears after this list.

4. Automatic zone detection with the above two readwrite modes

I believe this would be quite a bit of work, but it would be nice to be able to specify the previous two workload types without explicitly listing the zones. Instead the user could specify offset and size as normal, additionally specify the zone number (perhaps through a new option, or perhaps with an extended syntax in the readwrite option), and FIO would get the zones and randomly perform write-pointer-legal IO within all the zones covered by the user's offset and size. And if the user specified a drive area that contains non-write-pointer zones, FIO would just do normal IO there.
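To make items 2 and 3 concrete, here is a minimal Python sketch of the zone-selection behaviour I'm describing. This is an illustration only, not fio code; the zone layout, block size, and read fraction below are assumptions:

```python
import random

# Illustration of the intended semantics for something like
# --readwrite=write_pointer_randrw:0,128m:128m,128m:256m,128m (3 zones).
ZONE_SIZE = 128 * 1024 * 1024   # assumed zone size
BLOCK_SIZE = 4096               # assumed IO size

# Each zone tracks its start, its end, and a simulated write pointer.
zones = [{"start": off, "wp": off, "end": off + ZONE_SIZE}
         for off in (0, ZONE_SIZE, 2 * ZONE_SIZE)]

def next_io(read_fraction=0.5):
    """Pick the next write-pointer-legal IO."""
    writable = [z for z in zones if z["wp"] < z["end"]]    # not yet full
    readable = [z for z in zones if z["wp"] > z["start"]]  # has written data
    if readable and (not writable or random.random() < read_fraction):
        # Reads are confined to [zone start, write pointer), block aligned.
        z = random.choice(readable)
        n_blocks = (z["wp"] - z["start"]) // BLOCK_SIZE
        return ("read", z["start"] + random.randrange(n_blocks) * BLOCK_SIZE)
    if writable:
        # Writes land exactly at the write pointer, which then advances;
        # a full zone drops out of the writable candidates for good.
        z = random.choice(writable)
        offset = z["wp"]
        z["wp"] += BLOCK_SIZE
        return ("write", offset)
    return None  # only reachable if no zones were configured

for _ in range(10):
    print(next_io())
```

For write_pointer_randwrite the read branch would simply be dropped. The essential behaviour is that writes always land at a randomly chosen zone's write pointer, and a zone stops being picked once that pointer reaches the zone's end.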
It might also be possible for me to help with the implementation of this, if that would be something you'd be interested in.

5. Bug report: --percentage_random sequential behaviour

It seems that sequential IO is increasing but not actually sequential when using the --percentage_random option. Running the following FIO job:

fio --name=rand_reads_seq_writes --ioengine=libaio --direct=1 --exitall --thread --filename=/dev/sdf --runtime=30 --readwrite=randrw --iodepth=1 --percentage_random=100,0 --norandommap --output-format=terse

results in an even distribution of reads, as expected, but writes that are increasing rather than sequential. Here's an example of the writes I am seeing when running this job:

First 20 writes (sector, sectors written): [(0, 8), (3048, 8), (3056, 8), (3064, 8), (3072, 8), (3080, 8), (3088, 8), (6408, 8), (7000, 8), (13440, 8), (13496, 8), (13648, 8), (13768, 8), (13920, 8), (14288, 8), (14400, 8), (16376, 8), (18824, 8), (18936, 8), (19832, 8)]

Here is my environment information:

# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
# uname -r
3.10.0-514.21.1.el7.x86_64

I saw the same behaviour on fio-3.2 and on fio-3.2-81-g3e262, which was the newest version I could see as of today. So I see some bursts of sequential writes, but mostly the writes seem to be skipping around. I've attached a Python 3.6 script that will run this workload and collect the IO information using blktrace/blkparse; a quick contiguity check on its output is sketched after the script. To run the script, use the -h flag to see usage, but at a minimum you'll need to give the device handle to run on as the first argument.

Thank you for your help, and let me know if you decide to add these features or if I need to provide any further information.
```python
import re
import subprocess
import time
import sys
import math
import argparse

arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("drive_handle", help="Drive handle to test")
arg_parser.add_argument("-rt", "--runtime", default=30, help="Time to run workload")
arg_parser.add_argument("-sbp", "--save_block_parse", action="store_true",
                        help="Save blkparse output to blkparse_output.txt if flag is set")
arg_parser.add_argument("-fp", "--fio_path", default="fio",
                        help="The path to the FIO executable to run")
args = arg_parser.parse_args()

dev_handle = args.drive_handle

blktrace = subprocess.Popen(["blktrace", dev_handle, "-o", "-"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# blktrace needs a little time to get set up
time.sleep(1)

# Start FIO job
fio_string = (args.fio_path + " --name=rand_reads_seq_writes --ioengine=libaio --direct=1 --exitall"
              " --thread --filename=" + dev_handle + " --runtime=" + str(args.runtime) +
              " --readwrite=randrw --iodepth=1 --percentage_random=100,0 --norandommap "
              "--output-format=terse")
print("Running " + fio_string)
cmd_ret = subprocess.run(fio_string.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if cmd_ret.stderr != b"":
    print("FIO errors:")
    print(cmd_ret.stderr.decode(sys.stderr.encoding))
print("FIO stdout:")
print(cmd_ret.stdout.decode(sys.stdout.encoding))

# Terminate is how blktrace expects to end, don't use kill or you'll lose commands near the end
blktrace.terminate()
try:
    stdout, stderr = blktrace.communicate(timeout=20)
except subprocess.TimeoutExpired:
    blktrace.kill()
    stdout, stderr = blktrace.communicate()
print("blktrace errors:")
print(stderr)
# This will give you the raw blktrace output
# print(stdout)

blkparse_format_str = '%D %2c %8s %5T.%9t %5p %2a %3d command = %C sectors = %S block_num = %n\n'
blkparse_ret = subprocess.run(["blkparse", "-i", "-", "-f", blkparse_format_str, "-a", "issue"],
                              input=stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print("blkparse errors:")
print(blkparse_ret.stderr)
# print(blkparse_ret.stdout)
blkparse_str = blkparse_ret.stdout.decode(sys.stdout.encoding)
if args.save_block_parse:
    with open("blkparse_output.txt", 'w') as output_file:
        output_file.write(blkparse_str)

# Parse blktrace result into bins
blkline_re = re.compile(r"(\d+,\d+)\s+(\d+)\s+(\d+)\s+(?P<timestamp>\d+\.\d+)\s+(\d+)\s+D\s+"
                        r"(?P<type>(R|W)S?)\s+command = fio\s+sectors = (?P<sector>\d+)\s"
                        r"block_num = (?P<block_num>\d+)")
total_reads = 0
avg_lba = 0
max_lba = 0
min_lba = None
# Parse out the sectors from the blktrace output to get some preliminary statistics
match_iter = blkline_re.finditer(blkparse_str)
for match_obj in match_iter:
    if match_obj.groupdict()["type"][0] == 'R':
        sector_num = int(match_obj.groupdict()["sector"])
        total_reads += 1
        avg_lba += sector_num
        if min_lba is None or sector_num < min_lba:
            min_lba = sector_num
        if sector_num > max_lba:
            max_lba = sector_num
print("total reads = " + str(total_reads))
print("avg: {:.2f}, min: {}, max: {}".format(avg_lba / total_reads, min_lba, max_lba))

hist_num = 20
hist_bins = [0] * hist_num
hist_div = max_lba / hist_num
hist_edges = []
for ind in range(hist_num):
    hist_edges.append(hist_div * (ind + 1))

# Sort the data into a read histogram and a write list
write_list = []
match_iter = blkline_re.finditer(blkparse_str)
for match_obj in match_iter:
    sector_num = int(match_obj.groupdict()["sector"])
    if match_obj.groupdict()["type"][0] == 'R':
        hist_ind = math.floor(sector_num / hist_div)
        if hist_ind == hist_num:
            hist_ind -= 1
        hist_bins[hist_ind] += 1
    # Assume non-reads are writes
    else:
        block_num = int(match_obj.groupdict()["block_num"])
        write_list.append((sector_num, block_num))

hist_perc = []
for hist_bin in hist_bins:
    hist_perc.append(100 * hist_bin / total_reads)
print("read histogram bins = " + str(hist_bins))
print("read histogram percents = " + str(hist_perc))
print("read histogram edges = " + str(hist_edges))

num_write_print = 20
print("First {} writes (sector, sectors written)".format(num_write_print))
print(write_list[:num_write_print])

# print FIO version
cmd_ret = subprocess.run([args.fio_path, "-v"])
```
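For reference, here is a small hypothetical helper (not part of the attached script) that quantifies how sequential the captured writes are, using the write_list the script builds:

```python
def contiguous_fraction(write_list):
    """Fraction of writes that start exactly where the previous one ended.
    write_list holds (start_sector, sectors_written) tuples as collected
    above; fully sequential writes would give a value near 1.0."""
    if len(write_list) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(write_list, write_list[1:])
               if cur[0] == prev[0] + prev[1])
    return hits / (len(write_list) - 1)

# On the 20 writes quoted in the report this comes out to 5/19 (about 0.26),
# even though --percentage_random=100,0 should make the writes fully sequential.
print(contiguous_fraction(write_list))
```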