Re: Problems with interpreting FIO output

Hi Seyed,

As you probably know, IOPS is total I/Os divided by total runtime, while the iops line reports statistics over the per-interval averages. The default averaging interval is 500 ms (--iopsavgtime=500).

fio is reporting 12,469 interval samples for your run. I would expect 400 (jobs) x 31.279 (actual runtime in seconds) x 2 (samples per second, i.e. 1000/500) = 25,023 interval samples. It's not unusual for the total sample count to fall short of the expected count under heavy load. Your per-interval iops average is 2x your reported IOPS (397k vs 199k for reads), which suggests the sample-count discrepancy is related to the IOPS discrepancy.
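The arithmetic above can be sanity-checked with a few lines of Python (the figures are copied from the fio output in your mail; the 0.5 capture ratio matching the 2x iops/IOPS ratio is the point):

```python
# Expected interval samples = jobs x runtime_secs x samples_per_sec
jobs = 400
runtime_s = 31.279            # 31279 msec from the fio output
samples_per_sec = 1000 / 500  # default --iopsavgtime=500

expected_samples = jobs * runtime_s * samples_per_sec
actual_samples = 12469        # "samples=12469" in the fio output

print(f"expected ~{expected_samples:.0f}, got {actual_samples}")
print(f"capture ratio: {actual_samples / expected_samples:.2f}")

# Headline IOPS vs mean of the interval samples, reads:
iops_total = 199_000          # IOPS=199k
iops_interval = 397_711       # iops avg=397711.45
print(f"iops/IOPS ratio: {iops_interval / iops_total:.2f}")
```

Roughly half the expected samples were captured, and the interval average is roughly twice the headline IOPS.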

I was able to reproduce your IOPS vs iops discrepancy on a ramdisk using 100 vs 1,000 jobs. Read results for a 5-second run:

100 jobs: #samples=910 vs #expected=1,000, IOPS=84.3k vs iops=83.4k
1000 jobs: #samples=1,470 vs #expected=10,000, IOPS=107k vs iops=172k

If I enable IOPS logging with a 500ms interval (--write_iops_log and --log_avg_msec=500), I get the following samples for one of the jobs:

100 job run:
563, 717, 0, 0, 0
1062, 759, 0, 0, 0
1562, 784, 0, 0, 0
2062, 772, 0, 0, 0
2562, 774, 0, 0, 0
3062, 769, 0, 0, 0
3563, 778, 0, 0, 0
4064, 706, 0, 0, 0
4563, 757, 0, 0, 0

1000 job run:
892, 128, 0, 0, 0
2660, 95, 0, 0, 0

The first column is elapsed time in ms; the second is the average iops over that interval. Notice the 1,000-job run has far fewer samples, and some of its jobs logged no samples at all. The IOPS vs iops discrepancy is most likely sampling error: under heavy CPU load, too few interval samples are captured. I'm not too familiar with the back-end stats capture, but I believe it runs as a single thread; perhaps its priority could be boosted.
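For what it's worth, the per-job log files can be averaged directly to see what fio's iops statistic is built from. A small sketch using the 100-job samples above (the five comma-separated fields in fio's iops log are, as I understand them, time in ms, value, data direction, block size, and offset):

```python
# Average the per-interval samples from a fio --write_iops_log file.
# Assumed line format: time_ms, value, data_direction, block_size, offset
log_lines = """\
563, 717, 0, 0, 0
1062, 759, 0, 0, 0
1562, 784, 0, 0, 0
2062, 772, 0, 0, 0
2562, 774, 0, 0, 0
3062, 769, 0, 0, 0
3563, 778, 0, 0, 0
4064, 706, 0, 0, 0
4563, 757, 0, 0, 0
""".splitlines()

samples = [int(line.split(",")[1]) for line in log_lines]
avg_iops = sum(samples) / len(samples)
print(f"{len(samples)} samples, average iops = {avg_iops:.1f}")
```

With group reporting, the iops avg/min/max in the summary is computed over all jobs' interval samples pooled together, so missing samples from overloaded jobs skew it.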

Adam

From: seyed mohammad <seymad.f1@xxxxxxxxx>
Sent: Sunday, February 19, 2023 12:42 AM
To: fio@xxxxxxxxxxxxxxx <fio@xxxxxxxxxxxxxxx>
Subject: Problems with interpreting FIO output 
 
I've run a random read/write FIO test with 400 jobs to benchmark a
RAID setup. The reported IOPS and the iops statistics (the avg value)
diverge; how should this be interpreted? Should these two values
converge?

Test results for read:
job1: (groupid=0, jobs=400): err= 0: pid=25204:
  read: IOPS=199k, BW=778MiB/s (816MB/s)(23.8GiB/31279msec)
    slat (usec): min=3, max=1401.7k, avg=1423.58, stdev=38712.03
    clat (usec): min=60, max=2683.7k, avg=86310.45, stdev=289174.76
     lat (usec): min=70, max=2683.7k, avg=87730.40, stdev=291369.20
    clat percentiles (usec):
     |  1.00th=[    239],  5.00th=[    562], 10.00th=[    717],
     | 20.00th=[    816], 30.00th=[    865], 40.00th=[    906],
     | 50.00th=[    947], 60.00th=[    988], 70.00th=[   1057],
     | 80.00th=[   1156], 90.00th=[   2147], 95.00th=[ 935330],
     | 99.00th=[1266680], 99.50th=[1300235], 99.90th=[1350566],
     | 99.95th=[1384121], 99.99th=[1803551]
   bw (  MiB/s): min=  228, max= 3556, per=100.00%, avg=1553.73,
stdev= 2.08, samples=12469
   iops        : min=58376, max=910489, avg=397711.45, stdev=532.63,
samples=12469

Test results for write:
 write: IOPS=85.1k, BW=334MiB/s (350MB/s)(10.2GiB/31279msec); 0 zone resets
    slat (usec): min=7, max=1397.5k, avg=1430.09, stdev=38675.97
    clat (usec): min=14, max=2684.1k, avg=97398.93, stdev=306662.71
     lat (usec): min=28, max=2684.1k, avg=98825.72, stdev=308705.23
    clat percentiles (usec):
     |  1.00th=[    314],  5.00th=[    635], 10.00th=[    717],
     | 20.00th=[    791], 30.00th=[    840], 40.00th=[    881],
     | 50.00th=[    922], 60.00th=[    971], 70.00th=[   1057],
     | 80.00th=[   1270], 90.00th=[  10290], 95.00th=[1044382],
     | 99.00th=[1266680], 99.50th=[1300235], 99.90th=[1350566],
     | 99.95th=[1384121], 99.99th=[1769997]
   bw (  KiB/s): min=98765, max=1568174, per=100.00%, avg=681623.43,
stdev=924.60, samples=12469
   iops        : min=24675, max=391998, avg=170363.71, stdev=231.14,
samples=12469


The FIO job:
[global]
name=4k random read 4 ios
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
size=400M
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1


