Hi Adam,

Many thanks for the test results. Just to give a little background: this happened when I was testing my software RAID10 (mdadm) alongside BTRFS. I was checking the system load with htop and it did not shoot through the roof; the same experiment with other filesystems had much higher system loads.

On Sun, Feb 19, 2023 at 9:06 PM Adam Horshack <horshack@xxxxxxxx> wrote:
>
> Hi Seyed,
>
> As you probably know, IOPS represents total I/Os divided by total time, while iops represents the per-interval averages. The default averaging interval for iops is 500 ms (--iopsavgtime=500).
>
> fio is reporting 12,469 interval samples for your run. I would expect 400 (jobs) x 31.279 (actual runtime in seconds) x 2 (samples per second, i.e. 1000/500) = 25,023 interval samples. It's not unusual for the total sample count to fall short of the expected value under heavy load. Your iops average is 2x your reported IOPS (397k vs 199k for reads), implying the sample-count discrepancy is related to the IOPS discrepancy.
>
> I was able to reproduce your IOPS vs iops discrepancy on a ramdisk using 100 vs 1,000 jobs. Read IOPS for a 5-second run:
>
> 100 jobs: #samples=910 vs #expected=1,000, IOPS=84.3k vs iops=83.4k
> 1000 jobs: #samples=1,470 vs #expected=10,000, IOPS=107k vs iops=172k
>
> If I enable iops logging with a 500 ms interval (--write_iops_log and --log_avg_msec=500) I get the following samples for one of the jobs:
>
> 100 job run:
> 563, 717, 0, 0, 0
> 1062, 759, 0, 0, 0
> 1562, 784, 0, 0, 0
> 2062, 772, 0, 0, 0
> 2562, 774, 0, 0, 0
> 3062, 769, 0, 0, 0
> 3563, 778, 0, 0, 0
> 4064, 706, 0, 0, 0
> 4563, 757, 0, 0, 0
>
> 1000 job run:
> 892, 128, 0, 0, 0
> 2660, 95, 0, 0, 0
>
> The first column is elapsed time in ms; the second is the average IOPS over that interval. Notice the 1,000-job run has far fewer samples, and some jobs in the 1,000-job run have zero samples. The IOPS vs iops discrepancy is likely sampling error from insufficient samples being captured due to CPU load.
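> The expected-sample arithmetic above, applied to the 400-job run, can be sketched in a few lines (a rough illustration of the relationship using the numbers quoted in this thread, not fio's internal code):

```python
# Rough sanity check of the sample-count arithmetic for the 400-job run.
# All numbers are taken from the fio output quoted in this thread.

jobs = 400
runtime_s = 31.279        # from the read line: (23.8GiB/31279msec)
iopsavgtime_ms = 500      # fio default averaging interval (--iopsavgtime=500)

samples_per_sec = 1000 / iopsavgtime_ms
expected = jobs * runtime_s * samples_per_sec
print(round(expected))                       # -> 25023

captured = 12469                             # samples=12469 in the fio output
print(f"{captured / expected:.2f}")          # -> 0.50, i.e. about half captured

# The same ~2x factor separates the two statistics:
print(f"{397711.45 / 199000:.2f}")           # -> 2.00 (iops avg vs IOPS)
```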
> I'm not too familiar with the back-end stats capture, but I think it runs as a single thread. Perhaps its priority could be boosted.
>
> Adam
>
> From: seyed mohammad <seymad.f1@xxxxxxxxx>
> Sent: Sunday, February 19, 2023 12:42 AM
> To: fio@xxxxxxxxxxxxxxx <fio@xxxxxxxxxxxxxxx>
> Subject: Problems with interpreting FIO output
>
> I've run a random read/write FIO test with 400 jobs to benchmark a
> RAID setup. The output IOPS and the iops statistics (the average/avg)
> are divergent; how can this be interpreted? Should these two values
> converge?
>
> Test results for read:
> job1: (groupid=0, jobs=400): err= 0: pid=25204:
>   read: IOPS=199k, BW=778MiB/s (816MB/s)(23.8GiB/31279msec)
>     slat (usec): min=3, max=1401.7k, avg=1423.58, stdev=38712.03
>     clat (usec): min=60, max=2683.7k, avg=86310.45, stdev=289174.76
>      lat (usec): min=70, max=2683.7k, avg=87730.40, stdev=291369.20
>     clat percentiles (usec):
>      |  1.00th=[    239],  5.00th=[    562], 10.00th=[    717],
>      | 20.00th=[    816], 30.00th=[    865], 40.00th=[    906],
>      | 50.00th=[    947], 60.00th=[    988], 70.00th=[   1057],
>      | 80.00th=[   1156], 90.00th=[   2147], 95.00th=[ 935330],
>      | 99.00th=[1266680], 99.50th=[1300235], 99.90th=[1350566],
>      | 99.95th=[1384121], 99.99th=[1803551]
>    bw (  MiB/s): min=  228, max= 3556, per=100.00%, avg=1553.73,
> stdev= 2.08, samples=12469
>    iops        : min=58376, max=910489, avg=397711.45, stdev=532.63,
> samples=12469
>
> Test results for write:
>   write: IOPS=85.1k, BW=334MiB/s (350MB/s)(10.2GiB/31279msec); 0 zone resets
>     slat (usec): min=7, max=1397.5k, avg=1430.09, stdev=38675.97
>     clat (usec): min=14, max=2684.1k, avg=97398.93, stdev=306662.71
>      lat (usec): min=28, max=2684.1k, avg=98825.72, stdev=308705.23
>     clat percentiles (usec):
>      |  1.00th=[    314],  5.00th=[    635], 10.00th=[    717],
>      | 20.00th=[    791], 30.00th=[    840], 40.00th=[    881],
>      | 50.00th=[    922], 60.00th=[    971], 70.00th=[   1057],
>      | 80.00th=[   1270], 90.00th=[  10290], 95.00th=[1044382],
>      | 99.00th=[1266680], 99.50th=[1300235], 99.90th=[1350566],
>      | 99.95th=[1384121], 99.99th=[1769997]
>    bw (  KiB/s): min=98765, max=1568174, per=100.00%, avg=681623.43,
> stdev=924.60, samples=12469
>    iops        : min=24675, max=391998, avg=170363.71, stdev=231.14,
> samples=12469
>
>
> The FIO job:
> [global]
> name=4k random read 4 ios
> ioengine=libaio
> direct=1
> readwrite=randrw
> rwmixread=70
> iodepth=64
> buffered=0
> size=400M
> runtime=30
> time_based
> randrepeat=0
> norandommap
> refill_buffers
> ramp_time=10
> bs=4k
> numjobs=400
> group_reporting=1
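For anyone following along later, the --write_iops_log samples Adam quoted can be summarized with a few lines (a rough sketch; the two sample lines below are the ones from his 1,000-job run):

```python
# Parse fio --write_iops_log output lines (format: elapsed_ms, avg_iops,
# data_direction, block_size, offset) and summarize sample coverage.
# The two lines below are the samples quoted for the 1,000-job run.

log_text = """\
892, 128, 0, 0, 0
2660, 95, 0, 0, 0
"""

samples = []
for line in log_text.strip().splitlines():
    elapsed_ms, iops, ddir, bs, offset = (int(f) for f in line.split(","))
    samples.append((elapsed_ms, iops))

runtime_ms = 5000          # 5-second run
avg_msec = 500             # --log_avg_msec=500
expected = runtime_ms // avg_msec
print(f"captured {len(samples)} of ~{expected} expected samples")
# -> captured 2 of ~10 expected samples, i.e. 8 intervals were never logged
```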