Hi Jeff,
I have run the test you suggested, and this is what came up:
read_throughput_rpool: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
read_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=904MiB/s][r=903 IOPS][eta 00m:00s]
read_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3284411: Fri Apr 5 07:31:18 2024
  read: IOPS=1092, BW=1092MiB/s (1145MB/s)(64.0GiB/60001msec)
    clat (usec): min=733, max=6031, avg=903.71, stdev=166.68
     lat (usec): min=734, max=6032, avg=904.88, stdev=166.84
    clat percentiles (usec):
     |  1.00th=[  742],  5.00th=[  750], 10.00th=[  750], 20.00th=[  758],
     | 30.00th=[  766], 40.00th=[  791], 50.00th=[  824], 60.00th=[  873],
     | 70.00th=[ 1037], 80.00th=[ 1106], 90.00th=[ 1139], 95.00th=[ 1188],
     | 99.00th=[ 1254], 99.50th=[ 1287], 99.90th=[ 1401], 99.95th=[ 1500],
     | 99.99th=[ 2474]
   bw (  MiB/s): min=  871, max= 1270, per=100.00%, avg=1092.97, stdev=166.88, samples=120
   iops        : min=  871, max= 1270, avg=1092.78, stdev=166.95, samples=120
  lat (usec)   : 750=6.17%, 1000=60.30%
  lat (msec)   : 2=33.51%, 4=0.02%, 10=0.01%
  cpu          : usr=2.04%, sys=97.52%, ctx=2457, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=65543,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=1092MiB/s (1145MB/s), 1092MiB/s-1092MiB/s (1145MB/s-1145MB/s), io=64.0GiB (68.7GB), run=60001-60001msec
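For reference, a job along these lines reproduces the parameters shown in the header above; the target directory is a placeholder and the size/runtime flags are inferred from the output, so treat it as a sketch rather than the literal invocation:

fio --name=read_throughput_rpool --directory=<dataset-mountpoint-on-rpool> \
    --numjobs=1 --size=1G --time_based --runtime=60s --ioengine=psync \
    --iodepth=1 --bs=1M --rw=read --group_reporting=1

The write test further down appears to be the same job with --rw=write.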
I take it that the file was being cached (because 1145 MB/s from a single SSD disk is hard to believe)? For the write test, these are the results:
write_throughput_rpool: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=4100KiB/s][w=4 IOPS][eta 00m:00s]
write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3292023: Fri Apr 5 07:36:11 2024
  write: IOPS=5, BW=5394KiB/s (5524kB/s)(317MiB/60174msec); 0 zone resets
    clat (msec): min=16, max=1063, avg=189.59, stdev=137.19
     lat (msec): min=16, max=1063, avg=189.81, stdev=137.19
    clat percentiles (msec):
     |  1.00th=[   18],  5.00th=[   22], 10.00th=[   41], 20.00th=[   62],
     | 30.00th=[  129], 40.00th=[  144], 50.00th=[  163], 60.00th=[  190],
     | 70.00th=[  228], 80.00th=[  268], 90.00th=[  363], 95.00th=[  489],
     | 99.00th=[  567], 99.50th=[  592], 99.90th=[ 1062], 99.95th=[ 1062],
     | 99.99th=[ 1062]
   bw (  KiB/s): min= 2048, max=41042, per=100.00%, avg=5534.14, stdev=5269.61, samples=117
   iops        : min=    2, max=   40, avg= 5.40, stdev= 5.14, samples=117
  lat (msec)   : 20=3.47%, 50=11.99%, 100=8.83%, 250=51.42%, 500=20.19%
  lat (msec)   : 750=3.79%, 2000=0.32%
  cpu          : usr=0.11%, sys=0.52%, ctx=2673, majf=0, minf=38
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,317,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=5394KiB/s (5524kB/s), 5394KiB/s-5394KiB/s (5524kB/s-5524kB/s), io=317MiB (332MB), run=60174-60174msec
Still at about 6 MB/s. Should I move this discussion to the OpenZFS team, or is there anything else still worth testing?
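On my side, one thing I can still do is confirm that the read numbers really come from the ARC, and watch the pool while the write job runs. Both arcstat and zpool iostat ship with OpenZFS (the exact arcstat columns vary by version), so as a rough sketch:

# watch ARC hit/miss rates while re-running the read job
arcstat 1
# or read the raw counters directly
grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/arcstats

# watch per-vdev throughput on the pool while the write job runs
zpool iostat -v rpool 1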
Thank you very much!
---
Felix Rubio
"Don't believe what you're told. Double check."
On 2024-04-04 21:07, Jeff Johnson wrote:
Felix,
Based on your previous emails about the drive, it would appear that the hardware (SSD, cables, port) is fine and the drive performs well.
Go back and run your original ZFS test on your mounted ZFS volume directory, and remove "--direct=1" from your command: ZFS does not yet support direct I/O, and disabling buffered I/O to the ZFS directory will have very negative impacts. This is a ZFS thing, not your kernel or hardware.
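Concretely, that would be something along the lines of the command from your last mail, pointed at a file under the mounted dataset instead of /dev/sda and with --direct=1 dropped (the directory below is just a placeholder):

fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s \
    --iodepth=8 --ioengine=libaio --verify=0 --group_reporting=1 \
    --bs=1M --rw=read --size=1G --directory=<your-zfs-mountpoint>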
--Jeff
On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@xxxxxxxxx> wrote:
hey Jeff,
Good catch! I have run the following command:
fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s \
    --iodepth=8 --ioengine=libaio --direct=1 --verify=0 --group_reporting=1 \
    --bs=1M --rw=read --size=1G --filename=/dev/sda
(/dev/sda and /dev/sdd are the drives I have in one of the pools), and this is what I get:
seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr 4 20:56:06 2024
  read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
    slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
    clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
     lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
    clat percentiles (msec):
     |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[   21],
     | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[   21],
     | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[   50],
     | 99.99th=[  368]
   bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06, stdev=19902.89, samples=120
   iops        : min=  210, max=  390, avg=382.55, stdev=19.43, samples=120
  lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
  cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s), io=22.4GiB (24.1GB), run=60020-60020msec

Disk stats (read/write):
  sda: ios=23817/315, merge=0/0, ticks=549704/132687, in_queue=683613, util=99.93%
400 MB/s!!! That is a number I have never seen before. I take it this means I need to go back to the OpenZFS chat/forum?
---
Felix Rubio
"Don't believe what you're told. Double check."
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing
jeff.johnson@xxxxxxxxxxxxxxxxx
www.aeoncomputing.com
t: 858-412-3810 x1001 f: 858-412-3845
m: 619-204-9061
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage