Hi Jeff,
I have run the test you suggested, and this is what came up:
read_throughput_rpool: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
read_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=904MiB/s][r=903 IOPS][eta 00m:00s]
read_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3284411: Fri Apr 5 07:31:18 2024
  read: IOPS=1092, BW=1092MiB/s (1145MB/s)(64.0GiB/60001msec)
    clat (usec): min=733, max=6031, avg=903.71, stdev=166.68
     lat (usec): min=734, max=6032, avg=904.88, stdev=166.84
    clat percentiles (usec):
     |  1.00th=[  742],  5.00th=[  750], 10.00th=[  750], 20.00th=[  758],
     | 30.00th=[  766], 40.00th=[  791], 50.00th=[  824], 60.00th=[  873],
     | 70.00th=[ 1037], 80.00th=[ 1106], 90.00th=[ 1139], 95.00th=[ 1188],
     | 99.00th=[ 1254], 99.50th=[ 1287], 99.90th=[ 1401], 99.95th=[ 1500],
     | 99.99th=[ 2474]
   bw (  MiB/s): min=  871, max= 1270, per=100.00%, avg=1092.97, stdev=166.88, samples=120
   iops        : min=  871, max= 1270, avg=1092.78, stdev=166.95, samples=120
  lat (usec)   : 750=6.17%, 1000=60.30%
  lat (msec)   : 2=33.51%, 4=0.02%, 10=0.01%
  cpu          : usr=2.04%, sys=97.52%, ctx=2457, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=65543,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=1092MiB/s (1145MB/s), 1092MiB/s-1092MiB/s (1145MB/s-1145MB/s), io=64.0GiB (68.7GB), run=60001-60001msec
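For reference, a job along these lines reproduces the parameters shown in the header above; the target directory is a placeholder and the size/runtime flags are inferred from the output, so treat it as a sketch rather than the literal invocation:

fio --name=read_throughput_rpool --directory=<dataset-mountpoint-on-rpool> \
    --numjobs=1 --size=1G --time_based --runtime=60s --ioengine=psync \
    --iodepth=1 --bs=1M --rw=read --group_reporting=1

The write test further down appears to be the same job with --rw=write.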
I take it that the file was being cached (because 1145 MB/s from a single SSD disk is hard to believe)? For the write test, these are the results:
write_throughput_rpool: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=4100KiB/s][w=4 IOPS][eta 00m:00s]
write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3292023: Fri Apr 5 07:36:11 2024
  write: IOPS=5, BW=5394KiB/s (5524kB/s)(317MiB/60174msec); 0 zone resets
    clat (msec): min=16, max=1063, avg=189.59, stdev=137.19
     lat (msec): min=16, max=1063, avg=189.81, stdev=137.19
    clat percentiles (msec):
     |  1.00th=[   18],  5.00th=[   22], 10.00th=[   41], 20.00th=[   62],
     | 30.00th=[  129], 40.00th=[  144], 50.00th=[  163], 60.00th=[  190],
     | 70.00th=[  228], 80.00th=[  268], 90.00th=[  363], 95.00th=[  489],
     | 99.00th=[  567], 99.50th=[  592], 99.90th=[ 1062], 99.95th=[ 1062],
     | 99.99th=[ 1062]
   bw (  KiB/s): min= 2048, max=41042, per=100.00%, avg=5534.14, stdev=5269.61, samples=117
   iops        : min=    2, max=   40, avg= 5.40, stdev= 5.14, samples=117
  lat (msec)   : 20=3.47%, 50=11.99%, 100=8.83%, 250=51.42%, 500=20.19%
  lat (msec)   : 750=3.79%, 2000=0.32%
  cpu          : usr=0.11%, sys=0.52%, ctx=2673, majf=0, minf=38
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,317,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=5394KiB/s (5524kB/s), 5394KiB/s-5394KiB/s (5524kB/s-5524kB/s), io=317MiB (332MB), run=60174-60174msec
Still at about 6 MB/s. Should I move this discussion to the OpenZFS team, or is there anything else still worth testing?
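On my side, one thing I can still do is confirm that the read numbers really come from the ARC, and watch the pool while the write job runs. Both arcstat and zpool iostat ship with OpenZFS (the exact arcstat columns vary by version), so as a rough sketch:

# watch ARC hit/miss rates while re-running the read job
arcstat 1
# or read the raw counters directly
grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/arcstats

# watch per-vdev throughput on the pool while the write job runs
zpool iostat -v rpool 1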
Thank you very much!
---
Felix Rubio
"Don't believe what you're told. Double check."
On 2024-04-04 21:07, Jeff Johnson wrote:
Felix,
Based on your previous emails about the drive, it would appear that the hardware (SSD, cables, port) is fine and the drive performs well.
Go back and run your original ZFS test on your mounted ZFS volume directory, and remove "--direct=1" from your command: ZFS does not yet support direct I/O, and disabling buffered I/O to the ZFS directory will have very negative impacts. This is a ZFS thing, not your kernel or hardware.
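Concretely, that would be something along the lines of the command from your last mail, pointed at a file under the mounted dataset instead of /dev/sda and with --direct=1 dropped (the directory below is just a placeholder):

fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s \
    --iodepth=8 --ioengine=libaio --verify=0 --group_reporting=1 \
    --bs=1M --rw=read --size=1G --directory=<your-zfs-mountpoint>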
--Jeff
On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@xxxxxxxxx> wrote:
hey Jeff,
Good catch! I have run the following command:
fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s \
    --iodepth=8 --ioengine=libaio --direct=1 --verify=0 --group_reporting=1 \
    --bs=1M --rw=read --size=1G --filename=/dev/sda
(/dev/sda and /dev/sdd are the drives I have in one of the pools), and this is what I get:
seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr 4 20:56:06 2024
  read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
    slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
    clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
     lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
    clat percentiles (msec):
     |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[   21],
     | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[   21],
     | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[   50],
     | 99.99th=[  368]
   bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06, stdev=19902.89, samples=120
   iops        : min=  210, max=  390, avg=382.55, stdev=19.43, samples=120
  lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
  cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s), io=22.4GiB (24.1GB), run=60020-60020msec

Disk stats (read/write):
  sda: ios=23817/315, merge=0/0, ticks=549704/132687, in_queue=683613, util=99.93%
400 MB/s!!! That is a number I have never seen before. I take it this means I need to go back to the OpenZFS chat/forum?
---
Felix Rubio
"Don't believe what you're told. Double check."
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing
jeff.johnson@xxxxxxxxxxxxxxxxx
www.aeoncomputing.com
t: 858-412-3810 x1001 f: 858-412-3845
m: 619-204-9061
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage