On 24/02/26 07:53AM, Luis Chamberlain wrote:
> On Mon, Feb 26, 2024 at 07:27:18AM -0600, John Groves wrote:
> > Run status group 0 (all jobs):
> >   WRITE: bw=29.6GiB/s (31.8GB/s), 29.6GiB/s-29.6GiB/s (31.8GB/s-31.8GB/s), io=44.7GiB (48.0GB), run=1511-1511msec
> >
> > This is run on an xfs file system on a SATA ssd.
>
> To compare more closer apples to apples, wouldn't it make more sense
> to try this with XFS on pmem (with fio -direct=1)?
>
> Luis

Makes sense. Below is the same fio command line I used with xfs before, but
now it's on /dev/pmem0 (the same 128G, converted from devdax to pmem because
xfs requires a block device).
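For reference, the reconfiguration and xfs setup were along these lines (the
namespace name and exact options below are illustrative, not copied from my
shell history):

  # Reconfigure the existing devdax namespace as fsdax, which exposes the
  # same capacity as a block device (/dev/pmem0)
  ndctl create-namespace --force --reconfig=namespace0.0 --mode=fsdax

  # Build and mount the xfs filesystem that fio writes into
  mkfs.xfs -f /dev/pmem0
  mount /dev/pmem0 /mnt/xfs      # (whether to add -o dax is a separate choice)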
fio -name=ten-256m-per-thread --nrfiles=10 -bs=2M --group_reporting=1 --alloc-size=1048576 --filesize=256MiB --readwrite=write --fallocate=none --numjobs=48 --create_on_open=0 --ioengine=io_uring --direct=1 --directory=/mnt/xfs

ten-256m-per-thread: (g=0): rw=write, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=io_uring, iodepth=1
...
fio-3.33
Starting 48 processes
ten-256m-per-thread: Laying out IO files (10 files / total 2441MiB)
  [ ... one "Laying out IO files" line per job, 48 total, trimmed ... ]
Jobs: 36 (f=360): [W(3),_(1),W(3),_(1),W(1),_(1),W(6),_(1),W(1),_(1),W(1),_(1),W(7),_(1),W(3),_(1),W(2),_(2),W(4),_(1),W(5),_(1)][77.8%][w=15.1GiB/s][w=7750 IOPS][eta 00m:02s]
ten-256m-per-thread: (groupid=0, jobs=48): err= 0: pid=8798: Mon Feb 26 15:10:30 2024
  write: IOPS=7582, BW=14.8GiB/s (15.9GB/s)(114GiB/7723msec); 0 zone resets
    slat (usec): min=23, max=7352, avg=131.80, stdev=151.63
    clat (usec): min=385, max=22638, avg=5789.74, stdev=3124.93
     lat (usec): min=432, max=22724, avg=5921.54, stdev=3133.18
    clat percentiles (usec):
     |  1.00th=[  799],  5.00th=[ 1467], 10.00th=[ 2073], 20.00th=[ 3097],
     | 30.00th=[ 3949], 40.00th=[ 4752], 50.00th=[ 5473], 60.00th=[ 6194],
     | 70.00th=[ 7046], 80.00th=[ 8029], 90.00th=[ 9634], 95.00th=[11338],
     | 99.00th=[16319], 99.50th=[17957], 99.90th=[20055], 99.95th=[20579],
     | 99.99th=[21365]
   bw (  MiB/s): min=10852, max=26980, per=100.00%, avg=15940.43, stdev=88.61, samples=665
   iops        : min= 5419, max=13477, avg=7963.08, stdev=44.28, samples=665
  lat (usec)   : 500=0.15%, 750=0.47%, 1000=1.34%
  lat (msec)   : 2=7.40%, 4=21.46%, 10=60.57%, 20=8.50%, 50=0.11%
  cpu          : usr=2.33%, sys=0.32%, ctx=58806, majf=0, minf=36301
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,58560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=14.8GiB/s (15.9GB/s), 14.8GiB/s-14.8GiB/s (15.9GB/s-15.9GB/s), io=114GiB (123GB), run=7723-7723msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

I only have some educated guesses as to why famfs is faster. Since famfs
files are preallocated, they are always contiguous [1]. And famfs is vastly
simpler, because it isn't aimed at general-purpose use cases (and indeed
can't handle them).

Regards,
John
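[1] One way to sanity-check the xfs side of that comparison would be to dump
the extent map of one of the fio files; the path below just follows fio's
default $jobname.$jobnum.$filenum naming, so treat it as an example:

  # A handful of large extents means the xfs allocator kept the file
  # effectively contiguous; many small extents means it did not
  xfs_bmap -v /mnt/xfs/ten-256m-per-thread.0.0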