Hello,

On Sat, May 18, 2019 at 12:52:50AM +0200, Reindl Harald wrote:
> sdaly RAID10 don't support --write-mostly, otherwise i won't have bought
> 4 expensive 2 TB SSD's in the last two years.....

Interesting. It certainly looks like you are correct:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid1 sdc[1](W) nvme0n1[0]
      10485760 blocks super 1.2 [2/2] [UU]
[…]

$ /opt/fio/bin/fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=limoncello_ro_mdwritemostly_4jobs --filename=/mnt/fio --bs=32k --iodepth=64 --numjobs=1 --size=8G --readwrite=randread --group_reporting
limoncello_ro_mdwritemostly_4jobs: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=64
fio-3.13-42-g8066f
Starting 1 process
limoncello_ro_mdwritemostly_4jobs: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=2439MiB/s][r=78.0k IOPS][eta 00m:00s]
limoncello_ro_mdwritemostly_4jobs: (groupid=0, jobs=1): err= 0: pid=24347: Fri May 17 22:58:13 2019
  read: IOPS=77.0k, BW=2437MiB/s (2556MB/s)(8192MiB/3361msec)
   bw (  MiB/s): min= 2433, max= 2441, per=100.00%, avg=2438.45, stdev= 3.26, samples=6
   iops        : min=77876, max=78130, avg=78030.33, stdev=104.35, samples=6
  cpu          : usr=7.02%, sys=36.13%, ctx=161966, majf=0, minf=519
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2437MiB/s (2556MB/s), 2437MiB/s-2437MiB/s (2556MB/s-2556MB/s), io=8192MiB (8590MB), run=3361-3361msec

Disk stats (read/write):
    md4: ios=252936/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=131072/0, aggrmerge=0/0, aggrticks=106599/0, aggrin_queue=106384, aggrutil=95.78%
  nvme0n1: ios=262144/0, merge=0/0, ticks=213198/0, in_queue=212768, util=95.78%
  sdc: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

(note no IOs to sdc)

But it actually gets faster when IOs are allowed to go to both!
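For reference, in case anyone wants to reproduce the write-mostly RAID-1 above, something along these lines should set it up (device names are from my box; mdadm applies --write-mostly only to the devices listed after it, hence the (W) on sdc alone):

$ sudo mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 --size=10G /dev/nvme0n1 --write-mostly /dev/sdc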
Here is the same job again after clearing the write-mostly flag on sdc:

$ echo -writemostly | sudo tee /sys/block/md4/md/dev-sdc/state
-writemostly

$ /opt/fio/bin/fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=limoncello_ro_mdwritemostly_4jobs --filename=/mnt/fio --bs=32k --iodepth=64 --numjobs=1 --size=8G --readwrite=randread --group_reporting
limoncello_ro_mdwritemostly_4jobs: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=64
fio-3.13-42-g8066f
Starting 1 process
Jobs: 1 (f=1)
limoncello_ro_mdwritemostly_4jobs: (groupid=0, jobs=1): err= 0: pid=24385: Fri May 17 22:59:44 2019
  read: IOPS=92.6k, BW=2894MiB/s (3034MB/s)(8192MiB/2831msec)
   bw (  MiB/s): min= 2888, max= 2904, per=100.00%, avg=2895.91, stdev= 6.50, samples=5
   iops        : min=92434, max=92940, avg=92669.60, stdev=207.52, samples=5
  cpu          : usr=9.61%, sys=42.83%, ctx=120747, majf=0, minf=521
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2894MiB/s (3034MB/s), 2894MiB/s-2894MiB/s (3034MB/s-3034MB/s), io=8192MiB (8590MB), run=2831-2831msec

Disk stats (read/write):
    md4: ios=245417/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=131067/0, aggrmerge=4/0, aggrticks=89358/0, aggrin_queue=89192, aggrutil=94.99%
  nvme0n1: ios=215218/0, merge=0/0, ticks=88984/0, in_queue=88700, util=94.99%
  sdc: ios=46917/0, merge=9/0, ticks=89733/0, in_queue=89684, util=94.99%

(92.6k IOPS vs 77k IOPS)
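(If you later want to put the flag back without recreating the array, the opposite echo ought to work too, as far as I know:

$ echo writemostly | sudo tee /sys/block/md4/md/dev-sdc/state
)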
It's also interesting that the exact same fio job against a RAID-10 only achieves 36.2k IOPS:

$ sudo mdadm --create --verbose --assume-clean /dev/md4 --level=10 --raid-devices=2 --size=10G /dev/nvme0n1 /dev/sdc
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: largest drive (/dev/nvme0n1) exceeds size (10485760K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md4 started.

$ sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md4
mke2fs 1.44.5 (15-Dec-2018)
/dev/md4 contains a ext4 file system
        last mounted on /mnt on Fri May 17 22:55:53 2019
Proceed anyway? (y,N) y
Discarding device blocks: done
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 22c2c0d1-494b-4435-8da2-114c868d966c
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

$ sudo mount /dev/md4 /mnt
$ sudo chown andy: /mnt

$ /opt/fio/bin/fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=limoncello_ro_mdwritemostly_4jobs --filename=/mnt/fio --bs=32k --iodepth=64 --numjobs=1 --size=8G --readwrite=randread --group_reporting
limoncello_ro_mdwritemostly_4jobs: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=64
fio-3.13-42-g8066f
Starting 1 process
limoncello_ro_mdwritemostly_4jobs: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=1127MiB/s][r=36.1k IOPS][eta 00m:00s]
limoncello_ro_mdwritemostly_4jobs: (groupid=0, jobs=1): err= 0: pid=24570: Fri May 17 23:05:59 2019
  read: IOPS=36.2k, BW=1133MiB/s (1188MB/s)(8192MiB/7232msec)
   bw (  MiB/s): min= 1118, max= 1145, per=99.94%, avg=1132.12, stdev= 9.09, samples=14
   iops        : min=35786, max=36656, avg=36227.71, stdev=290.84, samples=14
  cpu          : usr=5.46%, sys=26.12%, ctx=189495, majf=0, minf=519
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1133MiB/s (1188MB/s), 1133MiB/s-1133MiB/s (1188MB/s-1188MB/s), io=8192MiB (8590MB), run=7232-7232msec

Disk stats (read/write):
    md4: ios=260907/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=130711/4, aggrmerge=361/1, aggrticks=228765/8, aggrin_queue=126206, aggrutil=98.07%
  nvme0n1: ios=142323/4, merge=0/1, ticks=21914/0, in_queue=21392, util=96.50%
  sdc: ios=119099/5, merge=722/1, ticks=435617/17, in_queue=231020, util=98.07%

So maybe I should just be using RAID-1 of these and forget about --write-mostly?

Should the non-implementation of --write-mostly on RAID-10 be reported as a bug, since mdadm silently accepts it and reports its use in /proc/mdstat?

Cheers,
Andy