Good day!

On Mon, Sep 4, 2023 at 2:16 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/09/02 14:56, CoolCold wrote:
> > Good day!
> > 2nd part of the question, in relation of hardware/system from previous
> > thread - "raid10, far layout initial sync slow + XFS question"
> > https://www.spinics.net/lists/raid/msg74907.html - Ubuntu 20.04 with
> > kernel "5.4.0-153-generic #170-Ubuntu" on Hetzner AX161 / AMD EPYC
> > 7502P 32-Core Processor
> >
> > Gist: issuing the same load on RAID10 4 drives N2 16kb chunk is slower
> > than running the same load on a single member of that RAID
> > Question: is such kind of behavior normal and expected? Am I doing
> > something terribly wrong?
>
> Write will be slower is normal, because each write to the array must
> write to all the rdev and wait for these write be be done.

This contradicts common wisdom and basically defeats one of the points
of having a striped setup - with N drives and two copies of each block,
a RAID10 is expected to give up to an N/2 improvement in write IOPS.

For example, 3Ware "hardware" RAID has public benchmarks -
https://www.broadcom.com/support/knowledgebase/1211161476065/what-kind-of-results-can-i-expect-to-see-under-windows-with-3war
- where the "2K Random Writes (IOs/sec) (256 outstanding I/Os)" test
shows a single drive at 203.0 IOPS vs a 4-drive RAID10 at 299.8 IOPS,
which is roughly 1.5 times better, not WORSE as we see with mdadm.

I've also done a slightly different test, with fio numjobs=4: the result
is 20k IOPS (single job) vs 35k IOPS, which is just on par with
single-drive performance. A rough sketch of that command is below.
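
This is a reconstruction rather than the literal command line I ran -
the offset_increment and group_reporting options are my assumption of a
sane way to run it, the rest mirrors the single-job commands quoted
further down:

# four concurrent sync writers, fdatasync after every write as in the
# single-job runs; offset_increment keeps the jobs out of each other's
# region, group_reporting sums their IOPS into one figure
fio --rw=write --ioengine=sync --fdatasync=1 --filename=/dev/md3 \
    --size=8200m --offset_increment=8200m --bs=16k --numjobs=4 \
    --group_reporting --name=mytest

The same command with --filename=/dev/nvme5n1 gives the single-drive
counterpart; the combined IOPS across the four jobs is what I compare
against the single-job run.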
>
> On the other hand, read should be faster, because raid10 only need to
> choose one rdev to read.
>
> Thanks,
> Kuai
>
> >
> > RAID10: 18.5k iops
> > SINGLE DRIVE: 26k iops
> >
> > raw data:
> >
> > RAID config
> > root@node2:/data# cat /proc/mdstat
> > Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
> > md3 : active raid10 nvme5n1[3] nvme3n1[2] nvme4n1[1] nvme0n1[0]
> >       7501212320 blocks super 1.2 16K chunks 2 near-copies [4/4] [UUUU]
> >
> > Single drive with:
> > root@node2:/data# mdadm /dev/md3 --fail /dev/nvme5n1 --remove /dev/nvme5n1
> > mdadm: set /dev/nvme5n1 faulty in /dev/md3
> > mdadm: hot removed /dev/nvme5n1 from /dev/md3
> >
> > mdadm --zero-superblock /dev/nvme5n1
> >
> > TEST COMMANDS
> > RAID10: fio --rw=write --ioengine=sync --fdatasync=1 --filename=/dev/md3 --size=8200m --bs=16k --name=mytest
> > SINGLE DRIVE: fio --rw=write --ioengine=sync --fdatasync=1 --filename=/dev/nvme5n1 --size=8200m --bs=16k --name=mytest
> >
> > FIO output:
> >
> > RAID10:
> > root@node2:/mnt# fio --rw=write --ioengine=sync --fdatasync=1 --filename=/dev/md3 --size=8200m --bs=16k --name=mytest
> > mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=sync, iodepth=1
> > fio-3.16
> > Starting 1 process
> > Jobs: 1 (f=1): [W(1)][100.0%][w=298MiB/s][w=19.0k IOPS][eta 00m:00s]
> > mytest: (groupid=0, jobs=1): err= 0: pid=2130392: Sat Sep 2 08:21:39 2023
> >   write: IOPS=18.5k, BW=290MiB/s (304MB/s)(8200MiB/28321msec); 0 zone resets
> >     clat (usec): min=5, max=745, avg=12.12, stdev= 7.30
> >      lat (usec): min=6, max=746, avg=12.47, stdev= 7.34
> >     clat percentiles (usec):
> >      |  1.00th=[    8],  5.00th=[    9], 10.00th=[   10], 20.00th=[   10],
> >      | 30.00th=[   10], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
> >      | 70.00th=[   12], 80.00th=[   13], 90.00th=[   16], 95.00th=[   20],
> >      | 99.00th=[   39], 99.50th=[   55], 99.90th=[  100], 99.95th=[  116],
> >      | 99.99th=[  147]
> >    bw (  KiB/s): min=276160, max=308672, per=99.96%, avg=296354.86, stdev=6624.06, samples=56
> >    iops        : min=17260, max=19292, avg=18522.18, stdev=414.00, samples=56
> >
> > Run status group 0 (all jobs):
> >   WRITE: bw=290MiB/s (304MB/s), 290MiB/s-290MiB/s (304MB/s-304MB/s), io=8200MiB (8598MB), run=28321-28321msec
> >
> > Disk stats (read/write):
> >     md3: ios=0/2604727, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=25/262403, aggrmerge=0/787199, aggrticks=1/5563, aggrin_queue=0, aggrutil=98.10%
> >   nvme0n1: ios=40/262402, merge=1/787200, ticks=3/5092, in_queue=0, util=98.09%
> >   nvme3n1: ios=33/262404, merge=1/787198, ticks=2/5050, in_queue=0, util=98.08%
> >   nvme5n1: ios=15/262404, merge=0/787198, ticks=1/6061, in_queue=0, util=98.08%
> >   nvme4n1: ios=12/262402, merge=0/787200, ticks=1/6052, in_queue=0, util=98.10%
> >
> >
> > SINGLE DRIVE:
> > root@node2:/mnt# fio --rw=write --ioengine=sync --fdatasync=1 --filename=/dev/nvme5n1 --size=8200m --bs=16k --name=mytest
> > mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=sync, iodepth=1
> > fio-3.16
> > Starting 1 process
> > Jobs: 1 (f=1): [W(1)][100.0%][w=414MiB/s][w=26.5k IOPS][eta 00m:00s]
> > mytest: (groupid=0, jobs=1): err= 0: pid=2155313: Sat Sep 2 08:26:23 2023
> >   write: IOPS=26.2k, BW=410MiB/s (430MB/s)(8200MiB/20000msec); 0 zone resets
> >     clat (usec): min=4, max=848, avg=11.25, stdev= 7.15
> >      lat (usec): min=5, max=848, avg=11.50, stdev= 7.17
> >     clat percentiles (usec):
> >      |  1.00th=[    7],  5.00th=[    9], 10.00th=[    9], 20.00th=[    9],
> >      | 30.00th=[   10], 40.00th=[   10], 50.00th=[   10], 60.00th=[   11],
> >      | 70.00th=[   11], 80.00th=[   12], 90.00th=[   15], 95.00th=[   18],
> >      | 99.00th=[   43], 99.50th=[   62], 99.90th=[   95], 99.95th=[  108],
> >      | 99.99th=[  133]
> >    bw (  KiB/s): min=395040, max=464480, per=99.90%, avg=419438.95, stdev=17496.05, samples=39
> >    iops        : min=24690, max=29030, avg=26214.92, stdev=1093.56, samples=39
> >
> > Run status group 0 (all jobs):
> >   WRITE: bw=423MiB/s (444MB/s), 423MiB/s-423MiB/s (444MB/s-444MB/s), io=8200MiB (8598MB), run=19379-19379msec
> >
> > Disk stats (read/write):
> >   nvme5n1: ios=49/518250, merge=0/1554753, ticks=2/10629, in_queue=0, util=99.61%
>

--
Best regards,
[COOLCOLD-RIPN]