Re: performance of raid5 on fast devices

Jake et al,

I took the opportunity to measure raid5 on a 4x NVMe setup here, varying
group_thread_cnt={0..10} and
stripe_cache_size={256,512,1024,2048,4096,8192,16384,32768}.

This is on an X99 board with an Intel Xeon E5-2640 and kernel 4.9.3-200.fc25.x86_64.

The highest active stripe count logged was < 17K.
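
For reference, the active stripe count can be sampled from the array's
stripe_cache_active sysfs attribute; a minimal peak-tracking loop, assuming
md0 (not necessarily how it was logged here), would look something like:

peak=0
while sleep 1
do
        cur=$(cat /sys/block/md0/md/stripe_cache_active)
        [ "$cur" -gt "$peak" ] && peak=$cur && echo "peak: $peak"
done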


fio job/sections used:
----------------------------
[r-md0]
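# sequential reads against /dev/md0: 8 jobs x 4G, 4M blocks, libaio, O_DIRECT, iodepth 40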
ioengine=libaio
iodepth=40
rw=read
bs=4096K
direct=1
size=4G
numjobs=8
filename=/dev/md0

[w-md0]
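# sequential writes: same parameters as r-md0 apart from rw=write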
ioengine=libaio
iodepth=40
rw=write
bs=4096K
direct=1
size=4G
numjobs=8
filename=/dev/md0
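
Each section is run on its own against the job file (fio_md0.job, as in the
script at the bottom), e.g.:

fio --section=r-md0 fio_md0.job
fio --section=w-md0 fio_md0.job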


Baseline performance seen with raid0:
---------------------------------------------------
md0 : active raid0 dm-350[3] dm-349[2] dm-348[1] dm-347[0]
      33521664 blocks super 1.2 32k chunks

READ: io=32768MB, aggrb=8202.3MB/s, minb=1025.3MB/s, maxb=1217.7MB/s, mint=3364msec, maxt=3995msec
WRITE: io=32768MB, aggrb=5746.8MB/s, minb=735584KB/s, maxb=836685KB/s, mint=5013msec, maxt=5702msec


Performance with raid5:
--------------------------------
md0 : active raid5 dm-350[3] dm-349[2] dm-348[1] dm-347[0]
      25141248 blocks super 1.2 level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
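
For reference, an array with this geometry would be created along these lines
(not the exact command used here; the dm-34x names stand for the underlying
device-mapper devices):

mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=32 \
      /dev/dm-347 /dev/dm-348 /dev/dm-349 /dev/dm-350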


READ: io=32768MB, aggrb=7375.3MB/s, minb=944025KB/s, maxb=1001.1MB/s, mint=4088msec, maxt=4443msec


Write results for group_thread_cnt/stripe_cache_size variations:
------------------------------------------------------------------------------------
0/256 -> WRITE: io=32768MB, aggrb=1296.4MB/s, minb=165927KB/s, maxb=167644KB/s, mint=25019msec, maxt=25278msec
1/256 -> WRITE: io=32768MB, aggrb=2152.6MB/s, minb=275524KB/s, maxb=278654KB/s, mint=15052msec, maxt=15223msec
2/256 -> WRITE: io=32768MB, aggrb=3177.4MB/s, minb=406700KB/s, maxb=415854KB/s, mint=10086msec, maxt=10313msec
3/256 -> WRITE: io=32768MB, aggrb=4026.6MB/s, minb=515397KB/s, maxb=524222KB/s, mint=8001msec, maxt=8138msec
4/256 -> WRITE: io=32768MB, aggrb=4172.2MB/s, minb=534034KB/s, maxb=552609KB/s, mint=7590msec, maxt=7854msec *
5/256 -> WRITE: io=32768MB, aggrb=4166.9MB/s, minb=533355KB/s, maxb=547845KB/s, mint=7656msec, maxt=7864msec
6/256 -> WRITE: io=32768MB, aggrb=4189.3MB/s, minb=536218KB/s, maxb=556126KB/s, mint=7542msec, maxt=7822msec
7/256 -> WRITE: io=32768MB, aggrb=4192.5MB/s, minb=536630KB/s, maxb=560810KB/s, mint=7479msec, maxt=7816msec
8/256 -> WRITE: io=32768MB, aggrb=4185.2MB/s, minb=535807KB/s, maxb=562389KB/s, mint=7458msec, maxt=7828msec
9/256 -> WRITE: io=32768MB, aggrb=4192.1MB/s, minb=536699KB/s, maxb=577966KB/s, mint=7257msec, maxt=7815msec
10/256 -> WRITE: io=32768MB, aggrb=4182.3MB/s, minb=535329KB/s, maxb=568256KB/s, mint=7381msec, maxt=7835msec

0/512 -> WRITE: io=32768MB, aggrb=1297.8MB/s, minb=166025KB/s, maxb=167664KB/s, mint=25016msec, maxt=25263msec
1/512 -> WRITE: io=32768MB, aggrb=2148.5MB/s, minb=275000KB/s, maxb=278044KB/s, mint=15085msec, maxt=15252msec
2/512 -> WRITE: io=32768MB, aggrb=3158.4MB/s, minb=404270KB/s, maxb=411407KB/s, mint=10195msec, maxt=10375msec
3/512 -> WRITE: io=32768MB, aggrb=4102.7MB/s, minb=525141KB/s, maxb=539738KB/s, mint=7771msec, maxt=7987msec
4/512 -> WRITE: io=32768MB, aggrb=4162.8MB/s, minb=532745KB/s, maxb=541759KB/s, mint=7742msec, maxt=7873msec *
5/512 -> WRITE: io=32768MB, aggrb=4178.6MB/s, minb=534851KB/s, maxb=549856KB/s, mint=7628msec, maxt=7842msec
6/512 -> WRITE: io=32768MB, aggrb=4167.4MB/s, minb=533422KB/s, maxb=562314KB/s, mint=7459msec, maxt=7863msec
7/512 -> WRITE: io=32768MB, aggrb=4192.1MB/s, minb=536699KB/s, maxb=566338KB/s, mint=7406msec, maxt=7815msec
8/512 -> WRITE: io=32768MB, aggrb=4189.8MB/s, minb=536287KB/s, maxb=558644KB/s, mint=7508msec, maxt=7821msec
9/512 -> WRITE: io=32768MB, aggrb=4165.8MB/s, minb=533219KB/s, maxb=559837KB/s, mint=7492msec, maxt=7866msec
10/512 -> WRITE: io=32768MB, aggrb=4177.2MB/s, minb=534783KB/s, maxb=570188KB/s, mint=7356msec, maxt=7843msec

0/1024 -> WRITE: io=32768MB, aggrb=1288.6MB/s, minb=164935KB/s, maxb=166877KB/s, mint=25134msec, maxt=25430msec
1/1024 -> WRITE: io=32768MB, aggrb=2218.5MB/s, minb=283955KB/s, maxb=289842KB/s, mint=14471msec, maxt=14771msec
2/1024 -> WRITE: io=32768MB, aggrb=3186.1MB/s, minb=407926KB/s, maxb=420903KB/s, mint=9965msec, maxt=10282msec
3/1024 -> WRITE: io=32768MB, aggrb=4107.4MB/s, minb=525733KB/s, maxb=538836KB/s, mint=7784msec, maxt=7978msec
4/1024 -> WRITE: io=32768MB, aggrb=4146.9MB/s, minb=530790KB/s, maxb=550505KB/s, mint=7619msec, maxt=7902msec
5/1024 -> WRITE: io=32768MB, aggrb=4160.5MB/s, minb=532542KB/s, maxb=550795KB/s, mint=7615msec, maxt=7876msec *
6/1024 -> WRITE: io=32768MB, aggrb=4174.3MB/s, minb=534306KB/s, maxb=558942KB/s, mint=7504msec, maxt=7850msec
7/1024 -> WRITE: io=32768MB, aggrb=4189.8MB/s, minb=536287KB/s, maxb=556864KB/s, mint=7532msec, maxt=7821msec
8/1024 -> WRITE: io=32768MB, aggrb=4188.2MB/s, minb=536081KB/s, maxb=561035KB/s, mint=7476msec, maxt=7824msec
9/1024 -> WRITE: io=32768MB, aggrb=4167.4MB/s, minb=533422KB/s, maxb=567872KB/s, mint=7386msec, maxt=7863msec
10/1024 -> WRITE: io=32768MB, aggrb=4188.2MB/s, minb=536081KB/s, maxb=569878KB/s, mint=7360msec, maxt=7824msec

0/2048 -> WRITE: io=32768MB, aggrb=1265.7MB/s, minb=162004KB/s, maxb=166111KB/s, mint=25250msec, maxt=25890msec
1/2048 -> WRITE: io=32768MB, aggrb=2239.5MB/s, minb=286652KB/s, maxb=290846KB/s, mint=14421msec, maxt=14632msec
2/2048 -> WRITE: io=32768MB, aggrb=3184.5MB/s, minb=407609KB/s, maxb=413150KB/s, mint=10152msec, maxt=10290msec
3/2048 -> WRITE: io=32768MB, aggrb=4213.5MB/s, minb=539321KB/s, maxb=557901KB/s, mint=7518msec, maxt=7777msec *
4/2048 -> WRITE: io=32768MB, aggrb=4168.5MB/s, minb=533558KB/s, maxb=543162KB/s, mint=7722msec, maxt=7861msec
5/2048 -> WRITE: io=32768MB, aggrb=4185.5MB/s, minb=535739KB/s, maxb=549352KB/s, mint=7635msec, maxt=7829msec
6/2048 -> WRITE: io=32768MB, aggrb=4181.8MB/s, minb=535260KB/s, maxb=553338KB/s, mint=7580msec, maxt=7836msec
7/2048 -> WRITE: io=32768MB, aggrb=4215.7MB/s, minb=539599KB/s, maxb=566109KB/s, mint=7409msec, maxt=7773msec
8/2048 -> WRITE: io=32768MB, aggrb=4200.5MB/s, minb=537662KB/s, maxb=568102KB/s, mint=7383msec, maxt=7801msec
9/2048 -> WRITE: io=32768MB, aggrb=4184.1MB/s, minb=535671KB/s, maxb=574483KB/s, mint=7301msec, maxt=7830msec
10/2048 -> WRITE: io=32768MB, aggrb=4172.7MB/s, minb=534102KB/s, maxb=567641KB/s, mint=7389msec, maxt=7853msec

0/4096 -> WRITE: io=32768MB, aggrb=1264.8MB/s, minb=161879KB/s, maxb=168588KB/s, mint=24879msec, maxt=25910msec
1/4096 -> WRITE: io=32768MB, aggrb=2349.4MB/s, minb=300710KB/s, maxb=312541KB/s, mint=13420msec, maxt=13948msec
2/4096 -> WRITE: io=32768MB, aggrb=3387.6MB/s, minb=433609KB/s, maxb=441877KB/s, mint=9492msec, maxt=9673msec
3/4096 -> WRITE: io=32768MB, aggrb=4182.3MB/s, minb=535329KB/s, maxb=552390KB/s, mint=7593msec, maxt=7835msec *
4/4096 -> WRITE: io=32768MB, aggrb=4170.2MB/s, minb=533762KB/s, maxb=560061KB/s, mint=7489msec, maxt=7858msec
5/4096 -> WRITE: io=32768MB, aggrb=4179.6MB/s, minb=534919KB/s, maxb=548490KB/s, mint=7647msec, maxt=7841msec
6/4096 -> WRITE: io=32768MB, aggrb=4183.4MB/s, minb=535465KB/s, maxb=549208KB/s, mint=7637msec, maxt=7833msec
7/4096 -> WRITE: io=32768MB, aggrb=4174.9MB/s, minb=534374KB/s, maxb=557530KB/s, mint=7523msec, maxt=7849msec
8/4096 -> WRITE: io=32768MB, aggrb=4178.6MB/s, minb=534851KB/s, maxb=570188KB/s, mint=7356msec, maxt=7842msec
9/4096 -> WRITE: io=32768MB, aggrb=4180.2MB/s, minb=535056KB/s, maxb=570110KB/s, mint=7357msec, maxt=7839msec
10/4096 -> WRITE: io=32768MB, aggrb=4183.9MB/s, minb=535534KB/s, maxb=574640KB/s, mint=7299msec, maxt=7832msec

0/8192 -> WRITE: io=32768MB, aggrb=1260.9MB/s, minb=161381KB/s, maxb=171511KB/s, mint=24455msec, maxt=25990msec
1/8192 -> WRITE: io=32768MB, aggrb=2368.5MB/s, minb=303166KB/s, maxb=320444KB/s, mint=13089msec, maxt=13835msec
2/8192 -> WRITE: io=32768MB, aggrb=3408.8MB/s, minb=436225KB/s, maxb=458544KB/s, mint=9147msec, maxt=9615msec
3/8192 -> WRITE: io=32768MB, aggrb=4219.5MB/s, minb=540085KB/s, maxb=564585KB/s, mint=7429msec, maxt=7766msec *
4/8192 -> WRITE: io=32768MB, aggrb=4208.6MB/s, minb=538698KB/s, maxb=570653KB/s, mint=7350msec, maxt=7786msec
5/8192 -> WRITE: io=32768MB, aggrb=4200.5MB/s, minb=537662KB/s, maxb=562013KB/s, mint=7463msec, maxt=7801msec
6/8192 -> WRITE: io=32768MB, aggrb=4189.3MB/s, minb=536218KB/s, maxb=585387KB/s, mint=7165msec, maxt=7822msec
7/8192 -> WRITE: io=32768MB, aggrb=4184.5MB/s, minb=535602KB/s, maxb=579323KB/s, mint=7240msec, maxt=7831msec
8/8192 -> WRITE: io=32768MB, aggrb=4186.6MB/s, minb=535876KB/s, maxb=572132KB/s, mint=7331msec, maxt=7827msec
9/8192 -> WRITE: io=32768MB, aggrb=4176.5MB/s, minb=534578KB/s, maxb=598246KB/s, mint=7011msec, maxt=7846msec
10/8192 -> WRITE: io=32768MB, aggrb=4184.1MB/s, minb=535671KB/s, maxb=580285KB/s, mint=7228msec, maxt=7830msec

0/16384 -> WRITE: io=32768MB, aggrb=1281.0MB/s, minb=163968KB/s, maxb=183542KB/s, mint=22852msec, maxt=25580msec
1/16384 -> WRITE: io=32768MB, aggrb=2451.8MB/s, minb=313827KB/s, maxb=337787KB/s, mint=12417msec, maxt=13365msec
2/16384 -> WRITE: io=32768MB, aggrb=3409.5MB/s, minb=436406KB/s, maxb=468532KB/s, mint=8952msec, maxt=9611msec
3/16384 -> WRITE: io=32768MB, aggrb=4192.5MB/s, minb=536630KB/s, maxb=566721KB/s, mint=7401msec, maxt=7816msec *
4/16384 -> WRITE: io=32768MB, aggrb=4172.2MB/s, minb=534034KB/s, maxb=581089KB/s, mint=7218msec, maxt=7854msec
5/16384 -> WRITE: io=32768MB, aggrb=4175.4MB/s, minb=534442KB/s, maxb=587108KB/s, mint=7144msec, maxt=7848msec
6/16384 -> WRITE: io=32768MB, aggrb=4188.2MB/s, minb=536081KB/s, maxb=585224KB/s, mint=7167msec, maxt=7824msec
7/16384 -> WRITE: io=32768MB, aggrb=4173.8MB/s, minb=534238KB/s, maxb=591330KB/s, mint=7093msec, maxt=7851msec
8/16384 -> WRITE: io=32768MB, aggrb=4163.2MB/s, minb=532880KB/s, maxb=590165KB/s, mint=7107msec, maxt=7871msec
9/16384 -> WRITE: io=32768MB, aggrb=4166.9MB/s, minb=533355KB/s, maxb=608664KB/s, mint=6891msec, maxt=7864msec
10/16384 -> WRITE: io=32768MB, aggrb=4157.9MB/s, minb=532204KB/s, maxb=594768KB/s, mint=7052msec, maxt=7881msec

0/32768 -> WRITE: io=32768MB, aggrb=1288.1MB/s, minb=164980KB/s, maxb=189026KB/s, mint=22189msec, maxt=25423msec
1/32768 -> WRITE: io=32768MB, aggrb=2443.6MB/s, minb=312774KB/s, maxb=348624KB/s, mint=12031msec, maxt=13410msec
2/32768 -> WRITE: io=32768MB, aggrb=3467.1MB/s, minb=443888KB/s, maxb=484722KB/s, mint=8653msec, maxt=9449msec
3/32768 -> WRITE: io=32768MB, aggrb=4131.2MB/s, minb=528782KB/s, maxb=572444KB/s, mint=7327msec, maxt=7932msec *
4/32768 -> WRITE: io=32768MB, aggrb=4082.8MB/s, minb=522589KB/s, maxb=606990KB/s, mint=6910msec, maxt=8026msec
5/32768 -> WRITE: io=32768MB, aggrb=3985.5MB/s, minb=510131KB/s, maxb=578046KB/s, mint=7256msec, maxt=8222msec
6/32768 -> WRITE: io=32768MB, aggrb=3937.2MB/s, minb=504062KB/s, maxb=591914KB/s, mint=7086msec, maxt=8321msec
7/32768 -> WRITE: io=32768MB, aggrb=4012.3MB/s, minb=513567KB/s, maxb=583028KB/s, mint=7194msec, maxt=8167msec
8/32768 -> WRITE: io=32768MB, aggrb=3944.2MB/s, minb=504851KB/s, maxb=567257KB/s, mint=7394msec, maxt=8308msec
9/32768 -> WRITE: io=32768MB, aggrb=3930.1MB/s, minb=503155KB/s, maxb=580687KB/s, mint=7223msec, maxt=8336msec
10/32768 -> WRITE: io=32768MB, aggrb=3965.2MB/s, minb=507539KB/s, maxb=599443KB/s, mint=6997msec, maxt=8264msec


Analysis:
-----------
- varying stripe_cache_size causes little variation, as expected
- write threads (group_thread_cnt > 0) give a significant performance boost
- the best results were seen with 3 or 4 write threads, which correlates well
  with the number of stripes (example of applying this below)
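
On a comparable 4x NVMe raid5 that suggests starting with something like the
following (a sketch only; these are runtime settings and have to be reapplied
after the array is re-assembled):

echo 4    > /sys/block/md0/md/group_thread_cnt
echo 2048 > /sys/block/md0/md/stripe_cache_size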


Did you provide your fio job(s) for comparison yet?

Regards,
Heinz

P.S.: write performance tested with the following script:

#!/bin/bash
# Sweep stripe_cache_size and group_thread_cnt, run the w-md0 fio section for
# each combination and print only fio's aggregate bandwidth line.

MD=md0

for s in 256 512 1024 2048 4096 8192 16384 32768
do
        echo $s > /sys/block/$MD/md/stripe_cache_size

        for t in {0..10}
        do
                echo $t > /sys/block/$MD/md/group_thread_cnt
                echo -n "$t/$s -> "
                fio --section=w-md0 fio_md0.job 2>&1 | grep "aggrb=" | sed 's/^ *//'
        done
done



On 01/17/2017 04:28 PM, Jake Yao wrote:
Thanks for the response.

I am using fio for performance measurement.

The chunk size of the raid5 array is 32K, and the block size in fio is set
to 96K (3x chunk size), which is also the optimal_io_size; ioengine is set
to libaio with direct IO.

Increasing stripe_cache_size does not help much, and it looks like the
write is limited by the single kernel thread as mentioned earlier.


On Tue, Jan 17, 2017 at 12:10 AM, Roman Mamedov <rm@xxxxxxxxxxx> wrote:
On Mon, 16 Jan 2017 21:35:21 -0500
Jake Yao <jgyao1@xxxxxxxxx> wrote:

I have a raid5 array on 4 NVMe drives, and the performance of the array is
only marginally better than a single drive. This is unlike a similar raid5
array on 4 SAS SSDs or HDDs, where the performance of the array is 3x
better than a single drive, as expected.

It looks like the array performance hits its peak when the single kernel
thread associated with the raid device is running at 100%. This can happen
easily with fast devices like NVMe.

This can be reproduced by creating a raid5 with 4 ramdisks as well and
comparing the performance of the array against a single ramdisk. Sometimes
the performance of the array is worse than a single ramdisk.

The kernel version is 4.9.0-rc3 and mdadm is release 3.4, no write
journal is configured.

Is this a known issue?

How do you measure the performance?

Sure it may be CPU-bound in the end, but also why not try the usual
optimization tricks, such as:

   * increase your stripe_cache_size, it's not uncommon that this can speed up
     linear writes by as much as several times;

   * if you meant reads, you could look into read-ahead settings for the array;

   * and in both cases, try experimenting with different stripe sizes (if you
     were using 512K, try with 64K stripes).

--
With respect,
Roman