On Tue, Jun 5, 2012 at 1:25 PM, Peter Grandi <pg@xxxxxxxxxxxxxxxxxxxx> wrote:

> It does not change much of the conclusions as to the (euphemism)
> audacity of your conclusions, but you have created a 21+2 RAID6
> set, as the 24th block device is a spare:
>
>   seq 24 | parallel -X --tty mdadm --create --force /dev/md0 -c $CHUNK --level=6 --raid-devices=23 -x 1 /dev/loop{}

That is correct. It reflects the physical setup of the 24 physical
drives.

>>>> I get 400 MB/s write and 600 MB/s read. It seems to be due
>>>> to checksumming, as I have a single process (md0_raid6)
>>>> taking up 100% of one CPU.
>
> [ ... ]
>
>> The 900 MB/s was based on my old controller. I re-measured
>> using my new controller and get closer to 2000 MB/s in raw
>> (non-RAID) performance, which is close to the theoretical
>> maximum for that controller (2400 MB/s). This indicates that
>> the hardware is not the bottleneck.
>
> A 21+2 drive RAID6 set is (euphemism) brave, and perhaps it
> matches the (euphemism) strategic insight that only checksumming
> within MD could account for 100% CPU time in a single threaded
> way.

It is not a guess that md0_raid6 takes up 100% of one core: that is
reported by 'top'. But maybe you are right: the 100% that md0_raid6
uses could be due to something other than checksumming. The tests do,
however, clearly show that the chunk size has a huge impact on the
amount of CPU time md0_raid6 uses.

> But as a start you could try running your (euphemism) "test"
> with O_DIRECT:
>
>   http://www.sabi.co.uk/blog/0709sep.html#070919
>
> While making sure that the IO is stripe aligned (21 times the
> chunk size).

It is unclear to me how to change the timed part of the test script
to use O_DIRECT and make it stripe aligned:

  seq 10 | time parallel mkdir -p /mnt/md0/{}\;tar -x -C /mnt/md0/{} -f linux.tar\; sync
  seq 10 | time parallel mkdir -p /mnt/md0/{}\;cp linux.tar /mnt/md0/{} \; sync

Please advise. (My best guess is the dd sketch further down.)

> Your (euphemism) tests could also probably benefit from more
> care about (euphemism) details like commit semantics, as the use
> of 'sync' in your scripts seems to me based on (euphemism)
> unconventional insight, for example this:
>
>   «seq 10 | time parallel mkdir -p /mnt/md0/{}\;tar -x -C /mnt/md0/{} -f linux.tar\; sync»

Feel free to substitute with:

  seq 10 | time parallel mkdir -p /mnt/md0/{}\;tar -x -C /mnt/md0/{} -f linux.tar
  time sync

Here you will have to add the two durations. With that modification I
get:

  Chunk size (KiB)   10 kernel sources as files   10 kernel sources as one tar file
  16                 29s                          13s
  32                 28s                          11s
  64                 29s                          13s
  128                34s                          10s
  256                41s                          11s
  4096               1m35s                        2m15s (!)

Most numbers are comparable to the original results:

  http://oletange.blogspot.dk/2012/05/software-raid-performance-on-24-disks.html

The 2m15s result for the 4096 big-file test was a bit surprising, so
I re-ran that test and got 2m36s.

> But also more divertingly:
>
>   «seq 24 | parallel dd if=/dev/zero of=tmpfs/disk{} bs=500k count=1k
>   seq 24 | parallel losetup /dev/loop{} tmpfs/disk{}
>   sync
>   sleep 1;
>   sync»

Are you aware that this part is the setup of the test? It is not the
timed section, so it does not affect the validity of the test.

> and even:
>
>   «mount /dev/md0 /mnt/md0
>   sync»

Yeah, that part was a bit weird, but I had one run where the script
failed without the 'sync'. And again: are you aware that this part is
the setup of the test? It is not the timed section, so it does not
change the validity of the test.
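
Coming back to your O_DIRECT suggestion, the closest I can come up
with for the big-file test is to replace 'cp' with 'dd'. This is just
a sketch, untested on the RAID setup; it assumes $CHUNK is the chunk
size in KiB, and it relies on dd's oflag=direct to open the output
with O_DIRECT and conv=fsync to commit the data before dd exits:

  # One full stripe = 21 data disks * chunk size (in KiB).
  STRIPE_KB=$((21 * CHUNK))
  # Write in stripe-sized blocks with O_DIRECT; only the final,
  # partial block of linux.tar would be a non-aligned write.
  seq 10 | time parallel mkdir -p /mnt/md0/{}\; dd if=linux.tar of=/mnt/md0/{}/linux.tar bs=${STRIPE_KB}k oflag=direct conv=fsync

Is that what you had in mind? For the small-file test I do not see
how to do it at all, as tar does not open its output files with
O_DIRECT.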
> Perhaps you might also want to investigate the behaviour of
> 'tmpfs' and 'loop' devices, as it seems quite (euphemism)
> creative to me to have RAID set member block devices as 'loop's
> over 'tmpfs' files:
>
>   «mount -t tmpfs tmpfs tmpfs
>   seq 24 | parallel dd if=/dev/zero of=tmpfs/disk{} bs=500k count=1k
>   seq 24 | parallel losetup /dev/loop{} tmpfs/disk{}»

How would YOU design the test so that:

* it is reproducible for others?
* it does not depend on controllers and disks?
* it uses 24 devices?
* it uses different chunk sizes?
* it tests both big and small file performance?

> Put another way, most aspects of your (euphemism) tests seem to
> me rather (euphemism) imaginative.

Did you run the test script? What were your numbers? Did md0_raid6
take up 100% CPU of one core during the copy? And if so: can you
explain why md0_raid6 would take up 100% CPU of one core?

/Ole
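
PS: One way to see what md0_raid6 actually spends its cycles on would
be to profile it while a copy runs. A sketch (assuming 'perf' is
installed; run as root):

  # Sample only the md0_raid6 kernel thread; if the time really goes
  # to parity/checksum work, the raid6 gen_syndrome routines should
  # dominate the profile.
  perf top -p $(pgrep -x md0_raid6)

I would be curious to see what that shows on your hardware.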