Re: Software RAID checksum performance on 24 disks not even close to kernel reported

[ ... ]

>>>> I tested this by creating 24 devices in RAM, used different
>>>> chunk sizes, and then copied the linux kernel source. Test
>>>> script can be found on [ ... ] By doing it in RAM the
>>>> results are not affected by physical disks or disk
>>>> controller. So the only change is the speed of computing
>>>> checksums. This can also be seen as the time the process
>>>> md0_raid0 is running.

>>> When I set the disks up as a 24 disk software RAID6

It does not change much as to the (euphemism) audacity of your
conclusions, but you have in fact created a 21+2 RAID6 set, as
the 24th block device is a spare:

  seq 24 | parallel -X --tty mdadm --create --force /dev/md0 -c $CHUNK --level=6 --raid-devices=23 -x 1 /dev/loop{}
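
A quick way to confirm that (assuming the set is assembled as
/dev/md0):

  mdadm --detail /dev/md0 | grep -E 'Raid Devices|Spare Devices'

which should report 23 raid devices and 1 spare, that is 21 data
plus 2 parity plus 1 idle.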

>>> I get 400 MB/s write and 600 MB/s read. It seems to be due
>>> to checksumming, as I have a single process (md0_raid6)
>>> taking up 100% of one CPU.

[ ... ]

> The 900 MB/s was based on my old controller. I re-measured
> using my new controller and get closer to 2000 MB/s in raw
> (non-RAID) performance, which is close to the theoretical
> maximum for that controller (2400 MB/s). This indicates that
> the hardware is not a bottleneck.

A 21+2 drive RAID6 set is (euphemism) brave, and perhaps it
matches the (euphemism) strategic insight that only checksumming
within MD could account for 100% CPU time in a single-threaded
way.
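
As a sanity check on that insight, the kernel benchmarks its
RAID6 parity routines when the 'raid6_pq' module loads and logs
the result; a rough sketch (the output wording varies by kernel
version):

  modprobe raid6_pq
  dmesg | grep -i raid6

The per-core gen() figures it reports are normally far above
400 MB/s, which rather suggests that the md0_raid6 thread is
busy with more than just parity arithmetic.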

But as a start you could try running your (euphemism) "test"
with O_DIRECT:

  http://www.sabi.co.uk/blog/0709sep.html#070919

while making sure that the IO is stripe-aligned (21 times the
chunk size, given the 21 data drives).
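
For example, a minimal sketch (assuming $CHUNK is the chunk size
in KiB, as passed to 'mdadm -c', and that the set is mounted on
/mnt/md0):

  STRIPE=$((21 * CHUNK * 1024))   # one full stripe of data: 21 data chunks
  dd if=/dev/zero of=/mnt/md0/ddtest bs=$STRIPE count=1000 oflag=direct

Writing whole stripes with O_DIRECT bypasses the page cache and
avoids the read-modify-write cycles that partial-stripe writes
force on RAID6.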

Your (euphemism) tests could also probably benefit from more
care over (euphemism) details like commit semantics, as the
placement of 'sync' in your scripts seems to me to be based on
(euphemism) unconventional insight, for example this:

 «seq 10 | time parallel mkdir -p /mnt/md0/{}\;tar -x -C /mnt/md0/{} -f linux.tar\; sync»
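
That runs one 'sync' per parallel job, each flushing whatever
every other job happens to have dirtied at that instant. A more
conventional arrangement (a sketch) times the whole run with a
single commit at the end:

  time sh -c 'seq 10 | parallel "mkdir -p /mnt/md0/{} && tar -x -C /mnt/md0/{} -f linux.tar"; sync'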

But also more divertingly:

 «seq 24 | parallel dd if=/dev/zero of=tmpfs/disk{} bs=500k count=1k
  seq 24 | parallel losetup /dev/loop{} tmpfs/disk{}
  sync
  sleep 1;
  sync»

and even:

 «mount /dev/md0 /mnt/md0
  sync»

Perhaps you might also want to investigate the behaviour of
'tmpfs' and 'loop' devices, as it seems quite (euphemism)
creative to me to use 'loop' devices over 'tmpfs' files as the
RAID set's member block devices:

 «mount -t tmpfs tmpfs tmpfs
  seq 24 | parallel dd if=/dev/zero of=tmpfs/disk{} bs=500k count=1k
  seq 24 | parallel losetup /dev/loop{} tmpfs/disk{}»
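
If the intent is simply RAM-backed member devices, the 'brd'
ramdisk driver gives real block devices without the
loop-over-tmpfs indirection; a sketch matching the 500 MiB
members above ('rd_size' is in KiB, and brd names its devices
/dev/ram0 onward):

  modprobe brd rd_nr=24 rd_size=512000
  seq 0 23 | parallel -X --tty mdadm --create --force /dev/md0 -c $CHUNK --level=6 --raid-devices=24 /dev/ram{}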

Put another way, most aspects of your (euphemism) tests seem to
me rather (euphemism) imaginative.