Re: Software RAID checksum performance on 24 disks not even close to kernel reported

[ ... ]

>> A 21+2 drive RAID6 set is (euphemism) brave, and perhaps it
>> matches the (euphemism) strategic insight that only
>> checksumming within MD could account for 100% CPU time in a
>> single threaded way.

> It is not a guess that md0_raid6 takes up 100% of 1 core. It
> is reported by 'top'.

> But maybe you are right: The 100% that md0_raid6 uses could be
> due to something other than checksumming. But the test clearly
> shows that chunk size has a huge impact on the amount of CPU
> time md0_raid6 has to use.

The (euphemism) test(s) much more "clearly show" something else
entirely :-).

For a (euphemism) different approach, here is, in three lines, a
"test" that in its minuscule simplicity (lots of improvements
could be made) illustrates several ways in which it is
(euphemism) different from the one reported above:

------------------------------------------------------------------------
  base#  mdadm --create /dev/md0 -c 64 --level=6 --raid-devices=16 /dev/ram{0..15}
  mdadm: array /dev/md0 started.
------------------------------------------------------------------------
  base#  time dd bs=$((14 * 64 * 1024)) of=/dev/zero iflag=direct if=/dev/md0
  255+0 records in
  255+0 records out
  233963520 bytes (234 MB) copied, 0.0453674 seconds, 5.2 GB/s

  real    0m0.047s
  user    0m0.000s
  sys     0m0.047s
------------------------------------------------------------------------
  base#  sysctl vm/drop_caches=1; time dd bs=$((14 * 64 * 1024)) of=/dev/zero if=/dev/md0
  vm.drop_caches = 1
  255+0 records in
  255+0 records out
  233963520 bytes (234 MB) copied, 0.285007 seconds, 821 MB/s

  real    0m0.360s
  user    0m0.000s
  sys     0m0.286s
------------------------------------------------------------------------
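
As a sanity check on the numbers above: a 16-device RAID6 set has
14 data devices per stripe, so the 'bs=' used is exactly one full
stripe of 14 chunks of 64KiB, and 255 such records account for
the 233963520 bytes reported:

------------------------------------------------------------------------
  base#  echo $((14 * 64 * 1024)) $((255 * 14 * 64 * 1024))
  917504 233963520
------------------------------------------------------------------------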

Note that this is about *reading* and thus there is no
"checksum" calculation involved. It was amusing also to rerun
the above on 'ram0' instead of 'md0' for comparison.
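
Roughly, that comparison amounts to rerunning the same commands
with 'if=/dev/ram0' instead of 'if=/dev/md0' (a sketch only; a
single ramdisk is a small fraction of the size of the whole set,
so the byte counts will be much smaller):

------------------------------------------------------------------------
  base#  sysctl vm/drop_caches=1; time dd bs=$((14 * 64 * 1024)) of=/dev/zero if=/dev/ram0
------------------------------------------------------------------------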

It was also quite depressing to me to try the same for *writing*
and try different 'bs=' values.
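
A minimal sketch of such a write pass, assuming the same scratch
array of ramdisks (whose contents are entirely disposable), might
be something like:

------------------------------------------------------------------------
  base#  for bs in $((64 * 1024)) $((14 * 64 * 1024)) $((4 * 14 * 64 * 1024)); do
             echo "bs=$bs"
             time dd bs=$bs if=/dev/zero of=/dev/md0 oflag=direct
         done
------------------------------------------------------------------------

Each pass runs until the array is full; the point is to watch the
reported rate and the CPU time of 'md0_raid6' (e.g. in 'top') as
'bs=' shrinks below or grows beyond one full stripe.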

Other (euphemism) different tests: I have compared writing to a
RAID0 set of equivalent stripe width (14) and to a RAID5 set of
equivalent stripe width (14+1).
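
Those sets can be put together along the same lines as above,
reusing the same ramdisks after stopping the previous array (a
sketch only, with the write test of choice in between):

------------------------------------------------------------------------
  base#  mdadm --stop /dev/md0
  base#  mdadm --create /dev/md0 -c 64 --level=0 --raid-devices=14 /dev/ram{0..13}
  [ ... write test ... ]
  base#  mdadm --stop /dev/md0
  base#  mdadm --create /dev/md0 -c 64 --level=5 --raid-devices=15 /dev/ram{0..14}
  [ ... write test ... ]
------------------------------------------------------------------------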

PS: Running any "test" on a RAID set of in-memory block devices
    seems to me to be (euphemism) entertaining rather than
    useful, as RAM accesses are not that parallelizable, and this
    breaks a pretty fundamental assumption.

[ ... ]

