Richard Scobie wrote:
Mark Knecht wrote:
Once all of that is in place then possibly more cores will help, but I
suspect even then it's probably hard to use 4 billion CPU cycles/second
doing nothing but disk I/O. SATA controllers are all doing DMA so CPU
overhead is relatively *very* low.
There are the RAID5/6 parity calculations to be considered on writes,
and these appear to be single threaded. There is an experimental
multicore kernel option I believe, but recent discussion indicates
there may be some problems with it.
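If memory serves, the experimental option referred to is
CONFIG_MULTICORE_RAID456; something along these lines shows whether a
given kernel was built with it (the config file location is
distro-dependent, so adjust the path):

grep MULTICORE_RAID456 /boot/config-$(uname -r)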
A very quick test on a box here with a Xeon E5440 (4 x 2.8GHz) and a SAS
attached 16 x 750GB SATA md RAID6. The array is 72% full and probably
quite fragmented, and the system is currently otherwise idle.
dd if=/dev/zero of=/mnt/storage/dump bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 87.2374 s, 240 MB/s
Looking at the outputs of vmstat 5 and mpstat -P ALL 5 during this,
one core (probably doing parity generation) was around 7.56% idle and
the other 3 were around 88.5, 67.5 and 51.8% idle.
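For the record, the monitoring was nothing fancier than running the
samplers alongside the dd; something along these lines reproduces the
observation (the log file names are just examples):

mpstat -P ALL 5 > mpstat.log &
vmstat 5 > vmstat.log &
dd if=/dev/zero of=/mnt/storage/dump bs=1M count=20000
kill %1 %2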
The same test run when the system was commissioned and the array was
empty, achieved 565MB/s writes.
I was able to achieve about 430MB/sec on a 24-disk RAID-6 with dd on an
XFS filesystem which was 70% full. I don't think it would have made a
great difference even if the array had been empty. It was a 54xx Xeon CPU.
I spent some time trying to optimize it but that was the best I could
get. Anyway, both my benchmark and Richard's imply a very significant
bottleneck somewhere.
16 SATA disks have an aggregate streaming I/O performance of about
1.4GB/sec, so getting ~500MB/sec is roughly 3 times slower.
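(Rough arithmetic behind that claim; the per-disk streaming rate of
~90MB/sec for 750GB-class SATA drives is my assumption:)

echo "16*90" | bc        # ~1440 MB/s aggregate raw streaming
echo "1440/500" | bc -l  # ~2.88, i.e. roughly 3x the observed speed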
RAID-0 does not have this problem: there is an old post by Mark Delfman
on this ML in which he was able to obtain about 1.7GB/sec with 10 SAS
disks (15Krpm) in RAID-0, which is much higher than 500MB/s and is
about the bare disk speed.
I always thought the reason for the slower RAID 5/6 was the parity
computation, but now that Nicolae has pointed out that the parity
computation speed is so high, the reason must lie elsewhere.
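(The parity speed figures should match what the md layer itself
benchmarks at boot; assuming the messages are still in the kernel ring
buffer, something like this pulls them out:)

dmesg | grep -iE 'raid6:|xor:'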
Could that be RAM I/O? RAID 5/6 copies the data, then probably reads it
again for the parity computation and then writes the parity out... the
CPU cache is too small to hold a stripe for large arrays, so it's at
least 3 RAM accesses, yet it should still be way faster than this imho.
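One crude way to put a number on that suspicion would be to see what a
single core can stream through memory at all (this just zero-fills a
user buffer and throws it away, so treat it as a rough single-core
memory-write figure, not a proper bandwidth benchmark):

dd if=/dev/zero of=/dev/null bs=1M count=20000

If that only comes out at a few GB/sec, making three or more passes
over every stripe would start to eat visibly into the available
memory bandwidth.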
MRK