Re: raid5 that used parity for reads only when degraded

Neil - Thank you very much for the response.  

In my tests with identically configured raid0 and raid5 arrays, raid5
initially had much lower throughput during reads.  I had assumed that
was because raid5 did parity-checking all the time.  It turns out that
raid5 throughput can get fairly close to raid0 throughput
if /sys/block/md0/md/stripe_cache_size is set to a very high value,
8192-16384.  However, the CPU load is still very much higher during
raid5 reads, and I'm not sure why.
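'
For anyone reproducing this, the tuning is nothing more than a write to
that sysfs attribute; from a shell it is just
"echo 8192 > /sys/block/md0/md/stripe_cache_size".  In case it is
handier inside a test harness, here is a minimal C sketch of the same
thing (the md0 path and the 8192 value are simply the ones from my
runs, not a recommendation):

/* set_stripe_cache.c - bump the raid5 stripe cache before a test run.
 * Shell equivalent: echo 8192 > /sys/block/md0/md/stripe_cache_size
 * Needs to run as root, just like the echo.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/block/md0/md/stripe_cache_size", "w");

        if (!f) {
                perror("stripe_cache_size");
                return 1;
        }
        fprintf(f, "8192\n");
        return fclose(f) ? 1 : 0;
}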

My test setup consists of 8x WD4000RE 400GB SATA disks, a 2.4GHz
Athlon 64 X2 CPU and 2GB RAM, kernel 2.6.15 and mdadm 2.3.  I am using
my own simple test application, which uses POSIX aio to do randomly
positioned block reads.  When doing 8MB block reads with 14 outstanding
I/O requests from a 7-disk raid0 with a 1MB chunk size, I get 200MB/s
throughput and ~5% CPU load.  Running the same test on an 8-disk raid5
with the same chunk size (which I'd expect to perform identically,
given the behaviour you describe for a non-degraded raid5) with the
default stripe_cache_size of 256, I get a mere 60MB/s and a CPU load
of ~12%.  Increasing the stripe_cache_size to 8192 brings the
throughput up to approximately 200MB/s, the same as for the raid0, but
the CPU load jumps to 45%.  Some other combinations of parameters,
e.g. a 32MB chunk size and 4MB reads with a stripe_cache_size of
16384, result in even more pathological CPU loads, over 80% (that is,
80% of both CPUs!), with throughput still at approximately 200MB/s.
As a point of comparison, the same application reading directly from
the raw disk devices with the same settings achieves a total
throughput of 300MB/s and a CPU load of 3%, so I am pretty sure the
SATA controllers, drivers, etc. are not a factor.  The CPU load is
measured with Andrew Morton's cyclesoak tool, which I believe to be
quite accurate.
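
In case it makes the workload clearer, below is a stripped-down sketch
of the kind of test the application performs.  It is an illustration
rather than the actual tool: the /dev/md0 path, the O_DIRECT open, the
fixed number of reads and the timing are assumptions of this sketch,
though the 8MB block size and the queue depth of 14 match the runs
described above.  I'm happy to post the real code if that would help.

/* aiotest.c - randomly positioned block reads with POSIX aio; a sketch,
 * not the actual test tool.
 * Build: gcc -O2 -Wall -std=gnu99 -o aiotest aiotest.c -lrt
 * Usage: ./aiotest /dev/md0
 */
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK  (8 * 1024 * 1024)   /* 8MB per read */
#define DEPTH  14                  /* outstanding requests */
#define NREADS 256                 /* stop after this many completions */

static off_t random_offset(off_t devsize)
{
        /* random BLOCK-aligned offset inside the device;
         * rand() is deliberately not seeded, so runs are repeatable */
        return ((off_t)(rand() % (devsize / BLOCK))) * BLOCK;
}

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/md0";
        struct aiocb cb[DEPTH];
        const struct aiocb *list[DEPTH];
        struct timespec t0, t1;
        int completed = 0;

        int fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        off_t devsize = lseek(fd, 0, SEEK_END);
        if (devsize < (off_t)BLOCK) {
                fprintf(stderr, "%s: device smaller than one block\n", dev);
                return 1;
        }

        memset(cb, 0, sizeof(cb));
        clock_gettime(CLOCK_MONOTONIC, &t0);

        /* prime the queue with DEPTH reads at random offsets */
        for (int i = 0; i < DEPTH; i++) {
                void *buf;
                if (posix_memalign(&buf, 4096, BLOCK))
                        return 1;
                cb[i].aio_fildes = fd;
                cb[i].aio_buf    = buf;
                cb[i].aio_nbytes = BLOCK;
                cb[i].aio_offset = random_offset(devsize);
                if (aio_read(&cb[i]) < 0) {
                        perror("aio_read");
                        return 1;
                }
                list[i] = &cb[i];
        }

        /* each time a read finishes, count it and immediately issue another */
        while (completed < NREADS) {
                if (aio_suspend(list, DEPTH, NULL) < 0 && errno == EINTR)
                        continue;
                for (int i = 0; i < DEPTH; i++) {
                        if (aio_error(&cb[i]) == EINPROGRESS)
                                continue;
                        ssize_t ret = aio_return(&cb[i]);
                        if (ret != BLOCK) {
                                fprintf(stderr, "read failed or short (%zd)\n", ret);
                                return 1;
                        }
                        completed++;
                        cb[i].aio_offset = random_offset(devsize);
                        if (aio_read(&cb[i]) < 0) {
                                perror("aio_read");
                                return 1;
                        }
                }
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        /* DEPTH reads are still in flight here; exiting is fine for a test */
        printf("%d x %dMB reads in %.1fs: %.1f MB/s\n",
               completed, BLOCK >> 20, secs,
               completed * (double)(BLOCK >> 20) / secs);
        return 0;
}

All it tries to do is keep DEPTH large reads in flight against the
array at all times and report the aggregate read rate.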

Any thoughts on what could be causing the high CPU load?  I am very
interested in helping to debug this, since I really need a
high-throughput raid5 with reasonably low CPU overhead.  Please let me
know if you have any ideas or anything you'd like me to try (valgrind,
perhaps?).  I'd be happy to give you more details on the test setup as
well.

Sincerely,

--Alex

On Thu, 2006-03-23 at 11:13 +1100, Neil Brown wrote:
> On Wednesday March 22, aizvorski@xxxxxxxxx wrote:
> > Hello,
> > 
> > I have a question: I'd like to have a raid5 array which writes parity data but
> > does not check it during reads while the array is ok.  I would trust each disk
> > to detect errors itself and cause the array to be degraded if necessary, in
> > which case that disk would drop out and the parity data would start being used
> > just as in a normal raid5.  In other words until there is an I/O error that
> > causes a disk to drop out, such an array would behave almost like a raid0 with
> > N-1 disks as far as reads are concerned.  Ideally this behavior would be
> > something that one could turn on/off on the fly with a ioctl or via a echo "0" >
> > /sys/block/md0/check_parity_on_reads type of mechanism.  
> > 
> > How hard is this to do?   Is anyone interested in helping to do this?  I think
> > it would really help applications which have a lot more reads than writes. 
> > Where exactly does parity checking during reads happen?  I've looked over the
> > code briefly but the right part of it didn't appear obvious ;)
> 
> Parity checking does not happen during read.  You already have what
> you want.
> 
> NeilBrown





