Re: RAID-5 streaming read performance

Ming Zhang <mingz@xxxxxxxxxxx> · Thu, 14 Jul 2005 08:30:55 -0400



On Wed, 2005-07-13 at 23:58 -0400, Dan Christensen wrote:
> David Greaves <david@xxxxxxxxxxxx> writes:
> 
> > In my setup I get
> >
> > component partitions, e.g. /dev/sda7: 39MB/s
> > raid device /dev/md2:                 31MB/s
> > lvm device /dev/main/media:           53MB/s
> >
> > (oldish system - but note that lvm device is *much* faster)
> 
> Did you test component device and raid device speed using the
> read-ahead settings tuned for lvm reads?  If so, that's not a fair
> comparison.  :-)
> 
> > For your entertainment you may like to try this to 'tune' your readahead
> > - it's OK to use so long as you're not recording:
> 
> Thanks, I played around with that a lot.  I tuned readahead to
> optimize lvm device reads, and this improved things greatly.  It turns
> out the default lvm settings had readahead set to 0!  But by tuning
> things, I could get my read speed up to 59MB/s.  This is with raw
> device readahead 256, md device readahead 1024 and lvm readahead 2048.
> (The speed was most sensitive to the last one, but did seem to depend
> on the other ones a bit too.)
> 
> I separately tuned the raid device read speed.  To maximize this, I
> needed to set the raw device readahead to 1024 and the raid device
> readahead to 4096.  This brought my raid read speed from 59MB/s to
> 78MB/s.  Better!  (But note that now this makes the lvm read speed
> look bad.)
> 
> My raw device read speed is independent of the readahead setting,
> as long as it is at least 256.  The speed is about 58MB/s.
> 
> Summary:
> 
> raw device:  58MB/s
> raid device: 78MB/s
> lvm device:  59MB/s
> 
> raid still isn't achieving the 106MB/s that I can get with parallel
> direct reads, but at least it's getting closer.
> 
> As a simple test, I wrote a program like dd that reads and discards
> 64k chunks of data from a device, but which skips 1 out of every four
> chunks (simulating skipping parity blocks).  It's not surprising that
> this program can only read from a raw device at about 75% the rate of
> dd, since the kernel readahead is probably causing the skipped blocks
> to be read anyways (or maybe because the disk head has to pass over
> those sections of the disk anyways).
> 
> I then ran four copies of this program in parallel, reading from the
> raw devices that make up my raid partition.  And, like md, they only
> achieved about 78MB/s.  This is very close to 75% of 106MB/s.  Again,
> not surprising, since I need to have raw device readahead turned on
> for this to be efficient at all, so 25% of the chunks that pass
> through the controller are ignored.
> 
> But I still don't understand why the md layer can't do better.  If I
> turn off readahead of the raw devices, and keep it for the raid
> device, then parity blocks should never be requested, so they
> shouldn't use any bus/controller bandwidth.  And even if each drive is
> only acting at 75% efficiency, the four drives should still be able to
> saturate the bus/controller.  So I can't figure out what's going on
> here.
When reading, I do not think MD reads parity at all. But since parity is
spread across all the disks, each disk may have to seek past its parity
chunks. So you might want to try raid4 as well to see what happens.
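
To illustrate the difference, here is a minimal Python sketch of where
parity lands on each disk. It assumes four disks (as in Dan's test) and
md's default left-symmetric RAID-5 layout; the names parity_disk and
chunks_on_disk are made up for illustration and are not part of md.

```python
def parity_disk(stripe, n_disks, level=5):
    """Index of the disk that holds the parity chunk of a given stripe."""
    if level == 4:
        # RAID-4: parity always lives on the last disk.
        return n_disks - 1
    # RAID-5, left-symmetric (assumed layout): parity starts on the last
    # disk and moves one disk to the left with every stripe.
    return (n_disks - 1) - (stripe % n_disks)


def chunks_on_disk(disk, n_disks, n_stripes, level=5):
    """One character per stripe for a single disk: D = data, P = parity."""
    return "".join(
        "P" if parity_disk(s, n_disks, level) == disk else "D"
        for s in range(n_stripes)
    )


if __name__ == "__main__":
    for level in (5, 4):
        print(f"RAID-{level}")
        for d in range(4):
            print(f"  disk{d}: {chunks_on_disk(d, 4, 8, level)}")
```

With RAID-5 every disk shows one P in each run of four stripes, which is
the chunk a streaming read has to skip. With RAID-4 the three data disks
hold only D chunks, so they can be read without gaps.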


> 
> Is there a way for me to simulate readahead in userspace, i.e. can
> I do lots of sequential asynchronous reads in parallel?
> 
> Also, is there a way to disable caching of reads?  Having to clear
> the cache by reading 900M each time slows down testing.  I guess
> I could reboot with mem=100M, but it'd be nice to disable/enable
> caching on the fly.  Hmm, maybe I can just run something like
> memtest which locks a bunch of ram...
After you run your code, check meminfo; the cached value might be much
lower than you expected. My feeling is that the Linux page cache discards
everything it cached once the last file handle is closed.
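
A small sketch of that check, in Python. cached_kb pulls the Cached
figure out of the text of /proc/meminfo, and drop_file_cache is one
possible way to ask the kernel to drop cached pages for a single file or
device through posix_fadvise(POSIX_FADV_DONTNEED). Both function names
are hypothetical, the advice is only a hint that the kernel may not fully
honour, and I have not verified how it behaves on md or lvm devices.

```python
import os


def cached_kb(meminfo_text):
    """Return the 'Cached:' value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):
            return int(line.split()[1])
    raise ValueError("no 'Cached:' line found")


def drop_file_cache(path):
    """Hint to the kernel that cached pages for `path` are no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0 and length 0 cover the whole file; this is advisory only.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Reading /proc/meminfo before and after a test run and passing the text to
cached_kb shows how much of the data actually stayed in the cache.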


> 
> Thanks for all of the help so far!
> 
> Dan
