Re: Propose of enhancement of raid1 driver

On Thu, 2006-10-19 at 13:28 +1000, Neil Brown wrote:
> On Tuesday October 17, mirek@xxxxxxxxxxxxxxxx wrote:
> > I would like to propose an enhancement to the raid1 driver in the Linux kernel.
> > The enhancement would speed up data reads on mirrored partitions.
> > The idea is simple.
> > If we have a mirrored partition over 2 disks, and these disks are in sync,
> > the data can be read from both disks simultaneously in the same way as in
> > raid 0: chunk1 is read from the master while chunk2 is read from the slave
> > at the same time.
> > As a result, reads would be significantly faster (comparable with the speed
> > of a raid 0 array).
> 
> This is not as easy as it sounds.
> Skipping over blocks within a track is no faster than reading blocks
> in the track, so you would need to make sure that your chunk size is
> larger than one track - probably it would need to be several tracks.
> 
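(Ballpark only: a track on a current drive holds somewhere on the order
of a few hundred kilobytes to a megabyte, so you would likely be looking
at chunk sizes of several megabytes per disk before skipping the other
disk's chunks actually saves any rotation time rather than wasting it.)
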
> Raid1 already does some read-balancing, though it is possible (even
> likely) that it doesn't balance very effectively.  Working out how
> best to do the balancing in general is a non-trivial task, but would
> be worth spending time on.
> 
> The raid10 module in Linux supports a layout described as 'far=2'.
> In this layout, with two drives, the first half of each drive is used
> for a raid0, and the second half holds a mirror of that data, offset
> so that each block's copy lives on the other disk.
> In this layout reads should certainly go at raid0 speeds, though
> there is a cost in write speed.
> 
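(For reference, a two-disk array with that layout can be created with
something along the lines of "mdadm --create /dev/md0 --level=10
--layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1", where the device
names are obviously just placeholders.)
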
> Maybe you would like to experiment.  Write a program that reads from
> two drives in parallel, reading all the 'odd' chunks from one drive
> and the 'even' chunks from the other, and find out how fast it is.
> Maybe you could get it to try lots of different chunk sizes and see
> which is the fastest.

Too artificial.  The results of this sort of test would not translate
well to real-world usage.
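
That said, if someone does want to run Neil's experiment, the test
program is only a page of code.  A rough sketch (two threads doing
O_DIRECT pread()s straight off the raw devices; the device paths, chunk
size and total size are placeholders to adjust for the hardware at hand,
and this is untested):

/* Read even chunks from one mirror half and odd chunks from the other,
 * skipping the chunks the other thread covers, the way raid0 would. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK	(1024 * 1024)		/* try a range of values here */
#define TOTAL	(1024LL * 1024 * 1024)	/* read 1 GB of data in total */

struct job { const char *dev; long long first; };  /* first chunk: 0 or 1 */

static void *reader(void *arg)
{
	struct job *j = arg;
	long long chunk;
	void *buf;
	int fd = open(j->dev, O_RDONLY | O_DIRECT);

	if (fd < 0) {
		perror(j->dev);
		return NULL;
	}
	if (posix_memalign(&buf, 4096, CHUNK)) {
		close(fd);
		return NULL;
	}
	/* read chunks first, first+2, first+4, ... and skip the rest */
	for (chunk = j->first; chunk * CHUNK < TOTAL; chunk += 2)
		if (pread(fd, buf, CHUNK, chunk * CHUNK) != CHUNK)
			break;
	free(buf);
	close(fd);
	return NULL;
}

int main(void)
{
	struct job a = { "/dev/sda", 0 }, b = { "/dev/sdb", 1 };
	pthread_t ta, tb;

	pthread_create(&ta, NULL, reader, &a);
	pthread_create(&tb, NULL, reader, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	return 0;
}

Build it with gcc -O2 -pthread, run it under time(1), rerun with
different CHUNK values, and compare against a single sequential read of
the same amount of data from one disk.  It still won't tell you much
about mixed real-world loads, which is my point above.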

> That might be quite helpful in understanding how to get read-balancing
> working well.

Doing *good* read balancing is hard, especially given things like
FC-attached storage, iSCSI/iSER, etc.  If I wanted to do this right, I'd
start by teaching the md code to look more deeply into the block devices
it sits on, possibly even running a self-tuning series of reads at
startup to measure things like close-seek sequential operation times
versus maximum-seek throughput.  That would give you a clue as to
whether the device you are talking to has more than one physical spindle
behind it, which changes the cost you should assign to seek-requiring
operations relative to bandwidth-heavy ones.  I might even go so far as
to look into the SCSI transport classes for clues about data throughput
at bus bandwidth versus command startup/teardown costs on the bus, so
you have an accurate idea of whether lots of small outstanding commands
are likely to starve the device on bus overhead alone.

Then I'd use that data to numerically quantify the load on each device,
updating it when a command is added to the block layer queue (the queued
load), when the command is actually removed from the block queue and
sent to the device (the active load), and again when the command
completes.  Finally, for each incoming command, I'd look at what it
*would* do to each constituent disk's load values and send it to
whichever disk it loads the least.  But that's just off the top of my
head, and I may be on crack...I didn't check what my wife handed me this
morning.
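
Roughly, and with every name invented for illustration (none of this is
the real md code, and it's untested), the bookkeeping I mean would look
something like:

/* Per-mirror load tracking for read balancing -- sketch only. */
struct mirror_load {
	long long queued_sectors;  /* sitting in the block layer queue */
	long long active_sectors;  /* issued to the device, not yet done */
	long long last_sector;     /* rough guess at head position */
	long long seek_cost;       /* per-seek penalty, from startup probing */
	long long xfer_cost;       /* per-sector penalty, from startup probing */
};

/* Accounting hooks: when a read is queued, issued and completed. */
void load_queue(struct mirror_load *m, long long sectors)
{
	m->queued_sectors += sectors;
}

void load_issue(struct mirror_load *m, long long sectors)
{
	m->queued_sectors -= sectors;
	m->active_sectors += sectors;
}

void load_complete(struct mirror_load *m, long long sector, long long sectors)
{
	m->active_sectors -= sectors;
	m->last_sector = sector + sectors;
}

/* What this read *would* cost if sent to mirror m: a flat penalty if it
 * forces a seek, plus the backlog it would have to sit behind. */
long long projected_cost(const struct mirror_load *m,
			 long long sector, long long sectors)
{
	long long seek = (sector != m->last_sector) ? m->seek_cost : 0;
	long long backlog = m->queued_sectors + m->active_sectors + sectors;

	return seek + backlog * m->xfer_cost;
}

/* Send the read to whichever mirror it loads the least. */
int pick_mirror(struct mirror_load *mirrors, int nmirrors,
		long long sector, long long sectors)
{
	int i, best = 0;

	for (i = 1; i < nmirrors; i++)
		if (projected_cost(&mirrors[i], sector, sectors) <
		    projected_cost(&mirrors[best], sector, sectors))
			best = i;
	return best;
}

The startup probing is what would fill in seek_cost and xfer_cost per
device, which is how a multi-spindle FC LUN would end up with a much
smaller seek penalty than a lone SATA disk.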

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
