Re: RAID1 round robin read support in md?


 



Check this old topic (kernel 2.6.37, I think):
http://www.spadim.com.br/raid1/
http://www.issociate.de/board/post/507493/raid1_new_read_balance,_first_test,_some_doubt,_can_anyone_help?.html


But you will not get a very big read speed improvement: RAID1 is
best for multi-threaded workloads, while RAID10 'far' layout is
better for fewer threads doing more sequential reads. Check which
one fits your workload.

The code linked at the top of this email implements a round robin.
If you switch devices on every single read you get a slow md device,
but switching only every 10, 100, or 1000 reads is better (I think
the problem is CPU use, or something else; you should check with
iostat and other statistics tools).
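
Just to illustrate the batching idea, here is a minimal user-space
sketch in C (READS_PER_SWITCH and pick_mirror() are my own
illustrative names, not from the patch above):

#define READS_PER_SWITCH 100    /* try 10, 100, 1000 and measure */

/* Switch the target mirror only every READS_PER_SWITCH reads,
 * instead of bouncing a sequential stream between disks on every
 * request. */
static int pick_mirror(int nmirrors)
{
        static unsigned long nreads;
        static int current;

        if (++nreads % READS_PER_SWITCH == 0)
                current = (current + 1) % nmirrors;
        return current;
}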

2011/12/5 Doug Dumitru <doug@xxxxxxxxxx>
>
> What you are seeing is very SSD specific.
>
> With rotating media, it is very important to intentionally stay on one
> disk even if it leaves other mirrors quiet.  Rotating disks do "in the
> drive" read-ahead and take advantage of the heads being on the correct
> track, so streaming straight-line reads are efficient.
>
> With SSDs in an array, things are very different.  Drives don't really
> read ahead at all (actually they do, but this is more of a side effect
> of error correction than performance tuning, and the lengths are
> short).  If your application is spitting out 4MB read requests, they
> get cut into 512K (1024 sector) bio calls, and sent to a single drive
> if they are linear.  Because the code is optimized for HDDs, future
> linear calls should go to the same drive because an HDD is very likely
> to have at least some of the read sectors in the read-ahead cache.
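>
> As a rough sketch (simplified, with made-up names, not the actual md
> read_balance code), that HDD-oriented choice boils down to:
>
> struct mirror {
>         long long head_position;  /* last sector this disk serviced */
> };
>
> /* Prefer the mirror whose head is already at the requested sector,
>  * so a sequential stream keeps hitting one drive's read-ahead. */
> static int choose_hdd_mirror(struct mirror *m, int n, long long sector)
> {
>         int i;
>
>         for (i = 0; i < n; i++)
>                 if (m[i].head_position == sector)
>                         return i;   /* sequential: stay on this disk */
>         return 0;                   /* otherwise fall back to disk 0 */
> }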
>
> A different algorithm for SSDs would be better, but one concern is
> that this might slow down short read requests in a multi-threaded
> environment.  Actually managing a mix intelligently is probably best
> started with a Google literature search for SSD scheduling papers.  I
> suspect that UCSD's super-computing department might have done some
> work in this area.
>
> With the same data available from two drives, for low thread count
> applications, it might be better to actually cut up the inbound
> requests into even smaller chunks, and send them in parallel to the
> drives.  A quick test on a Crucial C300 shows the following transfer
> rates at different block sizes.
>
> 512K   319 MB/sec
> 256K   299 MB/sec
> 128K   298 MB/sec
>  64K   287 MB/sec
>  32K   275 MB/sec
>
> This is with a single 'dd' process and 'iflag=direct', bypassing Linux
> read-ahead and buffer caching.  The test was only a second long or so,
> so the noise could be quite high.  Also, C300s may behave very
> differently with this workload than other drives, so you have to test
> each type of disk.
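>
> For reference, the numbers above came from runs along the lines of
> (device name and count are examples; vary 'bs' from 512k down to 32k):
>
>   dd if=/dev/sdX of=/dev/null bs=512k count=2000 iflag=direct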
>
> What this implies is that if the md raid-1 layer "were to be" SSD
> aware, it should consider cutting up long requests and keeping all
> drives busy.  The logic would be something like:
>
> * If any request is >= 32K, split it into 'n' parts, and issue them
> in parallel.
>
> This would be best implemented "down low" in the md stack.
> Unfortunately, the queuing where requests are collated happens
> entirely below md (I think), so there is no easy point to insert this.
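>
> As a user-space illustration only (mirrors as plain file descriptors,
> the 32K threshold from above; this is not md code, which would clone
> and remap bios instead), the split could look like:
>
> #include <pthread.h>
> #include <unistd.h>
>
> #define SPLIT_THRESHOLD (32 * 1024)
>
> struct chunk { int fd; char *buf; size_t len; off_t off; };
>
> static void *read_chunk(void *arg)
> {
>         struct chunk *c = arg;
>
>         /* every mirror holds the same data, so any fd will do */
>         pread(c->fd, c->buf, c->len, c->off);
>         return NULL;
> }
>
> static void mirrored_read(int *fds, int n, char *buf,
>                           size_t len, off_t off)
> {
>         pthread_t tid[n];
>         struct chunk c[n];
>         size_t part = len / n;
>         int i;
>
>         if (len < SPLIT_THRESHOLD) {       /* short read: one drive */
>                 pread(fds[0], buf, len, off);
>                 return;
>         }
>         for (i = 0; i < n; i++) {          /* long read: fan out */
>                 c[i].fd  = fds[i];
>                 c[i].buf = buf + i * part;
>                 c[i].len = (i == n - 1) ? len - i * part : part;
>                 c[i].off = off + (off_t)(i * part);
>                 pthread_create(&tid[i], NULL, read_chunk, &c[i]);
>         }
>         for (i = 0; i < n; i++)            /* wait for all chunks */
>                 pthread_join(tid[i], NULL);
> }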
>
> The idea of round-robin scheduling the requests is probably a little
> off-base.  The important part is, with SSDs, to cut up the requests
> into smaller sizes, and push them in parallel.  A round-robin might
> trick the scheduler into this sometimes, but is probably only an
> edge-case solution.
>
> This same logic applies to raid-0, raid-5/6, and raid-10 arrays.  With
> HDDs it is often more efficient to keep the stripe size large so that
> individual in-drive read-ahead is exploited.  With SSDs, smaller
> stripes are often better (at least on reads) because they tend to keep
> all of the drives busy.
>
> Now it is important to note that this discussion is 100% about reads.
> SSD writes are a much more complicated animal.
>
> --
> Doug Dumitru
> EasyCo LLC




--
Roberto Spadim
Spadim Technology / SPAEmpresarial

