Re: mdadm raid1 read performance

Roberto Spadim <roberto@xxxxxxxxxxxxx> · Wed, 4 May 2011 20:57:00 -0300



2011/5/4 NeilBrown <neilb@xxxxxxx>:
> On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@xxxxxxxxx> wrote:
>
>> Thanks to all who replied on this.
>>
>> I somewhat naively assumed that having 2 disks with the same data
>> would mean a similar read speed to raid0 should be the norm (and i
>> think this is a very popular misconception).
>> I was neglecting the seek time to skip alternate blocks, which i guess
>> must be the flaw.
>>
>> In theory though if i was reading a larger file, couldn't one disk
>> start reading at the beginning to a buffer and one start reading from
>> half way (assuming 2 disks) and hence get close to 2x single disk
>> speed?
>
> If you write your program to read from both the beginning and the middle
> then you might get double-speed.  The kernel doesn't know you are going to do
> this so the best it can do is read-ahead in large amounts.
>
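A rough sketch of the "read from the beginning and from the middle" idea, done from userspace with two dd readers running in parallel. read_halves and its arguments are made-up names for illustration, and bs=1M assumes GNU (or busybox) dd. Whether md/raid1 actually serves the two streams from different mirrors is not guaranteed, so treat this as an experiment, not a recipe.

```shell
#!/bin/sh
# Read the first and second half of a file or block device in parallel and
# throw the data away, like the dd test quoted in this thread.
# read_halves is a hypothetical helper, not part of mdadm or dd.
read_halves() {
    src=$1          # file or device to read, e.g. /dev/md0
    total_mib=$2    # total amount to read, in MiB
    half=$((total_mib / 2))

    # Reader 1 starts at the beginning; reader 2 skips to the middle.
    dd if="$src" of=/dev/null bs=1M count="$half" 2>/dev/null &
    pid1=$!
    dd if="$src" of=/dev/null bs=1M skip="$half" count="$half" 2>/dev/null &
    pid2=$!

    # Report failure if either reader failed.
    wait "$pid1" || return 1
    wait "$pid2" || return 1
    echo "read 2 x ${half} MiB from $src"
}
```

Usage would be something like `time read_halves /dev/md0 1000`, compared against a single `dd ... count=1000`. Cached data will distort the timing, so the page cache should be dropped first, or GNU dd's iflag=direct used.
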
> raid1 could notice large reads and send some to one disk and some to another,
> but the size for each device must be large enough that the time to seek over
> must be much less than the time to read, which is probably many megabytes on
> today's hardware - and raid1 has no way to know what that size is.
>
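To put a number on "many megabytes", here is a back-of-envelope sketch. The 10 ms seek time and the factor of 10 between read time and seek time are assumptions picked for illustration; only the 140 MB/s figure comes from this thread. min_chunk_mb is a made-up helper name.

```shell
#!/bin/sh
# Smallest per-disk chunk (in MB) for which the time to read the chunk is at
# least `ratio` times the seek time. All inputs are integers.
#   read_time = chunk_mb / mbps          (seconds)
#   need: read_time >= ratio * seek_ms / 1000
min_chunk_mb() {
    mbps=$1       # sequential throughput of one disk, MB/s
    seek_ms=$2    # assumed average seek time, ms
    ratio=$3      # how many times longer than the seek the read should be
    # Round up so the result never undershoots the requirement.
    echo $(( (mbps * seek_ms * ratio + 999) / 1000 ))
}

min_chunk_mb 140 10 10
```

With these assumed values the result is 14 MB per device, and raid1 has no way to learn the real seek time or throughput of each disk on its own.
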
> Certainly it is possible that the read_balance code in md/raid1 could be
> improved.  As yet no-one has improved it and provided convincing performance
> numbers.

yes, it's not a 10000% improvement; i got a max of 1% on a big test (1
hour of non-sequential reads). for ssd, round robin allows more use of
the drives and gives some improvement. since i don't know how to get
the hardware/software queue size, i couldn't improve the code to select
the 'best' disk (the disk that should return in the least time), but the
benchmark results were interesting, since 1% was 1% three times
(60 minutes dropped to 54 minutes)

it could be very interesting to get information about each disk and
automatically tune read balancing.
useful information: access time (RPM information can help here), MB/s in
a sequential read (depends on RPM + disk size (1.8", 2.5", 3.5") +
interface (SATA1, SATA2, SAS), since SATA1 can't allow more than
1.5 Gb/s), rotational/non-rotational information
difference between rotational and non-rotational:
rotational: access time proportional to block distance (head arm /
disk position)
non-rotational: fixed access time with low variation
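
From userspace, some of this information is exposed in sysfs. The sketch below reads the rotational flag and the block-layer request queue size (nr_requests) for a disk. disk_info and its optional second argument (an alternate sysfs root, only there to make it testable) are made-up names. This does not show how kernel code such as read_balance would get at the same values, and the hardware queue depth, where available, is a separate attribute (device/queue_depth on SCSI/SATA disks).

```shell
#!/bin/sh
# Print whether a disk is rotational and its request queue size, using the
# attributes under /sys/block/<dev>/queue/.
# disk_info is a hypothetical helper for illustration.
disk_info() {
    dev=$1                     # e.g. sda
    sysroot=${2:-/sys/block}   # overridable root, for testing only
    q="$sysroot/$dev/queue"

    [ -r "$q/rotational" ] || { echo "$dev: no queue info" >&2; return 1; }

    # rotational is 1 for spinning disks, 0 for SSDs and similar.
    if [ "$(cat "$q/rotational")" = "1" ]; then
        kind=rotational
    else
        kind=non-rotational
    fi

    # nr_requests may be missing on some devices; report that explicitly.
    depth=unknown
    if [ -r "$q/nr_requests" ]; then
        depth=$(cat "$q/nr_requests")
    fi

    echo "$dev $kind nr_requests=$depth"
}
```

For example, `disk_info sda` prints one line per call, which could be used to choose a different balancing policy for SSD and HDD mirrors.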


>> as a separate question, what should be the theoretical performance of raid5?
>
> x(N-1)
>
> So a 4 drive RAID5 should read at 3 times the speed of a single drive.
>
>>
>> in my tests i read 1GB and throw away the data.
>> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>>
>> With 4 fairly fast hdd's i get
>
> Which apparently do 140MB/s:
>
>>
>> raid0: ~540MB/s
>
> I would expect 4*140 == 560, so this is a good result.
>
>> raid10: 220MB/s
>
> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
> little slow.  Try "--layout=f2" and see what you get (should be more like
> RAID0).
>
>> raid5: ~165MB/s
>
> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
> set badly.
> Can you:
>   blockdev --getra /dev/md0
> multiply the number it gives you by 8 and give it back with
>   blockdev --setra NUMBER /dev/md0

very nice :)
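
The two blockdev steps above can be combined into a small script. scaled_ra and bump_readahead are made-up helper names. blockdev reports and sets read-ahead in 512-byte sectors, and --setra needs root. The factor of 8 is the one suggested above, not a tuned value.

```shell
#!/bin/sh
# Multiply a read-ahead value (in 512-byte sectors) by a factor, default 8.
scaled_ra() {
    echo $(( $1 * ${2:-8} ))
}

# Read the current read-ahead of a device, scale it, and write it back.
# Nothing is changed unless this function is called explicitly.
bump_readahead() {
    dev=$1                               # e.g. /dev/md0
    old=$(blockdev --getra "$dev") || return 1
    new=$(scaled_ra "$old")
    echo "read-ahead on $dev: $old -> $new sectors"
    blockdev --setra "$new" "$dev"
}
```

Run `bump_readahead /dev/md0` as root, then repeat the dd test to see whether the raid5 number moves.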

>
>
>> raid1: ~140MB/s  (single disk speed)
>
> as expected.
>
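The expected numbers in this thread can be collected into one small calculation. This is only a sketch of the rules of thumb quoted above for one large sequential read. expected_read_mbps and pct_of_expected are made-up names, and generalising raid10 'n2' to n/2 drives is an assumption beyond the 4-drive case discussed here.

```shell
#!/bin/sh
# Expected sequential read speed in MB/s for n drives of `single` MB/s each.
expected_read_mbps() {
    level=$1; n=$2; single=$3
    case "$level" in
        raid0)  echo $(( n * single )) ;;
        raid1)  echo "$single" ;;                  # one stream, one disk
        raid5)  echo $(( (n - 1) * single )) ;;    # x(N-1)
        raid10) echo $(( n / 2 * single )) ;;      # default 'n2' layout
        *)      echo "unknown level: $level" >&2; return 1 ;;
    esac
}

# Measured speed as an integer percentage of the expected speed.
pct_of_expected() {
    echo $(( $1 * 100 / $2 ))
}

# The 4-drive, 140 MB/s case from this thread.
for level in raid0 raid10 raid5 raid1; do
    printf '%-7s %s MB/s\n' "$level" "$(expected_read_mbps "$level" 4 140)"
done
```

Comparing the measured figures against these, for example `pct_of_expected 165 420` for raid5, shows how far each array is from its rule-of-thumb value.
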
>>
>> for 4 disks raid0 seems like suicide, but for my system drive the
>> speed advantage is so great i'm tempted to try it anyway and try and
>> use rsync to keep a constant backup.
>
> If you have somewhere to rsync to, then you have more disks so RAID10 might
> be an answer... but I suspect you cannot move disks around that freely :-)
>
> NeilBrown
>
>
>
>>
>> cheers for your responses,
>>
>> Liam
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial