Re: The huge performance difference in sequential reads between RAID0 and RAID5

Michael Evans <mjevans1983@xxxxxxxxx> writes:

> On Thu, Jan 28, 2010 at 7:27 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>> On Thu Jan 28, 2010 at 09:55:05AM -0500, Yuehai Xu wrote:
>>
>>> 2010/1/28 Gabor Gombas <gombasg@xxxxxxxxx>:
>>> > On Thu, Jan 28, 2010 at 09:31:23AM -0500, Yuehai Xu wrote:
>>> >
>>> >> >> md0 : active raid5 sdh1[7] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
>>> >> >>       631353600 blocks level 5, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
>>> > [...]
>>> >
>>> >> I don't think any of my drives failed, because there is no "F" in
>>> >> my /proc/mdstat output
>>> >
>>> > It's not failed, it's simply missing. Either it was unavailable when the
>>> > array was assembled, or you've explicitly created/assembled the array
>>> > with a missing drive.
>>>
>>> I noticed that, thanks! Is it usual that at the beginning of each
>>> setup, there is one missing drive?
>>>
>> Yes - in order to make the array available as quickly as possible, it is
>> initially created as a degraded array.  The recovery is then run to
>> add in the extra disk.  Otherwise all disks would need to be written
>> before the array became available.
>>
>>> >
>>> >> How do you know my RAID5 array has one drive missing?
>>> >
>>> > Look at the above output: there are just 6 of the 7 drives available,
>>> > and the underscore also means a missing drive.
>>> >
>>> >> I tried to set up RAID5 with 5 disks and with 3 disks; after each
>>> >> setup, a recovery was always run.
>>> >
>>> > Of course.
>>> >
>>> >> However, if I format my md0 with the following command:
>>> >> mkfs.ext3 -b 4096 -E stride=16 -E stripe-width=*** /dev/XXXX, the
>>> >> performance for RAID5 becomes normal, at about 200-300 MB/s.
>>> >
>>> > I suppose in that case you had all the disks present in the array.
>>>
>>> Yes, I ran my test after the recovery; in that case, does the "missing
>>> drive" hurt the performance?
>>>
>> If you had a missing drive in the array when running the test, then this
>> would definitely affect the performance (as the array would need to do
>> parity calculations for most stripes).  However, as you've not actually
>> given the /proc/mdstat output for the array post-recovery, I don't
>> know whether or not this was the case.
>>
>> Generally, I wouldn't expect the RAID5 array to be that much slower than
>> a RAID0.  You'd best check that the various parameters (chunk size,
>> stripe cache size, readahead, etc) are the same for both arrays, as
>> these can have a major impact on performance.
>>
>> Cheers,
>>    Robin
>> --
>>     ___
>>    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
>>   / / )      | Little Jim says ....                            |
>>  // !!       |      "He fallen in de water !!"                 |
>>
>
> A more valid test that could be run is as follows:
>
> Assemble all the test drives as a raid-5 array (you can zero the
> drives any way you like and then --assume-clean if they really are all
> zeros) and let the resync complete.
>
> Run any tests you like.
>
> Stop the array and --zero-superblock its member drives.
>
> Create a striped array (raid 0) using all but one of the test drives.
>
> Since you dropped the drive's worth of storage that would be dedicated
> to parity in the raid-5 setup, you're now benchmarking the same number
> of /data/ drives, just without the one drive's worth of redundancy (at
> the cost of losing your data if any single drive fails).
>
> Still, run the same benchmarks.
>
> Why is this a valid comparison, instead of just throwing all the drives
> into raid-0 as well?  Because it provides the same resulting storage size.
>
>
> What I suspect you'll find is very similar read performance and
> measurably, though perhaps tolerable, worse write performance from
> raid-5.
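
Just to make Michael's suggested comparison concrete, the commands might
look roughly like this (device names, drive count and chunk size are
only placeholders, adjust to your setup):

    # raid5 over all seven test drives; drop --assume-clean (and wait
    # for the initial resync to finish) if the drives are not really
    # all zeros
    mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7 \
          --assume-clean /dev/sd[b-h]1
    # ... run the benchmarks on /dev/md0 ...
    mdadm --stop /dev/md0
    mdadm --zero-superblock /dev/sd[b-h]1
    # raid0 over one drive fewer, i.e. the same number of data disks
    mdadm --create /dev/md0 --level=0 --chunk=64 --raid-devices=6 \
          /dev/sd[b-g]1
    # ... run the same benchmarks ...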
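
On the mkfs.ext3 options quoted above: stride is the chunk size in
filesystem blocks and stripe-width is stride times the number of data
disks, so for a 64k chunk, 4k blocks and a healthy 7-disk raid5 (6 data
disks) the numbers would work out to something like this (only an
illustration, recalculate for your actual layout):

    # stride       = 64k chunk / 4k block   = 16
    # stripe-width = stride * 6 data disks  = 96
    mkfs.ext3 -b 4096 -E stride=16,stripe-width=96 /dev/md0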
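
And to compare the parameters Robin mentions (chunk size, stripe cache
size, readahead) between the two arrays, something along these lines
should show them (the usual md sysfs paths, adjust the device name):

    mdadm --detail /dev/md0 | grep -i chunk      # chunk size
    cat /sys/block/md0/md/stripe_cache_size      # raid5/raid6 only
    blockdev --getra /dev/md0                    # readahead, in 512-byte sectors
    # and to raise them for a test, e.g.:
    echo 4096 > /sys/block/md0/md/stripe_cache_size
    blockdev --setra 8192 /dev/md0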

In raid5 mode each drive will read 5*64k of data, then skip 64k, and
repeat. Skipping such a small chunk of data saves no time: the drive
simply waits until the skipped chunk has rotated past the head. So each
drive only gives 5/6 of its linear speed, and a 6-disk raid5 should read
at about 5/6 of the speed of a 6-disk raid0 (i.e. roughly the same as a
5-disk raid0), assuming the controller and bus are fast enough.
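
As a rough back-of-the-envelope illustration (the 100MB/s per-drive
figure is just an assumption):

    per-drive effective speed:  100MB/s * 5/6  ~  83MB/s
    6-disk raid5 read:          6 * 83MB/s     ~ 500MB/s
    6-disk raid0 read:          6 * 100MB/s    = 600MB/s
    5-disk raid0 read:          5 * 100MB/s    = 500MB/s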

A larger chunk size can mean that skipping the parity chunk amounts to
skipping a whole cylinder, which avoids the rotational wait. But a
larger chunk size also makes it less likely that reads are spread over
all (or even multiple) disks, so you might lose more than you gain.
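
If you want to test that, recreating the array with a bigger chunk is
straightforward (just a sketch; it destroys whatever is on the member
disks):

    mdadm --stop /dev/md0
    mdadm --zero-superblock /dev/sd[b-h]1
    mdadm --create /dev/md0 --level=5 --chunk=1024 --raid-devices=7 \
          /dev/sd[b-h]1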

Regards,
        Goswin

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
