Re: Full use of varying drive sizes?---maybe a new raid mode is the answer?

Goswin von Brederlow <goswin-v-b@xxxxxx> · Mon, 05 Oct 2009 11:06:40 +0200

Konstantinos Skarlatos <k.skarlatos@xxxxxxxxx> writes:

> Goswin von Brederlow wrote:
>> Konstantinos Skarlatos <k.skarlatos@xxxxxxxxx> writes:
>>
>>
>>> Instead of doing all those things, I have a suggestion to make:
>>>
>>> Something that is like RAID 4 without striping.
>>>
>>> There are already 3 programs doing that, Unraid, Flexraid and
>>> disparity, but putting this functionality into linux-raid would be
>>> tremendous. (the first two work on linux and the third one is a
>>> command line windows program that works fine under wine).
>>>
>>> The basic idea is this: Take any number of drives, with any capacity
>>> and filesystem you like. Then provide the program with an empty disk
>>> at least as large as your largest disk. The program creates parity
>>> data by XORing together the disks sequentially block by block(or file
>>> by file), until it reaches the end of the smallest one.(It XORs block
>>> 1 of disk A with block1 of disk B, with block1 of disk C.... and
>>> writes the result to block1 of Parity disk) Then it continues with the
>>> rest of the drives, until it reaches the end of the last drive.
>>>
>>> Disk     A    B   C   D   E    P
>>> Block   1    1    1    1    1    1
>>> Block   2    2    2                2
>>> Block   3    3                      3
>>> Block   4                            4
>>>
>>> The great thing about this method is that when you lose one disk you
>>> can get all your data back. when you lose two disks you only lose the
>>> data on them, and not the whole array. New disks can be added and the
>>> parity recalculated by reading only the new disk and the parity disk.
>>>
>>
>> This has some problem though:
>>
>> 1) every wite is a read-modify-write
>>    Well, for one thing this is slow.
>>
> Is that necessary? Why not read every other data disk at the same time
> and calculate new parity blocks on the fly? granted, that would mean
> spinning up every disk, so maybe this mode could be an option?

Reading one parity block and updating it is faster than reading X data
blocks and recomputing the parity. Both in I/O and CPU terms.

>> 2) every write is a read-modify-write of the parity disk
>>    Even worse, all writes to independent disks bottleneck at the
>>    parity disk.
>> 3) every write is a read-modify-write of the parity disk    That
>> poor parity disk. It can never catch a break, untill it
>>    breaks. It is likely that it will break first.
>>
> No problem, a failed parity disk on this method is a much smaller
> problem than a failed disk on a RAID 5

But the reason they went from raid3/4 to raid5. :)

>> 4) if the parity disk is larger than the 2nd largest disk it will
>>    waste space
>> 5) data at the start of the disk is more likely to fail than at the
>>    end of a disk
>>    (Say disks A and D fail then Block A1 is lost but A2-A4 are still
>>    there)
>>
>> As for adding a new disks there are 2 cases:
>>
>> 1) adding a small disk
>>    zero out the new disk and then the parity does not need to be updated
>> 2) adding a large disk
>>    zero out the new disk and then that becomes the parity disk
>>
> So the new disk gets a copy of the parity data of the previous parity disk?

No, the old parity disk becomes a data disk that happens to initialy
contain the parity of A, B, C, D, E. The new parity disk becomes all
zero.

Look at it this way: XORing disk A, B, C, D, E together gives
P. XORing A, B, C, D, E, P together always gives 0. So by filling the
new parity disk with zero you are computing the parity of A, B, C, D,
E, P. Just more intelligently.

>>> Please consider adding this feature request, it would be a big plus
>>> for linux if such a functionality existed, bringing many users from
>>> WHS and ZFS here, as it especially caters to the needs of people that
>>> store video and their movie collection at their home server.
>>>
>>> Thanks for your time
>>>
>>>
>>> ABCDE for data drives, and P for parity
>>>
>>
>> As a side note I like the idea of not striping, despide the uneven
>> use. For home use the speed of a single disk is usualy sufficient but
>> the noise of concurrent access to multiple disks is bothersome.
> Have you tried the Seagate Barracuda LP's? totally silent! I have 8 of
> them and i can assure you that they are perfect for large media
> storage in a silent computer.

I buy when I need space and have the money (unfortunately that doesn't
always coincide) and i use what I have. But it is interesting to see
how newer disks are much quieter and I don't believe that is just age
making old disks louder.

>> Also
>> for movie archives a lot of access will be reading and then the parity
>> disk can rest. Disks can also be spun down more often. Only the disk
>> containing the movie one currently watches need to be spinning. That
>> could translate into real money saved on the electric bill.
>>
>>
> I agree this is something mainly for home use, where reads exceed
> writes by a large margin and when writes are done, they are done to
> one or two disks at the same time at most.
>> But I would still do this with my algorithm to get even amount of
>> redunancy. One can then use partitions or lvm to split the overall
>> raid device back into seperate drives if one wants to.
>>
> Yes I think that an option for merging the disks into a large one
> would be nice, as long as data is still recoverable from individual
> disks if for example 2 disks fail. One of the main advantages of not
> stripping is that when things go haywire some data is still
> recoverable, so please lets not lose that.

That is just a matter of placing the partitions/volumes to not span
multiple disks. With partitionable raids one could implement such a
raid mode that would combine all disks into a single raid device but
export each data disk back as a partition. One wouldn't be able to
repartition that though. So not sure if I would want that in the
driver layer. Not everyone will care about having each data disk
seperate.

MfG
        Goswin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html