On 9/5/20 6:42 PM, Wols Lists wrote:
> I doubt I understand what you're getting at, but this is sounding a bit
> like raid-4, if you have data disk(s) and a separate parity disk. People
> don't use raid 4 because it has a nasty performance hit.

Yes, it is a bit like raid-4, since the data and parity disks are
separated. In fact the idea could be better called a parity-backed
collection of independently accessed disks. You would not get the
performance benefit of reads/writes being striped across multiple disks,
but the idea is primarily targeted at read-heavy applications, so in
typical use read performance should be no worse than reading directly
from a single un-raided disk, except when a disk has failed and parity is
being used to reconstruct a block from the missing disk. Writes would
have more overhead since they would also have to calculate/update parity.

> Personally, I'm looking at something like raid-61 as a project. That
> would let you survive four disk failures ...

Interesting. I'll look into that more later, but from what I can tell so
far there is a lot of overhead: 10 1TB disks would only give 3TB of data
(two mirrored 5-disk raid6 arrays, each leaving 3 disks' worth of data).
My current solution, since this is basically just bulk data storage, is
mergerfs plus snapraid, and according to the snapraid documentation, 10
1TB disks would provide 6TB if 4 are used for parity. However, its parity
calculations appear to be more complex as well.

> Also, one of the biggest problems when a disk fails and you have to
> replace it is that, at present, with nearly all raid levels even if you
> have lots of disks, rebuilding a failed disk is pretty much guaranteed
> to hammer just one or two surviving disks, pushing them into failure if
> they're at all dodgy. I'm also looking at finding some randomisation
> algorithm that will smear the blocks out across all the disks, so that
> rebuilding one disk spreads the load evenly across all disks.

This is actually the main purpose of the idea. In a traditional raid5/6
the data from multiple disks is mapped into a single logical block
device, so the structures of any filesystems and their files are
scattered across all the disks; losing more disks than the parity can
tolerate makes the entire filesystem(s) and all files virtually
unrecoverable. By keeping each data disk separate and exposed as its own
block device, with some parity backup, each disk contains an entire
filesystem(s) on its own to be used however the user decides. The loss of
one of the disks during a rebuild would no longer cause full data loss,
only the loss of the filesystem(s) on that disk. The data on the other
disks would still be intact and readable, although depending on how the
user set things up, files could appear to be missing if a union/merge
filesystem was layered on top of them. A rebuild would still have the
same issue of having to read all the remaining disks to reconstruct the
lost one. I'm not really sure of any way around that, since parity would
essentially be calculated as the xor of the same block on all the data
disks.

>
> At the end of the day, if you think what you're doing is a good idea,
> scratch that itch, bounce stuff off here (and the kernel newbies list if
> you're not a kernel programmer yet), and see how it goes. Personally, I
> don't think it'll fly, but I'm sure people here would say the same about
> some of my pet ideas too. Give it a go!
>
> Cheers,
> Wol
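
P.S. To make the xor-parity point above concrete, here is a rough
user-space sketch (my own illustration, not code from any existing
driver; the 4K block size and the three-disk setup are just
placeholders). It computes parity as the xor of the same block on every
data disk and then rebuilds a "failed" disk's block from the parity plus
the surviving disks:

/* Sketch only: parity = xor of the same block on every data disk.
 * Rebuilding a lost disk is the same operation run the other way:
 * xor the parity block with the corresponding block of every
 * surviving data disk.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* placeholder block size */

static void xor_block(unsigned char *dst, const unsigned char *src)
{
        for (size_t i = 0; i < BLOCK_SIZE; i++)
                dst[i] ^= src[i];
}

static void compute_parity(unsigned char *parity,
                           unsigned char **data, int ndisks)
{
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < ndisks; d++)
                xor_block(parity, data[d]);
}

int main(void)
{
        enum { NDISKS = 3 };                    /* toy data disks */
        unsigned char *disk[NDISKS], parity[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

        for (int d = 0; d < NDISKS; d++) {
                disk[d] = malloc(BLOCK_SIZE);
                memset(disk[d], 'A' + d, BLOCK_SIZE);  /* fake contents */
        }

        compute_parity(parity, disk, NDISKS);

        /* pretend disk 1 died: rebuild it from parity + survivors */
        memcpy(rebuilt, parity, BLOCK_SIZE);
        xor_block(rebuilt, disk[0]);
        xor_block(rebuilt, disk[2]);

        printf("rebuilt block %s disk 1\n",
               memcmp(rebuilt, disk[1], BLOCK_SIZE) == 0 ? "matches" : "differs");

        for (int d = 0; d < NDISKS; d++)
                free(disk[d]);
        return 0;
}

This also shows why a rebuild has to touch every surviving disk: every
data block is an input to the xor, so there is no way to reconstruct the
missing one without reading all of the others plus the parity.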