On 9/5/20 6:42 PM, Wols Lists wrote:
> I doubt I understand what you're getting at, but this is sounding a bit
> like raid-4, if you have data disk(s) and a separate parity disk. People
> don't use raid 4 because it has a nasty performance hit.

Yes, it is a bit like raid-4, since the data and parity disks are
separated. In fact the idea could be better called a parity-backed
collection of independently accessed disks. You would not get the
performance benefit of reads/writes being striped across multiple disks,
but the idea is primarily targeted at read-heavy applications, so in
typical use read performance should be no worse than reading directly
from a single un-raided disk, except when a disk has failed and parity is
being used to reconstruct a block from the missing disk. Writes would
have more overhead since they would also have to calculate/update parity.

> Personally, I'm looking at something like raid-61 as a project. That
> would let you survive four disk failures ...

Interesting. I'll look into that more later, but from what I can tell so
far there is a lot of overhead: 10 1TB disks would only give 3TB of data
(two mirrored 5-disk raid6 arrays, each leaving 3 disks' worth of data).
My current solution, since this is basically just bulk data storage, is
mergerfs plus snapraid, and according to the snapraid documentation, 10
1TB disks would provide 6TB if 4 are used for parity. However, its parity
calculations appear to be more complex as well.

> Also, one of the biggest problems when a disk fails and you have to
> replace it is that, at present, with nearly all raid levels even if you
> have lots of disks, rebuilding a failed disk is pretty much guaranteed
> to hammer just one or two surviving disks, pushing them into failure if
> they're at all dodgy. I'm also looking at finding some randomisation
> algorithm that will smear the blocks out across all the disks, so that
> rebuilding one disk spreads the load evenly across all disks.

This is actually the main purpose of the idea. In a traditional raid5/6
the data from multiple disks is mapped into a single logical block
device, so the structures of any filesystems and their files are
scattered across all the disks; losing more disks than the parity can
tolerate makes the entire filesystem(s) and all files virtually
unrecoverable. By keeping each data disk separate and exposed as its own
block device, with some parity backup, each disk contains an entire
filesystem(s) on its own to be used however the user decides. The loss of
one of the disks during a rebuild would no longer cause full data loss,
only the loss of the filesystem(s) on that disk. The data on the other
disks would still be intact and readable, although depending on how the
user set things up, files could appear to be missing if a union/merge
filesystem was layered on top of them. A rebuild would still have the
same issue of having to read all the remaining disks to reconstruct the
lost one. I'm not really sure of any way around that, since parity would
essentially be calculated as the xor of the same block on all the data
disks.

>
> At the end of the day, if you think what you're doing is a good idea,
> scratch that itch, bounce stuff off here (and the kernel newbies list if
> you're not a kernel programmer yet), and see how it goes. Personally, I
> don't think it'll fly, but I'm sure people here would say the same about
> some of my pet ideas too. Give it a go!
>
> Cheers,
> Wol
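
P.S. To make the xor-parity point above concrete, here is a rough
user-space sketch (my own illustration, not code from any existing
driver; the 4K block size and the three-disk setup are just
placeholders). It computes parity as the xor of the same block on every
data disk and then rebuilds a "failed" disk's block from the parity plus
the surviving disks:

/* Sketch only: parity = xor of the same block on every data disk.
 * Rebuilding a lost disk is the same operation run the other way:
 * xor the parity block with the corresponding block of every
 * surviving data disk.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* placeholder block size */

static void xor_block(unsigned char *dst, const unsigned char *src)
{
        for (size_t i = 0; i < BLOCK_SIZE; i++)
                dst[i] ^= src[i];
}

static void compute_parity(unsigned char *parity,
                           unsigned char **data, int ndisks)
{
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < ndisks; d++)
                xor_block(parity, data[d]);
}

int main(void)
{
        enum { NDISKS = 3 };                    /* toy data disks */
        unsigned char *disk[NDISKS], parity[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

        for (int d = 0; d < NDISKS; d++) {
                disk[d] = malloc(BLOCK_SIZE);
                memset(disk[d], 'A' + d, BLOCK_SIZE);  /* fake contents */
        }

        compute_parity(parity, disk, NDISKS);

        /* pretend disk 1 died: rebuild it from parity + survivors */
        memcpy(rebuilt, parity, BLOCK_SIZE);
        xor_block(rebuilt, disk[0]);
        xor_block(rebuilt, disk[2]);

        printf("rebuilt block %s disk 1\n",
               memcmp(rebuilt, disk[1], BLOCK_SIZE) == 0 ? "matches" : "differs");

        for (int d = 0; d < NDISKS; d++)
                free(disk[d]);
        return 0;
}

This also shows why a rebuild has to touch every surviving disk: every
data block is an input to the xor, so there is no way to reconstruct the
missing one without reading all of the others plus the parity.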