Would chunksize==disksize work? Wouldn't that lead to the entire parity
being invalidated by any write to any of the disks (assuming md operates
at a chunk level)? Also, please see my reply below.

On 29 October 2014 14:55, Anshuman Aggarwal <anshuman.aggarwal@xxxxxxxxx> wrote:
> Right on most counts, but please see comments below.
>
> On 29 October 2014 14:35, NeilBrown <neilb@xxxxxxx> wrote:
>> Just to be sure I understand: you would have N + X devices. Each of the N
>> devices contains an independent filesystem and could be accessed directly if
>> needed. Each of the X devices contains some codes so that if at most X
>> devices in total died, you would still be able to recover all of the data.
>> If more than X devices failed, you would still get complete data from the
>> working devices.
>>
>> Every update would only write to the particular N device on which it is
>> relevant, and to all of the X devices. So N needs to be quite a bit bigger
>> than X for the spin-down to be really worth it.
>>
>> Am I right so far?
>
> Perfectly right so far. I typically have an N to X ratio of 4 (4 data
> devices to 1 parity device), so spin-down is totally worth it for data
> protection -- but more on that below.
>
>>
>> For some reason the writes to X are delayed... I don't really understand
>> that part.
>
> This delay is designed around archival devices which are rarely read
> from and even more rarely written to. By delaying writes based on two
> criteria (a designated cache buffer filling up, or a preset time since
> the last write expiring), we can significantly reduce the writes to the
> parity device. This assumes that we are OK with losing a movie or two
> if the parity disk is not fully up to date, because we are more
> interested in device longevity.
>
>>
>> Sounds like multi-parity RAID6 with no parity rotation and
>> chunksize == devicesize
>
> RAID6 would present us with a joint device and currently only allows
> writes to that directly, yes? Any writes will be striped.
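The two flush criteria described above (buffer threshold or elapsed time since the last write) can be sketched as follows. This is only an illustrative user-space model, not md code; the class name, thresholds, and `flush_device` callback are all hypothetical:

```python
import time

class DelayedParityWriter:
    """Sketch of the two delayed-write criteria: buffered parity updates
    are pushed to the parity device only when either (a) the buffer
    exceeds a size threshold, or (b) a preset interval has elapsed since
    the last flush. All names and defaults here are illustrative."""

    def __init__(self, flush_device, max_buffer_bytes=64 * 1024 * 1024,
                 max_delay_seconds=3600, clock=time.monotonic):
        self.flush_device = flush_device    # callback that writes to the parity disk
        self.max_buffer_bytes = max_buffer_bytes
        self.max_delay_seconds = max_delay_seconds
        self.clock = clock
        self.buffer = []                    # pending (offset, data) parity updates
        self.buffered_bytes = 0
        self.last_flush = clock()

    def queue_update(self, offset, data):
        """Buffer one parity update; flush if either criterion is met."""
        self.buffer.append((offset, data))
        self.buffered_bytes += len(data)
        if (self.buffered_bytes >= self.max_buffer_bytes or
                self.clock() - self.last_flush >= self.max_delay_seconds):
            self.flush()

    def flush(self):
        """Spin the parity device up once and write all pending updates."""
        for offset, data in self.buffer:
            self.flush_device(offset, data)
        self.buffer.clear()
        self.buffered_bytes = 0
        self.last_flush = self.clock()
```

The point of batching this way is that the parity disk only spins up once per flush rather than once per write, at the cost of losing the most recent parity updates if it fails before the next flush.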
> In any case, would md RAID allow the underlying device to be written to
> directly? Also, how would it know that the device has been written to
> and hence that the parity has to be updated? What about the superblock,
> which the FS would not know about?
>
> There is also the delayed checksum writing, which matters if one of the
> objectives is to reduce the number of writes. Can we delay that in the
> current RAID6 code? I understand the objective of RAID6 is to ensure
> data recovery, and we are looking at a compromise in this case.
>
> If feasible, this could be an enhancement to md RAID as well, where N
> devices are presented instead of a single joint device in the RAID6
> case (maybe the multi-part device could be individual disks?).
>
> It would certainly solve my problem of where to store the metadata. I
> was hoping to just store it as a configuration file to be read by the
> initramfs, since the worst-case scenario here is that the checksum goes
> out of sync and is rebuilt from scratch.
>
>>
>> I wouldn't use device-mapper myself, but you are unlikely to get an entirely
>> impartial opinion from me on that topic.
>
> I haven't hacked around the kernel internals much so far, so I will have
> to dig out that history. I would welcome any particular links/mail
> threads I should look at for guidance (with both your and opposing
> points of view).
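For the single-parity (X = 1) case discussed above, a write to one data device really does only need to touch that device plus the parity device, via the standard read-modify-write identity P_new = P_old XOR D_old XOR D_new. A minimal sketch (the function name is mine; real multi-parity codes would need Reed-Solomon syndromes rather than plain XOR):

```python
def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Incremental XOR parity update for the single-parity (X = 1) case:
    P_new = P_old XOR D_old XOR D_new.  Only the written data device and
    the parity device are read/written; the other N-1 data devices can
    stay spun down."""
    return bytes(p ^ od ^ nd
                 for p, od, nd in zip(old_parity, old_data, new_data))
```

This is why the scheme can keep all but two disks idle on a write, and also why the parity update is safe to delay: the (offset, old, new) triple is enough to bring the parity disk up to date later.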