Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

Qu Wenruo <quwenruo.btrfs@xxxxxxx> · Thu, 16 Nov 2017 16:08:34 +0800

On 2017-11-16 15:42, Nikolay Borisov wrote:
> 
> 
> On 16.11.2017 09:38, Qu Wenruo wrote:
>>
>>
>> On 2017-11-16 14:54, Nikolay Borisov wrote:
>>>
>>>
>>> On 16.11.2017 04:18, Qu Wenruo wrote:
>>>> Hi all,
>>>>
>>>> [Background]
>>>> Recently I have been considering whether checksums from the filesystem
>>>> could be used to enhance device-mapper RAID.
>>>>
>>>> The idea behind it is quite simple: most modern filesystems checksum
>>>> their metadata, and some (btrfs) also checksum data.
>>>>
>>>> For btrfs RAID1/10 (ignoring RAID5/6 for now), the checksum is used at
>>>> read time to determine which copy is correct, so the correct data can be
>>>> returned even if one copy is corrupted.
>>>>
>>>> [Objective]
>>>> The final objective is to allow device mapper to do the checksum
>>>> verification (and repair if possible).
>>>>
>>>> If it is only for verification, it is not much different from the endio
>>>> hook method currently used by most filesystems.
>>>> However, if we can move the repair part out of the filesystem (so far
>>>> only btrfs supports it), all filesystems would benefit.
>>>>
>>>> [What we have]
>>>> The nearest infrastructure I found in the kernel is bio_integrity_payload.
>>>>
>>>> However, I found it is bound to the device, as it is designed to support
>>>> the SCSI/SATA integrity protocols.
>>>> For this use case the integrity data is bound to the filesystem instead:
>>>> the fs (or a higher-layer dm device) is the source of the integrity data,
>>>> and the device (dm-raid) only does the verification and possible repair.
>>>>
>>>> I'm not sure whether it is a good idea to reuse (or abuse)
>>>> bio_integrity_payload for this purpose.
>>>>
>>>> Should we use some new infrastructure or enhance the existing
>>>> bio_integrity_payload?
>>>>
>>>> (Or is this a valid idea or just another crazy dream?)
>>>>
>>>
>>> This sounds good in principle, however I think there is one crucial
>>> point which needs to be considered:
>>>
>>> All filesystems with checksums store those checksums in some specific
>>> way, and when they fetch data from disk they also know how to acquire
>>> the respective checksum.
>>
>> Just like the integrity payload, we generate a READ bio with a checksum
>> hook function and checksum data attached.
> 
> So how is this checksum data acquired in the first place?

In the btrfs case, through a metadata read bio, since btrfs stores data
csums in its csum tree, as metadata.

For that read, pass a READ bio with a metadata-specific verification
function and empty verification data.
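
To make the two cases concrete, here is a rough userspace toy model in
Python, not kernel code. Every name in it (ReadBio, verify_data,
verify_metadata, make_metadata_block) and the layout of a 4-byte CRC32 at
the start of a metadata block are assumptions made up for illustration;
the real btrfs on-disk format and checksum differ, and no such bio
interface exists in the kernel today.

```python
import zlib
from dataclasses import dataclass
from typing import Callable

CSUM_SIZE = 4  # assumed: 4-byte CRC32, for illustration only


def crc(data: bytes) -> bytes:
    """Checksum used by this toy model (zlib CRC32, little-endian)."""
    return zlib.crc32(data).to_bytes(CSUM_SIZE, "little")


@dataclass
class ReadBio:
    """Hypothetical READ request: a verification hook plus optional csum data."""
    verify: Callable[[bytes, bytes], bool]
    csum: bytes = b""  # empty for metadata, where the csum lives in the block


def verify_data(payload: bytes, csum: bytes) -> bool:
    # Data case: the fs read the csum from its csum tree beforehand
    # and attached it to the bio.
    return crc(payload) == csum


def verify_metadata(block: bytes, _csum: bytes) -> bool:
    # Metadata case: the csum is stored in the block header itself,
    # so the attached csum data is empty and ignored.
    return crc(block[CSUM_SIZE:]) == block[:CSUM_SIZE]


def make_metadata_block(body: bytes) -> bytes:
    """Build a block whose first CSUM_SIZE bytes checksum the rest."""
    return crc(body) + body


# Data read: csum fetched first (via a metadata read), then attached.
data = b"file contents"
data_bio = ReadBio(verify=verify_data, csum=crc(data))

# Metadata read: hook only, no csum data attached.
meta = make_metadata_block(b"tree node")
meta_bio = ReadBio(verify=verify_metadata)
```

In both cases the lower layer would only call bio.verify(buffer, bio.csum)
and never needs to know where the checksum came from.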

> 
>>
>> So for a data read, we read the checksum first, attach it to the data
>> READ bio, then submit it.
>>
>> For a metadata read, in most cases the checksum is integrated into the
>> metadata header, as we do in btrfs.
>>
>> In that case we attach empty checksum data to the bio, but use a
>> metadata-specific hook function to handle it.
>>
>>> What you suggest might be doable but it will
>>> require lower layers (dm) to be aware of how to acquire the specific
>>> checksum for some data.
>>
>> In the above case, dm only needs to call the verification hook function.
>> If verification passes, good.
>> If not, try another copy if there is one.
>>
>> In this case, I don't think the dm layer needs any extra interface to
>> communicate with the higher layer.
> 
> 
> Well, that verification function is the interface I meant. You are
> essentially communicating the checksum out of band (notwithstanding the
> metadata case, since you said the checksum is in the actual metadata header).
> 
> In the end, which problem are you trying to solve? To allow for a generic
> checksumming layer which filesystems may use if they decide to?

To be clear: to allow the device mapper layer to make use of the filesystem's
checksums (if it has them) when there are multiple copies.

One problem with current dm raid1/10 (and possibly raid5/6) is that they
have no way to know which copy is correct.
They can only handle a device disappearing.

Btrfs handles it by verifying data/metadata checksums.
Since xfs/ext4 also have checksums for their metadata, why not allow
device mapper to use those checksums to get the correct copy?
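
As a rough sketch of what "get the correct copy" could look like, here is a
userspace toy model in Python, not dm code. The function raid1_read, the
list-of-bytearrays "mirrors" and the single-argument hook are all
hypothetical; the repair step is simply one possible policy.

```python
import zlib
from typing import Callable, List, Optional

Verify = Callable[[bytes], bool]


def raid1_read(mirrors: List[bytearray], verify: Optional[Verify]) -> bytes:
    """Toy model of a mirrored read that consults a filesystem-supplied hook.

    Returns the first copy that passes verification and rewrites the copies
    that failed before it. Copies after the first good one are not checked.
    Without a hook it behaves like plain raid1: first copy, unchecked.
    """
    if verify is None:
        return bytes(mirrors[0])

    bad = []  # indexes of copies that failed verification so far
    for idx, copy in enumerate(mirrors):
        if verify(bytes(copy)):
            good = bytes(copy)
            # Repair: overwrite every copy already found to be bad.
            for b in bad:
                mirrors[b][:] = good
            return good
        bad.append(idx)
    raise OSError("no copy passed verification")


# Hook as the filesystem might supply it: it already knows the expected csum.
expected = zlib.crc32(b"good data")
hook = lambda buf: zlib.crc32(buf) == expected

mirrors = [bytearray(b"g00d data"), bytearray(b"good data")]
result = raid1_read(mirrors, hook)
```

Here the first mirror is corrupted, so the read is served from the second
one and the first is rewritten with the verified contents.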

The mechanism is *NOT* a generic checksum layer.
How the csum is stored is determined by the fs.
It just lets the device mapper layer be aware of the csum and make a better
decision.

Moreover, this only affects READ bios; WRITE bios are not affected at all.
Csum calculation and storage are handled entirely by the filesystem.
The device mapper layer won't need to get involved there.

And of course, btrfs can reuse this facility to do something bigger, but
that's another story.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>> I don't think at this point there is such infra
>>> and frankly I cannot even envision how it will work elegantly. Sure you
>>> can create a dm-checksum target (which I believe dm-verity is very
>>> similar to) that stores checksums alongside data but at this point the
>>> fs is really out of the picture.
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel