Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

Qu Wenruo <quwenruo.btrfs@xxxxxxx> · Fri, 17 Nov 2017 09:30:32 +0800

On 2017-11-17 00:47, Austin S. Hemmelgarn wrote:

>>
>> This is at least less complicated than dm-integrity.
>>
>> Just a new hook for READ bios. And it can start with the easy part,
>> such as dm-raid1 and support in other filesystems.
> It's less complicated for end users (in theory, but cryptsetup devs are
> working on that for dm-integrity), but significantly more complicated
> for developers.
> 
> It also brings up the question of what happens when you want some other
> layer between the filesystem and the MD/DM RAID layer (say, running
> bcache or dm-cache on top of the RAID array).  In the case of
> dm-integrity, that's not an issue because dm-integrity is entirely
> self-contained, it doesn't depend on other layers beyond the standard
> block interface.

Each layer can choose to drop support for the extra verification.

If a layer does not modify the data, it can pass the hook down to the
lower layer, just as is done with the integrity payload.
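
To make the pass-down rule concrete, here is a minimal user-space sketch in
Python. It is not kernel code, and every name in it (Disk, Linear, XorLayer,
the verify argument) is hypothetical and only for illustration. A layer that
only remaps blocks forwards the hook; a layer that transforms the data drops
it, because the bytes below it no longer match what the fs checksummed.

```python
from typing import Callable, Optional

# Hypothetical hook type: returns True if the data matches the fs checksum.
Verify = Optional[Callable[[bytes], bool]]


class Disk:
    """Bottom layer; records whether a verification hook reached it."""

    def __init__(self, blocks: dict):
        self.blocks = dict(blocks)
        self.saw_hook = False

    def read(self, block: int, verify: Verify = None) -> bytes:
        self.saw_hook = verify is not None
        return self.blocks[block]


class Linear:
    """Remaps block numbers only; data is untouched, so pass the hook down."""

    def __init__(self, lower, offset: int):
        self.lower, self.offset = lower, offset

    def read(self, block: int, verify: Verify = None) -> bytes:
        return self.lower.read(block + self.offset, verify)


class XorLayer:
    """Stand-in for a layer that transforms data (think encryption).

    The bytes below this layer differ from what the fs checksummed,
    so the hook would be meaningless there and is dropped.
    """

    def __init__(self, lower, key: int):
        self.lower, self.key = lower, key

    def read(self, block: int, verify: Verify = None) -> bytes:
        raw = self.lower.read(block, None)  # drop the hook
        return bytes(b ^ self.key for b in raw)
```

In this sketch a RAID0-like layer would look like Linear: it has nothing to
repair with, so it would simply forward the hook to whatever sits below it.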

> 
> As I mentioned in my other reply on this thread, running with
> dm-integrity _below_ the RAID layer instead of on top of it will provide
> the same net effect, and in fact provide a stronger guarantee than what
> you are proposing (because dm-integrity does real cryptographic
> integrity verification, as opposed to just checking for bit-rot).

Although at the cost of more CPU usage, since every device is checksummed
separately even when they all contain the same data.

>>
>>>
>>> If your checksum is calculated and checked at FS level there is no added
>>> value when you spread this logic to other layers.
>>
>> That's why I'm moving the checking part to lower level, to make more
>> value from the checksum.
>>
>>>
>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>> need to modify fs itself
>>
>> Well, despite the fact that modern filesystems have already
>> implemented their own metadata csum.
>>
>>> - the paid price is - if there is a bug in passing data from 'fs' to
>>> 'dm-integrity', it cannot be captured.
>>>
>>> Advantage of having separated 'fs' and 'block' layer is in its
>>> separation and simplicity at each level.
>>
>> Totally agreed on this.
>>
>> But the idea here should not have that large an impact (compared to
>> big things like ZFS/Btrfs).
>>
>> 1) It only affects READ bios
>> 2) Every dm target can choose whether to support the hook or pass it
>>     down. There is no point in supporting it for RAID0, for example.
>>     And complex profiles like RAID5/6 need not support it from the
>>     very beginning.
>> 3) The main part of the functionality is already implemented
>>     The core complexity has 2 parts:
>>     a) checksum calculation and checking
>>        Modern filesystems already do this, at least for metadata.
>>     b) recovery
>>        dm targets already implement this for the supported raid
>>        profiles.
>>     All of this is already implemented; just invoking it at a
>>     different point should not be a big modification IIRC.
>>>
>>> If you want integrated solution - you are simply looking for btrfs where
>>> multiple layers are integrated together.
>>
>> With such a verification hook (along with something extra to handle
>> scrub), btrfs chunk mapping could be re-implemented with device-mapper:
>>
>> In fact btrfs logical space is just a dm-linear device, and each chunk
>> can be implemented by its corresponding dm-* module like:
>>
>> dm-linear:       | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>> and
>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>> ...
>> btrfs chunk n: system, using dm-raid1 on disk A B
>>
>> At least btrfs could take advantage of the simplicity of separate
>> layers.
>>
>> And other filesystems would get a somewhat better chance of recovering
>> their metadata if built on dm-raid.
> Again, just put dm-integrity below dm-raid.  The other filesystems
> primarily have metadata checksums to catch data corruption, not repair
> it,

Because they have no extra copy.
If they had one, they would definitely use it to repair.
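
As a rough illustration of what "use the extra copy to repair" would mean on
a raid1 READ, here is a small user-space Python sketch. The function name
read_mirrored and the CRC32 check are assumptions made for this example only;
it is neither dm-raid1 code nor any filesystem's real checksum. The fs
supplies a verify callback, the mirror layer returns the first copy that
passes, and it rewrites the copies that failed before it.

```python
import zlib
from typing import Callable, List, Optional


def read_mirrored(copies: List[bytearray],
                  verify: Optional[Callable[[bytes], bool]] = None) -> bytes:
    """Return the first copy that passes verify, rewriting bad copies.

    Without a hook this degrades to the plain behaviour of returning
    the first copy. Copies after the first good one are not checked.
    """
    if verify is None:
        return bytes(copies[0])
    bad = []
    for i, copy in enumerate(copies):
        data = bytes(copy)
        if verify(data):
            for j in bad:            # repair copies that failed earlier
                copies[j][:] = data
            return data
        bad.append(i)
    raise IOError("no copy passed verification")


# The "filesystem" side: it stored a CRC32 for this metadata block.
good = b"superblock"
expected = zlib.crc32(good)
mirrors = [bytearray(b"superblXck"), bytearray(good)]   # copy 0 is corrupted
result = read_mirrored(mirrors, lambda d: zlib.crc32(d) == expected)
```

After the call, result holds the good data and mirrors[0] has been rewritten
from the good copy. When every copy fails the check, the sketch raises an
error, which is the same outcome the fs would see today with a single copy.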

> and I severely doubt that you will manage to convince developers to
> add support in their filesystem (especially XFS) because:
> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
> less of an issue because it's a completely self-contained layering
> violation, while this isn't).

If passing something along with a bio violates layering, then the
integrity payload has already been doing so for a long time.

> 2. There's no precedent in hardware (I challenge you to find a block
> device that lets you respond to a read completing with 'Hey, this data
> is bogus, give me the real data!').
> 3. You can get the same net effect with a higher guarantee of security
> using dm-integrity.

With more CPU and IO overhead (journal mode writes the data twice, once
to the journal and once to its final location).

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>>
>>> You are also possibly missing a feature of dm-integrity - it's not just
>>> giving you a 'checksum' - it also makes sure the device has proper
>>> content - you can't just 'replace a block', even with a proper checksum,
>>> somewhere in the middle of your device... and when joined with
>>> crypto - it makes it way more secure...
>>>
>>> Regards
>>>
>>> Zdenek
>>
> 
