Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

On 2017-11-16 20:30, Qu Wenruo wrote:


On 2017-11-17 00:47, Austin S. Hemmelgarn wrote:


This is at least less complicated than dm-integrity.

Just a new hook for READ bios, and it can start with the easy parts,
like dm-raid1 first and support in other filesystems later.

It's less complicated for end users (in theory, but the cryptsetup
developers are working on that for dm-integrity), but significantly more
complicated for developers.

It also brings up the question of what happens when you want some other
layer between the filesystem and the MD/DM RAID layer (say, running
bcache or dm-cache on top of the RAID array).  In the case of
dm-integrity, that's not an issue because dm-integrity is entirely
self-contained; it doesn't depend on other layers beyond the standard
block interface.

Each layer can choose to drop support for the extra verification.

If a layer does not modify the data, it can pass the hook down to the
lower layer, just like the integrity payload.

Which then makes things a bit more complicated in every other layer as
well, in turn making things more complicated for all developers.
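
To make the "pass it down like the integrity payload" idea concrete,
here is a toy userspace sketch (not kernel code, and every identifier in
it is invented purely for illustration) of a verification hook
travelling with a READ request through a target that does not modify the
data:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook the filesystem attaches to a READ request:
 * the checksum it expects plus a callback the lower layer can use
 * to check the data it just read. */
struct verify_hook {
    uint32_t expected_csum;
    int (*verify)(const struct verify_hook *h, const void *data, size_t len);
};

/* Hypothetical READ request as a block-layer "target" would see it. */
struct read_req {
    uint64_t sector;
    size_t   len;
    void    *buf;
    struct verify_hook *hook;   /* NULL if the fs did not opt in */
};

/* Stub standing in for whatever sits below this layer. */
static int lower_submit_read(struct read_req *req)
{
    (void)req;
    return 0;
}

/* A target that does not modify the data (dm-linear and friends) just
 * passes the request, hook included, straight down to the next layer. */
static int passthrough_submit_read(struct read_req *req)
{
    return lower_submit_read(req);
}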


As I mentioned in my other reply on this thread, running with
dm-integrity _below_ the RAID layer instead of on top of it will provide
the same net effect, and in fact provide a stronger guarantee than what
you are proposing (because dm-integrity does real cryptographic
integrity verification, as opposed to just checking for bit-rot).

Although that means more CPU usage for each device, even when they
contain the same data.

I never said it didn't have higher resource usage.



If your checksum is calculated and checked at the FS level, there is no
added value in spreading this logic to other layers.

That's why I'm moving the checking part to the lower level, to get more
value out of the checksum.


dm-integrity adds basic checksumming to any filesystem without the
need to modify the fs itself

Well, except that modern filesystems have already implemented their own
metadata csums.

   - the price paid is: if there is a bug in passing data from the 'fs'
to 'dm-integrity', it cannot be caught.

The advantage of having separate 'fs' and 'block' layers is in the
separation itself and the simplicity at each level.

Totally agreed on this.

But the idea here should not have that large an impact (compared to big
things like ZFS/Btrfs).

1) It only affects READ bios.
2) Every dm target can choose whether to support the hook or just pass
     it down; there is no point in supporting it for RAID0, for example,
     and complex RAID like RAID5/6 does not need to support it from the
     very beginning.
3) The main part of the functionality is already implemented.
     The core complexity has two parts:
     a) checksum calculation and checking
        Modern filesystems already do this, at least for metadata.
     b) recovery
        dm targets already implement this for the supported RAID
        profiles.
     All of this already exists; only moving it to a different point in
     time should not require that big a modification IIRC.  (A rough
     sketch of the resulting read path follows below.)
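
As a rough illustration of point 3, here is a toy userspace model of the
read path (again not kernel code; the two-copy layout and all names are
invented for this mail): read one copy, call the filesystem's verify
callback, and if it fails fall back to the other copy and rewrite the
bad one:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_COPIES 2   /* RAID1-like: two copies of every sector */

/* Callback supplied by the filesystem: 0 means the data matched the
 * checksum the fs computed, non-zero means it did not. */
typedef int (*verify_fn)(const void *data, size_t len, uint32_t expected);

/* Stubs standing in for the real per-device I/O. */
static int read_copy(int copy, uint64_t sector, void *buf, size_t len)
{
    (void)copy; (void)sector;
    memset(buf, 0, len);
    return 0;
}

static void write_copy(int copy, uint64_t sector, const void *buf, size_t len)
{
    (void)copy; (void)sector; (void)buf; (void)len;
}

static int raid1_read(uint64_t sector, void *buf, size_t len,
                      verify_fn verify, uint32_t expected)
{
    int bad[NR_COPIES] = { 0 };

    for (int copy = 0; copy < NR_COPIES; copy++) {
        if (read_copy(copy, sector, buf, len) != 0 ||
            (verify && verify(buf, len, expected) != 0)) {
            bad[copy] = 1;      /* I/O error or checksum mismatch */
            continue;           /* try the next copy */
        }
        /* Good data: repair whichever copies failed.  This is the
         * recovery the dm targets already implement, just triggered
         * at read time instead of only at resync/scrub time. */
        for (int other = 0; other < NR_COPIES; other++)
            if (bad[other])
                write_copy(other, sector, buf, len);
        return 0;
    }
    return -1;                  /* no copy passed the check */
}

The only point of the toy is the ordering: the check happens in the
layer that knows where the redundant copy lives, so recovery stays a
local decision.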

If you want an integrated solution, you are simply looking for btrfs,
where multiple layers are integrated together.

With such a verification hook (along with something extra to handle
scrub), the btrfs chunk mapping could even be re-implemented with
device-mapper:

In fact the btrfs logical space is just a dm-linear device, and each
chunk could be implemented by its corresponding dm-* module, like:

dm-linear:       | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
and
btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
btrfs chunk 2: data, using dm-raid0 on disk A B C D
...
btrfs chunk n: system, using dm-raid1 on disk A B

At least btrfs could then take advantage of the simplicity of separate
layers.
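
Purely as a conceptual sketch (the structures below are invented for
this mail; they are not btrfs or device-mapper code), the mapping being
described is just a lookup table from logical ranges to profile-specific
targets:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk record: which RAID profile backs which
 * slice of the btrfs logical address space. */
enum chunk_profile { CHUNK_RAID0, CHUNK_RAID1, CHUNK_SINGLE };

struct chunk {
    uint64_t logical_start;     /* offset in the btrfs logical space */
    uint64_t length;
    enum chunk_profile profile; /* which dm-* target would back it   */
};

/* The logical space is the plain concatenation of the chunks, i.e.
 * what a dm-linear table would describe, with each segment pointing
 * at a raid0/raid1/... target for that chunk. */
static const struct chunk *map_logical(const struct chunk *chunks,
                                       size_t nr_chunks, uint64_t logical)
{
    for (size_t i = 0; i < nr_chunks; i++) {
        if (logical >= chunks[i].logical_start &&
            logical - chunks[i].logical_start < chunks[i].length)
            return &chunks[i];
    }
    return NULL;                /* hole / unmapped logical address */
}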

And other filesystems could get a slightly higher chance of recovering
their metadata if built on dm-raid.

Again, just put dm-integrity below dm-raid.  The other filesystems
primarily have metadata checksums to catch data corruption, not repair
it,

Because they have no extra copy.
If they had one, they would definitely use the extra copy to repair.

But they don't have those extra copies now, so that really becomes
irrelevant as an argument (especially since it's not likely they will
add data or metadata replication in the filesystem any time in the near
future).

and I severely doubt that you will manage to convince developers to
add support in their filesystem (especially XFS) because:
1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
less of an issue because it's a completely self-contained layering
violation, while this isn't).

If passing something along with a bio is a layering violation, then the
integrity payload has already been doing this for a long time.

The block integrity layer is also interfacing directly with hardware and
_needs_ to pass that data down.  Unless I'm mistaken, it also doesn't do
any verification except in the filesystem layer, and doesn't pass down
any complaints about the integrity of the data (it may try to re-read
it, but that's not the same as what you're talking about).

2. There's no precedent in hardware (I challenge you to find a block
device that lets you respond to a read completing with 'Hey, this data
is bogus, give me the real data!').
3. You can get the same net effect with a higher guarantee of security
using dm-integrity.

With more CPU and I/O overhead (journal mode will write the data twice,
once for the journal and once for the real data).

If you're concerned about that, then the same argument could be made
about having checksumming at all.  Yes, it's not cheap, but security and
data safety almost never are.  CoW semantics in BTRFS are just as
resource-intensive (if not more so).



