Re: [RFC] TileFS - a proposal for scalable integrity checking

On Sun, Apr 29, 2007 at 02:21:13PM +0200, Jörn Engel wrote:
> On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote:
> > 
> > This is a relatively simple scheme for making a filesystem with
> > incremental online consistency checks of both data and metadata.
> > Overhead can be well under 1% disk space and CPU overhead may also be
> > very small, while greatly improving filesystem integrity.
> 
> I like it a lot.  Until now it appears to solve more problems and cause
> fewer new problems than ChunkFS.

Thanks. I think this is a somewhat more direct solution than ChunkFS, but
a) I haven't followed ChunkFS closely and b) I haven't been thinking
about fsck for very long, so this is still just presented as fodder for
discussion.

> > [...]
> > 
> >  Divide disk into a bunch of tiles. For each tile, allocate a one
> >  block tile header that contains (inode, checksum) pairs for each
> >  block in the tile. Unused blocks get marked inode -1, filesystem
> >  metadata blocks -2. The first element contains a last-clean
> >  timestamp, a clean flag and a checksum for the block itself. For 4K
> >  blocks with 32-bit inode and CRC, that's 512 blocks per tile (2MB),
> >  with ~.2% overhead.
> 
> You should add a 64bit fpos field.  That allows you to easily check for
> addressing errors.

Describe the scenario where this manifests, please.

It just occurred to me that my approach is analogous to object-based
rmap on the filesystem. The fpos proposal I think makes it more like
the original per-pte rmap. This is not to say I think the same lessons
apply, as I'm not clear what you're proposing yet.

Ooh.. I also just realized the tile approach allows much easier
defragging/shrinking of large filesystems because finding the
associated inode for blocks you want to move is fast.
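
To make the quoted tile layout and the reverse lookup concrete, here is a
minimal sketch of one tile header. It is not code from the proposal. The
byte order, the packing of slot 0 (31-bit timestamp plus a clean flag in
the low bit, then a CRC over the rest of the header), and the names
pack_header, owner_of and header_is_intact are all assumptions made for
illustration.

```python
import struct
import zlib

BLOCK_SIZE = 4096
ENTRY = struct.Struct("<iI")                  # (signed 32-bit inode, 32-bit CRC)
ENTRIES_PER_TILE = BLOCK_SIZE // ENTRY.size   # 512 slots, so 512 blocks per tile
INODE_UNUSED = -1                             # free block
INODE_METADATA = -2                           # filesystem metadata block


def pack_header(entries, last_clean, clean):
    """Build a one-block header from 511 (inode, crc) pairs for blocks 1..511."""
    if len(entries) != ENTRIES_PER_TILE - 1:
        raise ValueError("need one entry per non-header block")
    body = b"".join(ENTRY.pack(inode, crc) for inode, crc in entries)
    # Slot 0 (assumed layout): timestamp and clean flag share one 32-bit word.
    stamp = struct.pack("<I", ((last_clean & 0x7FFFFFFF) << 1) | int(clean))
    # The header CRC covers everything except the CRC field itself.
    crc = zlib.crc32(stamp + body)
    return stamp + struct.pack("<I", crc) + body


def header_is_intact(header):
    """Recompute the header CRC and compare it with the stored one."""
    stored = struct.unpack_from("<I", header, 4)[0]
    return zlib.crc32(header[:4] + header[8:]) == stored


def owner_of(header, block_in_tile):
    """Reverse lookup: which inode owns block N (1..511) of this tile?"""
    if not 1 <= block_in_tile < ENTRIES_PER_TILE:
        raise IndexError("block 0 is the header itself")
    inode, _crc = ENTRY.unpack_from(header, block_in_tile * ENTRY.size)
    return inode
```

With 8-byte entries a 4K header has 512 slots, so a tile is 512 blocks
(2MB) and the header costs 1/512 of the space, a little under 0.2%.
Finding the owner of a block is one header read plus an array index,
which is what makes moving blocks around cheap.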

> >  [Note that CRCs are optional so we can cut the overhead in half. I
> >  choose CRCs here because they're capable of catching the vast
> >  majority of accidental corruptions at a small cost and mostly serve
> >  to protect against errors not caught by on-disk ECC (eg cable noise,
> >  kernel bugs, cosmic rays). Replacing CRCs with a stronger hash like
> >  SHA-n is perfectly doable.]
> 
> The checksum cannot protect against a maliciously prepared medium
> anyway, so crypto makes little sense.

In a past life, I wrote a device mapper layer that kept a
cryptographic hash per cluster of the underlying device, with a
top-level digital signature of said hashes. That gets you pretty
close to tamper-proof, in theory. Practice of course is a different
matter, so don't try this at home.

As it happens, this earlier system was the inspiration for the tile
idea, the integrity parts of which have been kicking around in my head
since I heard ZFS was tracking checksums.
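
For readers who want the shape of that per-cluster scheme, a rough sketch
follows. It is not the device mapper code mentioned above. SHA-256, the
4096-byte cluster size, and all function names are assumptions, and an
HMAC stands in for the digital signature purely to keep the example
within the standard library; a real design would use a public-key
signature.

```python
import hashlib
import hmac

CLUSTER_SIZE = 4096  # assumed; the original cluster size is not stated


def cluster_hashes(device, cluster_size=CLUSTER_SIZE):
    """One SHA-256 digest per cluster of the underlying device image."""
    return [hashlib.sha256(device[off:off + cluster_size]).digest()
            for off in range(0, len(device), cluster_size)]


def sign_table(hashes, key):
    """Top-level tag over the whole hash table (HMAC as a signature stand-in)."""
    return hmac.new(key, b"".join(hashes), hashlib.sha256).digest()


def verify_cluster(device, index, hashes, tag, key, cluster_size=CLUSTER_SIZE):
    """Check the hash table against its tag, then one cluster against the table."""
    if not hmac.compare_digest(sign_table(hashes, key), tag):
        return False  # the hash table itself was altered
    start = index * cluster_size
    actual = hashlib.sha256(device[start:start + cluster_size]).digest()
    return hmac.compare_digest(actual, hashes[index])
```

The point of the two layers is that an attacker who rewrites a cluster
must also rewrite its hash, and cannot produce a valid top-level tag for
the altered table without the key.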

> Crc32 can provably (if you trust those who did the proof) detect all
> 1, 2 and 3-bit errors and has a 1:2^32 chance of detecting any
> remaining errors. That is fairly hard to improve on.

Indeed.
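
A quick way to see the small-error behaviour is to flip a few bits in a
4K block and compare checksums. The sketch below samples random 1-, 2-
and 3-bit flips; it illustrates the claim and does not prove it. The
helper names are made up for this example, and zlib.crc32 is assumed to
be an acceptable stand-in for whatever CRC the filesystem would use.

```python
import random
import zlib


def flip_bits(block, positions):
    """Return a copy of block with the given bit positions inverted."""
    out = bytearray(block)
    for pos in positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def undetected_flips(block, n_bits, trials, seed=0):
    """Count sampled n-bit corruptions whose CRC32 matches the original."""
    rng = random.Random(seed)
    good = zlib.crc32(block)
    missed = 0
    for _ in range(trials):
        # sample() yields distinct positions, so exactly n_bits bits change
        positions = rng.sample(range(len(block) * 8), n_bits)
        if zlib.crc32(flip_bits(block, positions)) == good:
            missed += 1
    return missed
```

For larger, random corruptions the CRC is no longer guaranteed to
notice, and the expected miss rate is on the order of one in 2^32.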
 
> >  Every time we write to a tile, we must mark the tile dirty. To cut
> >  down time to find dirty tiles, the clean bits can be collected into a
> >  smaller set of blocks, one clean bitmap block per 64GB of data.
> 
> You can and possibly should organize this as a tree, similar to a file.
> One bit at the lowest level marks a tile as dirty.  One bit for each
> indirect block pointer marks some tiles behind the pointer as dirty.
> That scales logarithmically to any filesystem size.

Right. 3 levels take us to 512TB, etc.
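
A small sketch of the tree idea, assuming one bit per tile at the lowest
level and one summary bit per group of children above it. The class name
DirtyTree and the fanout parameter are hypothetical; a real on-disk
layout would pick its fanout from the block and pointer sizes, and would
pack the bits rather than spend a byte on each as done here for clarity.

```python
BLOCK_SIZE = 4096
TILE_BYTES = 512 * BLOCK_SIZE      # 2MB tiles, as in the proposal
BITS_PER_BLOCK = BLOCK_SIZE * 8    # 32768 clean/dirty bits per bitmap block


def bytes_covered_by_bitmap_block():
    """Data covered by one flat bitmap block: 32768 tiles of 2MB = 64GB."""
    return BITS_PER_BLOCK * TILE_BYTES


class DirtyTree:
    def __init__(self, n_tiles, fanout):
        if n_tiles < 1 or fanout < 2:
            raise ValueError("need at least one tile and fanout >= 2")
        self.fanout = fanout
        self.levels = []           # levels[0] is per-tile, last level is the root
        n = n_tiles
        while True:
            self.levels.append(bytearray(n))
            if n == 1:
                break
            n = -(-n // fanout)    # ceiling division

    def mark_dirty(self, tile):
        idx = tile
        for level in self.levels:
            if level[idx]:
                break              # ancestors were set when this bit was set
            level[idx] = 1
            idx //= self.fanout

    def mark_clean(self, tile):
        idx = tile
        for level in self.levels:
            level[idx] = 0
            start = (idx // self.fanout) * self.fanout
            if any(level[start:start + self.fanout]):
                break              # a sibling is still dirty, parent stays set
            idx //= self.fanout

    def dirty_tiles(self):
        """Descend from the root, visiting only subtrees with a set bit."""
        found = [0] if self.levels[-1][0] else []
        for level in reversed(self.levels[:-1]):
            found = [child
                     for parent in found
                     for child in range(parent * self.fanout,
                                        min((parent + 1) * self.fanout, len(level)))
                     if level[child]]
        return found
```

On a mostly clean filesystem the descent touches only the few subtrees
whose summary bit is set, so the cost of finding dirty tiles grows with
the number of dirty tiles and the tree depth, not with the disk size.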

-- 
Mathematics is the supreme nostalgia of our time.