Re: [RFC] TileFS - a proposal for scalable integrity checking

On Wed, May 09, 2007 at 11:59:23AM -0700, Valerie Henson wrote:
> On Wed, May 09, 2007 at 12:06:52PM -0500, Matt Mackall wrote:
> > On Wed, May 09, 2007 at 12:56:39AM -0700, Valerie Henson wrote:
> > > On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote:
> > > > 
> > > > This does mean that our time to make progress on a check is bounded at
> > > > the top by the size of our largest file. If we have a degenerate
> > > > filesystem filled with a single file, this will in fact take as long
> > > > as a conventional fsck. If your filesystem has, say, 100 roughly
> > > > equally-sized files, you're back in Chunkfs territory.
> > > 
> > > Hm, I'm not sure that everyone understands a particular subtlety of
> > > how the fsck algorithm works in chunkfs.  A lot of people seem to
> > > think that you need to check *all* cross-chunk links, every time an
> > > individual chunk is checked.  That's not the case; you only need to
> > > check the links that go into and out of the dirty chunk.  You also
> > > don't need to check the other parts of the file outside the chunk,
> > > except for perhaps reading the byte range info for each continuation
> > > node and making sure no two continuation inodes think they both have
> > > the same range, but you don't check the indirect blocks, block
> > > bitmaps, etc.
> > 
> > My reference to chunkfs here is simply that the worst-case is checking ~1
> > chunk, which is about 1/100th of a volume.
> 
> I understand that being the case if each file is only in one tile.
> Does the fpos make this irrelevant as well?

Fpos does make it irrelevant.
 
> > > > So we should have no trouble checking an exabyte-sized filesystem on a
> > > > 4MB box. Even if it has one exabyte-sized file! We check the first
> > > > tile, see that it points to our file, then iterate through that file,
> > > > checking that the forward and reverse pointers for each block match
> > > > and all CRCs match, etc. We cache the file's inode as clean, finish
> > > > checking anything else in the first tile, then mark it clean. When we get
> > > > to the next tile (and the next billion after that!), we notice that
> > > > each block points back to our cached inode and skip rechecking it.
> > > 
> > > If I understand correctly then, if you do have a one exabyte sized
> > > file, and any part of it is in a dirty tile, you will need to check
> > > the whole file?  Or will Joern's fpos proposal fix this?
> > 
> > Yes, the original idea is you have to check every file that "covers" a
> > tile in its entirety. With Joern's fpos piece, I think we can restrict
> > our checks to just the section of the file that covers the tile.
> 
> Hrm.  Can you help me understand how you would check i_size then?

That's pretty straightforward, I think. When we check an inode, we
have to check that it has a block corresponding to i_size, and none
beyond that.
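
A rough sketch of that i_size rule, not actual TileFS code. The block
size, the function name, and the representation of an inode's mapping
as a set of logical block indexes are all assumptions made for
illustration:

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def check_i_size(i_size, mapped_blocks, block_size=BLOCK_SIZE):
    """Return True if the inode's mapped blocks are consistent with i_size.

    mapped_blocks is a set of logical block indexes the inode maps
    (a hypothetical representation, not an on-disk format).
    """
    if i_size == 0:
        # An empty file must not map any blocks at all.
        return not mapped_blocks
    # Logical index of the block that holds the final byte of the file.
    last = (i_size - 1) // block_size
    # That block must exist, and nothing may be mapped beyond it.
    return last in mapped_blocks and all(b <= last for b in mapped_blocks)
```

Holes before the last block are accepted here; only a missing final
block or a block past i_size is treated as an error.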

That raises the question of when we check various pieces of data. It
seems the best time to check the various elements of an inode is when
we're checking the tile it lives on. This is when we'd check i_size,
that link counts make sense, that the ring of hardlinks is correct,
etc.

We would also check that direct and indirect pointers are sensible
(i.e., pointing to data blocks on the disk). If so, we know we'll
eventually verify those pointers when we check the corresponding back
pointers from those blocks.
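
To make the two halves concrete, here is a minimal sketch. It assumes
an inode's direct and indirect pointers have been flattened into a
dict of fpos -> block number, and that each block's back pointer is an
(inode number, fpos) pair. Those structures and both function names
are hypothetical, not a proposed on-disk format:

```python
def pointers_sensible(pointers, data_start, data_end):
    """Inode-side check, done while checking the inode's own tile.

    Only confirms that every forward pointer lands in the data area
    [data_start, data_end); it does not follow the pointers.
    """
    return all(data_start <= blk < data_end for blk in pointers.values())


def back_pointer_ok(blk, back_ptr, inodes):
    """Block-side check, done while checking the block's own tile.

    Confirms that the inode named by the back pointer really points
    at this block at the recorded file position.
    """
    ino, fpos = back_ptr
    return inodes.get(ino, {}).get(fpos) == blk
```

The inode-side check is a cheap range test; the forward and reverse
pointers only get compared when the block's tile comes up.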

Directory checks are a bit more problematic. But I think we can
trigger a directory check each time we hit a tile data block that's
part of a directory. Keeping a small cache of checked directories will
keep this from being expensive.
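
One way such a cache might look, as a sketch only. The LRU eviction
policy, the capacity, and the names CheckedDirCache and check_once are
my assumptions here; any small bounded cache would do:

```python
from collections import OrderedDict


class CheckedDirCache:
    """Small LRU cache of directory check results, keyed by inode number."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._results = OrderedDict()

    def check_once(self, dir_ino, check_fn):
        """Return the cached result for dir_ino, running check_fn on a miss."""
        if dir_ino in self._results:
            # Mark as recently used and skip the expensive check.
            self._results.move_to_end(dir_ino)
            return self._results[dir_ino]
        result = check_fn(dir_ino)
        self._results[dir_ino] = result
        if len(self._results) > self.capacity:
            # Evict the least recently used directory.
            self._results.popitem(last=False)
        return result
```

Each data block of a directory found in the tile would call
check_once() with the directory's inode number, so a directory with
many blocks in one tile is only walked once while it stays cached.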

We will, unfortunately, need to be able to check an entire directory
at once. There's no other efficient way to ensure that there are no
duplicate names in a directory, for instance.
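
The duplicate-name check shows why: it needs every entry of the
directory in hand at the same time. A minimal sketch, assuming the
entries have already been read into (name, inode number) pairs (a
hypothetical in-memory form):

```python
def find_duplicate_names(entries):
    """Return the sorted list of names that occur more than once.

    entries is an iterable of (name, inode_number) pairs covering the
    whole directory, not just the blocks that live in one tile.
    """
    seen = set()
    dups = set()
    for name, _ino in entries:
        if name in seen:
            dups.add(name)
        else:
            seen.add(name)
    return sorted(dups)
```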

In summary, checking a tile requires trivial checks on all the inodes
and directories that point into a tile. Inodes, directories, and data
that are inside a tile get checked more thoroughly but still don't
need much pointer chasing.

-- 
Mathematics is the supreme nostalgia of our time.