Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Jun 11, 2006  18:00 +0200, Arjan van de Ven wrote:
> On Fri, 2006-06-09 at 21:44 +0100, Alan Cox wrote:
> > OTOH the number of complaints about this is minimal, people want to go
> > forwards in a controlled manner not backwards.
> 
> well... they want to be able to go "a little bit" backwards; say one
> version of an OS (6 months). Eg the scenario that ought to work is "go
> to newer version, hate it, go back". But yes that's a limited time to go
> back, not the "go back to 2.2" kind of "go back".

Interestingly, one of the reasons we want(ed) to get the extents code into
the ext3 mainline ASAP is that this would allow it to be available for the
"go back" phase when (in a couple of years) you NEED to have support for
gigantic block devices and have no choice but use this code to update.
For today it would only be used by people who really want to use it.

On Jun 11, 2006  18:02 +0200, Arjan van de Ven wrote:
> On Fri, 2006-06-09 at 14:51 -0400, Jeff Garzik wrote:
> > PRECISELY.  So you should stop modifying a filesystem whose design is 
> > admittedly _not_ modern!
> > 
> > ext3 is already essentially xiafs-on-life-support, when you consider 
> > today's large storage systems and today's filesystem technology.  Just 
> > look at the ugly hacks needed to support expanding an ext3 filesystem 
> > online.
> 
> actually I think I disagree with you. One thing I've noticed over the
> years is that ext2 layout has one thing going for it: it is simple and
> robust. Maybe "ext2 layout" is the wrong word, "block bitmap and
> direct/indirect block based" may be better. It seems that once you go
> into tree space (and I would call htree a borderline thing there) you
> get both really complex code and fragile behavior all over (mostly in
> terms of "when something goes wrong")

You're correct in calling htree a borderline case, because the directory
metadata is still accessible in a "linear" manner if the tree is corrupted
for some reason.  I've recently been thinking of making the structure even
more robust by encoding a singly- or doubly-linked list into the directory
leaf blocks.

However, in the direct/indirect block tree is the most fragile part of
ext2/ext3.  It also has the bad effect that corruption in the file indirect
tree can easily amplify into widespread filesystem corruption because wrongly
freeing indirect block and reallocating it will potentially cause 1024 more
blocks to be freed when that indirect block is unlinked, etc.  This is also
the slowest part of e2fsck checking if it detects corruption (duplication)
in the block allocation.

When we had very small filesystems it was easy to tell if an
indirect block was corrupt, because the valid block numbers made up only
a small fraction of the 2^32 possible block numbers.  However, with large
filesystems valid block numbers make up a large fraction of the 2^32 block
number space.  As we get to 16TB filesystems it is impossible to tell when
an indirect block is filled with garbage and when it is valid.

One of the features of the extent format is that firstly it has a magic
number in each "indirect" block (called an extent index block).  Secondly,
there is enough redundancy that it allows internal validation of the extent
data (e.g. that extents are sequentially increasing logical offsets, that
the parent's logical offset is correctly "encompassing" all of the leaf's
logical offsets.

Finally, one of the features that has been designed into the extent format
(though not yet implemented) is that it is possible to add a checksum to
each extent index to verify the metadata more strongly.  There will also
be space to have a back-pointer to the parent inode for validation.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux