Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3

Andreas Dilger <adilger@xxxxxxxxxxxxx> · Fri, 9 Jun 2006 12:10:20 -0600

On Jun 09, 2006  09:25  -700, Linus Torvalds wrote:
> So two separate filesystems are _less_ to maintain than one big one. Even
> if there's a lot of code that -could- be shared.

That is true if people are willing to maintain both trees.  I think that
even with the current ext2/ext3 split there are continually fixes that are
missing from one filesystem or another.

> Just as an example: ext3 _sucks_ in many ways. It has huge inodes that
> take up way too much space in memory. It has absolutely disgusting code to
> handle directory reading and writing (buffer heads! In 2006!).

My point exactly!  The ext2 directory code was moved from buffer heads to
page cache by Al after ext3 was forked and the code was never fixed in ext3.

I don't see this getting any better if there is an ext4 filesystem and all
of the ext3 developers are only interested in maintaining ext4.  Look at
reiserfs - it is completely abandoned by Hans in favour of reiser4 (the
entry in MAINTAINERS notwithstanding) except for Chris Mason at SuSE.

Having a single codebase for everyone means that it is continually maintained
and users of ext3 aren't left out in the cold.

On Jun 09, 2006  09:54 -0700, Linus Torvalds wrote:
> Btw, I'm not kidding you on this one.
> 
> THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!

Do you think that would be any different with a new filesystem?

> And you know what? 2TB files are totally uninteresting to 99.9999% of all 
> people. Most people find it _much_ more interesting to have hundreds of 
> thousands of _smaller_ files instead.
> 
> So do this:
> 
> 	cat /proc/slabinfo | grep ext3

# head -2 /proc/slabinfo
slabinfo - version: 2.1
name       <active_objs> <num_objs> <objsize> <objperslab>

# grep ext2 /proc/slabinfo
ext2_inode_cache       0          0       572            7
ext2_xattr             0          0        48           81

# grep ext3 /proc/slabinfo

ext3_inode_cache   30207      41418       616            6
ext3_xattr             0          0        48           81

# grep xfs /proc/slabinfo
xfs_ili             2558       2576       140           28
xfs_inode           2558       2565       448            9

# grep jfs /proc/slabinfo
jfs_ip                 0          0      1048            3

So, the ext3 inode could grow another ~50 bytes without changing the
slab allocation size ;-), and in fact other filesystem aren't noticably
different.

> and be absolutely disgusted and horrified by the size of those inodes 
> already, and ask yourself whether extending the block size to 48 bits will 
> help or further hurt one of the biggest problems of ext3 right now?

This is then the biggest problem of all filesystems.

> (And yes, I realize that block numbers are just a small part of it. The 
> "vfs_inode" is also a real problem - it's got _way_ too many large 
> list-heads that explode on a 64-bit kernel, for example. Oh, well.

On a 32-bit system the vfs_inode is more than half of the size of the ext3
inode, it is worse on 64-bit systems.

> My point is that things like this can make a very real issue _worse_ for all 
> the people who don't care one whit about it)

The current group of changes will be a no-op if CONFIG_LBD isn't enabled,
and I think I argued fairly strongly to also have a CONFIG_ flag to allow
larger than 2TB file support only for those users that want it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html