Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3

Theodore Tso <tytso@xxxxxxx> · Fri, 9 Jun 2006 13:44:05 -0400

On Fri, Jun 09, 2006 at 09:54:49AM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 9 Jun 2006, Linus Torvalds wrote:
> > 
> > Just as an example: ext3 _sucks_ in many ways. It has huge inodes that 
> > take up way too much space in memory.
> 
> Btw, I'm not kidding you on this one.
> 
> THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!

To be fair, the bulk of the size of the size of the inode is is the
filesystem generic "struct inode", which is 480 bytes.  Ext3 just
includes the struct inode as part of its core data structure, which
makes the whole thing *look* big.  In fact, the ext3-specific part of
the in-core ext3 inode is only 188 bytes, for a total of 688 bytes for
the ext3_inode_info structure --- which is what you see in
/proc/slabinfo.   

Other filesystems store the struct inode via a pointer to a separately
allocated chunk of memory, which makes their in-core inode footprint
*look* smaller but that's just an illusion if the only place you look
is /proc/slabinfo.  For example, the xfs_inode_cache is 432 bytes, but
that's because struct inode is stored separately from xfs's
fake/pseudo "vfs inode" which it keeps around so the same code can be
used with Irix.  (It always amazes me that we allow this for XFS,
where when everywhere else we insist that that kind of cross-OS or
cross-version portability code is a fundamental violation of
CodingStyle which by increasing code bloat and making the code harder
to read and maintain by Linux developers, but that's a rant for
another day.)

Now, obviously I won't say that we can't do work to trim down
ext3_inode_info.  To be fair, reiserfs has an inode which is 576 bytes
long, so they only have 96 bytes of filesystem-specific information,
instead of the 188 bytes that we have in ext3.  So we can do look at
that, but remember that from the gross level, we're talking about 688
bytes per inode for ext3 compared to 576 bytes per inode for reiserfs
--- and at least 912 bytes per inode for xfs.

But I think you would agree that we would want to improve this number
"honestly", by trying to trim down actual memory structure use,
instead of just simply making the in-core data structure bushier so as
to hide the true size of the per-inode footprint from people looking
at /proc/slabinfo, right?  :-)

And in any case, this is why we have to think very carefully before
forking the codebase between ext3 and "ext4".  The work that we might
use to slim down ext4_inode_info would also have to be backported to
ext3_inode_info before ext3 users see the benefit.  And there may also
be bugs that now have to be fixed in _three_ separate codebases ---
ext2, ext3, and ext4.  To give another concrete example, adding
extents won't change the htree directory lookup code, so needlessly
having two copies of that htree code in the kernel would be a Bad
Idea(tm).  We've already on occasion found bugs that we had fixed in
ext3, but had forgotten to backport to ext2, and vice versa.  Adding a
third would triple our maintenance headache --- a similar reason why
we haven't started a 2.7 development tree yet, since we would have to
backport bug fixes back and forth between 2.6 and 2.7.

Not to say that forking ext3 to make a copy of the code that we call
"ext4" isn't automatically a bad idea to be dismissed out of hand,
just as someday that we might fork 2.6 and start a 2.7 development
branch.  But in both cases we need to think very hard about the
tradeoffs before we just go ahead and do it.

Regards,

						- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html