Re: [PATCH 3/3] e2fsprogs: Support for large inode migration.

Theodore Tso wrote:
On Wed, Jul 25, 2007 at 11:06:28AM +0530, Aneesh Kumar K.V wrote:
From: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>

Add a new option -I <inode_size> to tune2fs.
This is used to change the inode size. The size
needs to be a multiple of 2, and we don't allow
decreasing the inode size.

As a part of increasing the inode size we throw
away the free inodes in the last block group. If
we can't, we fail. In such a case one can resize the
file system and then try to increase the inode size.
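
(For illustration only, a minimal sketch of the size check described
above; the helper name is hypothetical and not from the actual patch:)

	/*
	 * Sketch of the stated constraints: the new inode size must be
	 * a multiple of 2, and decreasing the size is not allowed.
	 */
	static int new_inode_size_ok(unsigned int old_size,
				     unsigned int new_size)
	{
		if (new_size % 2)		/* must be a multiple of 2 */
			return 0;
		if (new_size < old_size)	/* shrinking is not allowed */
			return 0;
		return 1;
	}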

Let me guess, you're testing with a filesystem with two block groups,
right?  And to date you've tested *only* by doubling the size of the
inode.



I tested this with multiple block group counts (1 and 7). But yes, all the testing
was changing the inode size from 128 to 256.

What your patch does is keep the number of inode blocks per block
group constant, so that the total number of inodes decreases by
whatever factor the inode size is increasing.  It's a cheap, dirty way
of doing the resizing, since it avoids needing to either (a) update
directory entries when inode numbers get renumbered, or (b) update
inodes when blocks need to get relocated in order to make room
for growing the inode table.
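
(A quick sketch of the arithmetic being described, with illustrative
names, assuming each group's inode table keeps occupying the same blocks:)

	/*
	 * If the blocks holding each group's inode table stay constant,
	 * growing the inode size shrinks the inode count by the same
	 * factor, e.g. 16384 inodes/group at 128 bytes -> 8192 at 256.
	 */
	static unsigned int new_inodes_per_group(unsigned int inodes_per_group,
						 unsigned int old_isize,
						 unsigned int new_isize)
	{
		return inodes_per_group * old_isize / new_isize;
	}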



That is correct. What I was looking at was getting dynamic inode location
first. That should help us place large inodes anywhere, right? But I know
that is a long-term goal since there are no patches for dynamic inode location yet.

I will work on increasing the inode table size as a part of increasing the inode
size.


The problems with your patch are:

	* By shrinking the number of inodes, it can constrain the
          ability of the filesystem to create new files in the future.


I explained this in the commit log.


	* It ruins the inode and block placement algorithms where we
          try to keep inodes in the same block group as their parent
          directory, and we try to allocate blocks in the same block
          group as their containing inode.


I missed this in my analysis. So this means we may end up with bad performance
after resizing the inodes. I will look at increasing the inode table size as
part of increasing the inode size.



	* Because the current patch makes no attempt to relocate
          inodes, when it doubles the inode size it chops the
          number of inodes in half, so there must be no inodes in the
          last half of the inode table.  That is, if there are N block
          groups, the inode tables in block groups N/2 to N-1 must be
          empty.  But because of the block group spreading algorithm,
          where new directories get pushed out to new block groups, in
          any real-life filesystem the use of block groups is
          evenly spread out, which means in practice you won't see a
          case where the last half of the inodes is not in use.
          Hence, your patch won't actually work in practice.
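
(To make that constraint concrete, a rough sketch of the check implied
above, using the on-disk group descriptor fields of that era; names and
placement are illustrative only:)

	#include <ext2fs/ext2fs.h>

	/*
	 * For the doubling case: the inode tables of block groups
	 * N/2 .. N-1 must be completely unused, i.e. every inode in
	 * those groups is still free.
	 */
	static int last_half_inode_tables_empty(ext2_filsys fs)
	{
		dgrp_t grp, ngroups = fs->group_desc_count;

		for (grp = ngroups / 2; grp < ngroups; grp++)
			if (fs->group_desc[grp].bg_free_inodes_count !=
			    fs->super->s_inodes_per_group)
				return 0;	/* an inode here is in use */
		return 1;
	}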

So unfortunately, the right answer *will* require expanding the inode
tables, and potentially moving blocks out of the way in order to make
room for it.  A lot of that machinery is in resize2fs, actually, and
I'm wondering if the right answer is to move resize2fs's functionality
into tune2fs.  We will also need this to be able to add the resize
inode after the fact.

That's not going to be a trivial set of changes; if you're looking for
something to test the undo manager, my suggestion would be to wire it
up into mke2fs and/or e2fsck first.  Mke2fs might be nice since it
will give us a recovery path in case someone screws up the arguments
to mkfs.

I guess the undo I/O manager can go in, because I have been using it for
the ext3 -> ext4 inode migration testing and for testing the above patch.


Why would one need to recover on mkfs? One can very well run mkfs again, right?



tune2fs uses the undo I/O manager when migrating to large
inodes. This helps in reverting the changes if the end results
are not correct. The environment variable TUNE2FS_SCRATCH_DIR
is used to indicate the directory within which the tdb
file needs to be created. The file will be named tune2fs-XXXXXX.

My suggestion would be to use something like /var/lib/e2fsprogs as the
default directory.  And we should also do some tests to make sure
something sane happens if we run out of room for the undo file.
Presumably the only thing we can do is to abort the run and then back
out the changes using what was written out to the undo file.


I had a FIXME!! in the code which stated it would be nice to use the conf file.
But right now the conf file is e2fsck-specific:

+	char *tdb_dir, tdb_file[PATH_MAX];
+#if 0 /* FIXME!! */
+	/*
+	 * Configuration via a conf file would be
+	 * nice
+	 */
+	profile_get_string(profile, "scratch_files",
+					"directory", 0, 0,
+					&tdb_dir);
+#endif
+	tdb_dir = getenv("TUNE2FS_SCRATCH_DIR");
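
(For reference, roughly how this gets wired up before opening the
filesystem; a sketch assuming the set_undo_io_backing_manager() /
set_undo_io_backup_file() interfaces from the undo I/O patches, with
error handling omitted and the XXXXXX suffix left unresolved:)

	#include <stdlib.h>
	#include <stdio.h>
	#include <limits.h>
	#include <ext2fs/ext2fs.h>

	/*
	 * Sketch only: open the device through the undo I/O manager when
	 * TUNE2FS_SCRATCH_DIR is set, so every block that gets overwritten
	 * is first saved to the tdb file and can be rolled back later.
	 */
	static errcode_t open_fs_with_undo(const char *device_name,
					   ext2_filsys *fs)
	{
		static char tdb_file[PATH_MAX];
		char *tdb_dir = getenv("TUNE2FS_SCRATCH_DIR");
		io_manager io_ptr = unix_io_manager;

		if (tdb_dir) {
			/* "XXXXXX" stands in for whatever unique suffix
			 * the real code generates */
			snprintf(tdb_file, sizeof(tdb_file),
				 "%s/tune2fs-XXXXXX", tdb_dir);
			set_undo_io_backing_manager(unix_io_manager);
			set_undo_io_backup_file(tdb_file);
			io_ptr = undo_io_manager;
		}
		return ext2fs_open(device_name, EXT2_FLAG_RW, 0, 0,
				   io_ptr, fs);
	}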


-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
