Re: [RFC] dynamic inodes

Andreas Dilger <adilger@xxxxxxx> · Fri, 26 Sep 2008 14:18:32 -0600

On Sep 26, 2008  10:33 -0400, Theodore Ts'o wrote:
> > We could special-case the placement of the GDT blocks in this case, and
> > then put them into the proper META_BG location when/if the blocks are
> > actually added to the filesystem.
> 
> Yes, but where do you put the GDT blocks in the case of where there is
> no more space in the reserved gdt blocks?  Using some inode is
> probably the best bet, since we would then know where to find the GDT
> blocks.

I agree that replicating a GDT inode is probably the easiest answer.
IIRC this was proposed also in the past, before META_BG was implemented.
To be honest, we should just deprecate META_BG at that time; I don't
think it was ever used by anything, and it still isn't properly handled
by the cross-product of filesystem features (online resize, others).

> My suggestion of using inode numbers growing downward from the top of
> the 2**32 number space was to avoid needing to move the GDT blocks
> into their proper place if and when the filesystem is grown;

How do inode numbers affect the GDT blocks?  Is it because high inode
numbers would be in correspondingly high "groups" and resizing could
be done "normally" without affecting the new GDT placement?
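
For reference, a minimal sketch of the usual ext2/3/4 mapping from inode
number to group, which is what makes inode numbers near the top of the
32-bit space land in the highest-numbered groups.  The function name and
the 8192 inodes/group figure are illustrative assumptions, not anything
taken from the resize code.

```python
def inode_to_group(ino, inodes_per_group):
    """Map a 1-based inode number to (group, index within that group)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    return (ino - 1) // inodes_per_group, (ino - 1) % inodes_per_group


INODES_PER_GROUP = 8192      # assumed "normal" value, for illustration only
TOP_INO = 2**32 - 1          # highest inode number in a 32-bit space

# Inodes allocated downward from TOP_INO belong to the last possible group.
top_group, top_index = inode_to_group(TOP_INO, INODES_PER_GROUP)
```

With these assumed numbers the top inode falls in group 524287, far above
any group that exists in a filesystem of ordinary size.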

Once we move over to a scheme of GDT inodes, there isn't necessarily a
"proper place" for GDT blocks, so I don't know if that makes a difference.

I was going to object on the grounds that the GDT inodes will become too
large and sparse, but for a "normal" ratio (8192 inodes/group) this
only works out to be 32MB for the whole gdt to hit 2^32 inodes.
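
The arithmetic behind that figure, as a rough sketch.  The 64-byte
descriptor size is my assumption (it is the size that reproduces 32MB;
with 32-byte descriptors the result is half that), and gdt_bytes() is
just a helper name made up for this example.

```python
def gdt_bytes(max_inodes, inodes_per_group, desc_size):
    """Return (group count, total descriptor bytes) needed to cover max_inodes."""
    groups = -(-max_inodes // inodes_per_group)   # ceiling division
    return groups, groups * desc_size


MIB = 2**20

# 2^32 inodes at 8192 inodes/group, assuming 64-byte group descriptors.
groups, size = gdt_bytes(2**32, 8192, 64)
size_mib = size // MIB

# Same inode count with 32-byte descriptors, for comparison.
_, size_small = gdt_bytes(2**32, 8192, 32)
size_small_mib = size_small // MIB
```

Under these assumptions that is 524288 groups and 32MB of descriptors,
or 16MB with the smaller descriptor.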

The other thing we should consider is the case where the inode ratio
is too high, and it is limiting the growth of the filesystem due to
2^32 inode limit.  With a default inode ratio of 1 inode/8192 bytes,
this hits 2^32 inodes at 262144 groups, or only 32TB...  We may need
to also be able to add "inodeless groups" in such systems unless we
also implement 64-bit inode numbers at the same time.
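
A quick check of those numbers, as a sketch.  It assumes 4kB blocks and
a group size of 8 * blocksize blocks (one bitmap block per group), i.e.
128MB groups; inode_limit() is a hypothetical helper, not an existing
tool.

```python
def inode_limit(bytes_per_inode, block_size=4096, max_inodes=2**32):
    """Return (inodes per group, groups, filesystem bytes) at the inode limit."""
    blocks_per_group = 8 * block_size               # one bitmap block covers this many blocks
    group_bytes = blocks_per_group * block_size     # 128MB with 4kB blocks
    inodes_per_group = group_bytes // bytes_per_inode
    groups = max_inodes // inodes_per_group         # groups needed to use up max_inodes
    return inodes_per_group, groups, groups * group_bytes


TIB = 2**40

ipg, groups_at_limit, fs_bytes = inode_limit(8192)
fs_tib = fs_bytes // TIB
```

With 1 inode per 8192 bytes this gives 16384 inodes/group, 262144 groups
and a 32TB filesystem before the 2^32 inode space is exhausted.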

This isn't impossible, though the directory format would need to change
to handle 64-bit inode numbers, and we would need some way to convert
between the leaf formats.

> it simplifies the code needed for the on-line resizing, and it also means
> that when you do the on-line resizing the filesystem gets more inodes
> if the inodes are dynamically grown automatically by the filesystem,
> maybe that's not a problem.

It probably makes sense to increase the "static" inode count proportionally
with the new blocks, since we already know the inode ratio is too small,
so I can see a benefit from this direction.

> > Alternately, we could put the GDT into the inode and replicate the whole
> > inode several times (the data would already be present in the filesystem).
> > We just need to select inodes from disparate parts of the filesystem to
> > avoid corruption (I'd suggest one inode from each backup superblock
> > group), point them at the existing GDT blocks, then allow the new GDT
> > blocks to be added to each one.  The backup GDT-inode copies only need
> > to be changed when new groups are added/removed.
> 
> Yes, that's probably the best solution, IMHO.
> 
> 					- Ted

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
