Re: ext4 64bit (disk >16TB) question

On Tue, Jul 15, 2008 at 07:42:01AM +0200, Goswin von Brederlow wrote:
> Is that a problem for the kernel or for the user space? I noticed that
> mke2fs 1.39 used over a gigabyte memory to format a >16TiB disk. While
> being a lot that is not really a problem here.

Userspace.  The kernel demand-loads bitmap blocks as needed, but
e2fsprogs keeps bitarrays in user memory.  The problem is e2fsck; it
needs in the worst case something like 5 different block bitmaps and
3 or 4 inode bitmaps.  (I don't remember the exact numbers, but it's
that order of magnitude.)  So if it's something like a gigabyte of
memory for mke2fs, it might be 6-7 gigs of memory for e2fsck.  If this
is before swap has been enabled, it might not work at all, and even
with swap, we're talking serious slowdown if e2fsck is constantly
paging to disk.
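
To make the scaling concrete, here is a rough back-of-the-envelope
sketch in C.  It only counts flat one-bit-per-item bitmaps; the helper
names are made up for illustration, and the 4 KiB block size and
one-inode-per-16-KiB ratio in the note below are assumptions, not
numbers taken from a real filesystem.

```c
#include <stdint.h>

/* Bytes needed for a flat bitmap with one bit per item (rounded up). */
static uint64_t bitmap_bytes(uint64_t nbits)
{
	return (nbits + 7) / 8;
}

/*
 * Rough bitmap memory for a checker that holds several block and
 * inode bitmaps at once.  All parameters are illustrative inputs
 * (hypothetical helper, not part of e2fsprogs).
 */
static uint64_t fsck_bitmap_bytes(uint64_t fs_bytes, uint32_t block_size,
				  uint64_t inode_count,
				  unsigned int n_block_maps,
				  unsigned int n_inode_maps)
{
	uint64_t blocks = fs_bytes / block_size;

	return n_block_maps * bitmap_bytes(blocks) +
	       n_inode_maps * bitmap_bytes(inode_count);
}
```

For a 16 TiB filesystem with 4 KiB blocks that is 2^32 blocks, so each
block bitmap alone is 512 MiB.  With 5 block bitmaps and 4 inode
bitmaps of 2^30 inodes each, fsck_bitmap_bytes(1ULL << 44, 4096,
1ULL << 30, 5, 4) comes to 3 GiB before counting anything else e2fsck
keeps in memory.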

> Will there be filesystem changes as well? The above mentioned
> run-length encoding sounds a bit like a new bitmap format or is that
> only supposed to be the in memory format in userspace?

No, it will only be a memory format in userspace.  And I anticipate
multiple backend storage formats for the bitmaps, depending on what
they will be used for.  For example, e2fsck uses one inode bitmap to
detect directory loops when following the parent '..' entry; this is a
super-sparse array, with at most N bits set in the entire array, where
N is the depth of the deepest directory in the filesystem.  Simply storing a sorted
list of bits that are "on" is the most efficient representation for
that particular bitmap.  Other bitmaps will be much better off stored
in memory using perhaps extents of "on" bits in a red-black tree,
etc.  At least initially I will implement the "dumb and stupid" fixed
bitarray, but I need to make sure that we have the right dispatching to
support the rest.
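
As an illustration of the kind of dispatching meant here, the sketch
below puts a fixed bitarray and a sorted list of "on" bits behind one
table of function pointers.  This is not the e2fsprogs code: every
name in it (struct bmap, bmap_new(), and so on) is hypothetical, and
the sorted-list backend assumes the caller knows an upper bound on how
many bits will ever be set, as in the directory-loop case.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct bmap {
	const struct bmap_ops *ops;	/* backend dispatch table */
	uint64_t nbits;			/* valid bits are 0 .. nbits-1 */
	void *data;			/* backend-private storage */
	size_t used, cap;		/* sorted-list backend only */
};

/* Backend operations; callers only ever go through this table. */
struct bmap_ops {
	int (*set)(struct bmap *b, uint64_t bit);
	int (*test)(const struct bmap *b, uint64_t bit);
};

/* Backend 1: the "dumb" fixed bit array, one bit per item. */
static int array_set(struct bmap *b, uint64_t bit)
{
	unsigned char *a = b->data;
	a[bit >> 3] |= (unsigned char)(1u << (bit & 7));
	return 0;
}

static int array_test(const struct bmap *b, uint64_t bit)
{
	const unsigned char *a = b->data;
	return (a[bit >> 3] >> (bit & 7)) & 1;
}

/* Backend 2: sorted list holding only the bits that are set. */
static size_t list_lower_bound(const struct bmap *b, uint64_t bit)
{
	const uint64_t *v = b->data;
	size_t lo = 0, hi = b->used;

	while (lo < hi) {		/* binary search */
		size_t mid = lo + (hi - lo) / 2;
		if (v[mid] < bit)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;			/* first index with v[i] >= bit */
}

static int list_set(struct bmap *b, uint64_t bit)
{
	size_t pos = list_lower_bound(b, bit);
	uint64_t *v = b->data;

	if (pos < b->used && v[pos] == bit)
		return 0;		/* already set */
	if (b->used == b->cap)
		return -1;		/* more bits set than promised */
	memmove(v + pos + 1, v + pos, (b->used - pos) * sizeof(*v));
	v[pos] = bit;
	b->used++;
	return 0;
}

static int list_test(const struct bmap *b, uint64_t bit)
{
	size_t pos = list_lower_bound(b, bit);
	return pos < b->used && ((const uint64_t *)b->data)[pos] == bit;
}

static const struct bmap_ops array_ops = { array_set, array_test };
static const struct bmap_ops list_ops = { list_set, list_test };

/* max_set == 0 picks the bit array; otherwise a list of that capacity. */
static struct bmap *bmap_new(uint64_t nbits, size_t max_set)
{
	struct bmap *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->nbits = nbits;
	b->cap = max_set;
	b->ops = max_set ? &list_ops : &array_ops;
	b->data = max_set ? calloc(max_set, sizeof(uint64_t))
			  : calloc((size_t)(nbits / 8 + 1), 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	return b;
}

/* Generic entry points: range-check once, then dispatch. */
static int bmap_set(struct bmap *b, uint64_t bit)
{ return bit < b->nbits ? b->ops->set(b, bit) : -1; }

static int bmap_test(const struct bmap *b, uint64_t bit)
{ return bit < b->nbits ? b->ops->test(b, bit) : 0; }

static void bmap_free(struct bmap *b)
{ if (b) free(b->data); free(b); }
```

Callers use bmap_set() and bmap_test() without knowing which backend
is behind them, so bmap_new(1ULL << 40, 64) costs 64 list slots rather
than 128 GiB of bits.  An extent-based backend would be a third ops
table added the same way.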

> What is the plan of how to add 64-bit support to the shared lib now?
> Will you introduce a do_foo64() function in parallel to do_foo() to
> maintain abi compatibility? Will you add versioned symbols? Or will
> there be an abi break at some point?

There's a pretty good description of my plans here:

	http://thread.gmane.org/gmane.comp.file-systems.ext4/2845

So no versioned symbols; instead there will be new functions, where we go
from ext2fs_block_iterate2() to ext2fs_block_iterate3(), etc.  All of the
new interfaces that I have been adding have been 64-bit clean to begin
with.  So for example all of the extents code uses blk64_t.  The
io_manager has been switched over to support 64-bit block numbers,
etc.
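
One common way to keep the old ABI while adding a 64-bit variant is to
make the old function a thin wrapper around the new one.  The sketch
below shows that pattern only; it is an assumption about how such a
pair could be arranged, not the libext2fs implementation.  The demo_*
names, the error code, and the typedefs are all local stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's 32-bit and 64-bit block types. */
typedef uint32_t blk_t;
typedef uint64_t blk64_t;

#define DEMO_ERR_BLOCK_TOO_BIG 1	/* hypothetical error code */

typedef int (*demo_cb32)(blk_t blk, void *priv);
typedef int (*demo_cb64)(blk64_t blk, void *priv);

/* New 64-bit-clean entry point: visit blocks start .. start+count-1. */
static int demo_iterate3(blk64_t start, blk64_t count,
			 demo_cb64 cb, void *priv)
{
	for (blk64_t i = 0; i < count; i++) {
		int ret = cb(start + i, priv);
		if (ret)
			return ret;	/* nonzero aborts the walk */
	}
	return 0;
}

/* Adapter state: remembers the caller's 32-bit callback. */
struct demo_shim {
	demo_cb32 cb;
	void *priv;
};

static int demo_shim_cb(blk64_t blk, void *priv)
{
	struct demo_shim *s = priv;

	if (blk > UINT32_MAX)
		return DEMO_ERR_BLOCK_TOO_BIG;	/* not representable */
	return s->cb((blk_t)blk, s->priv);
}

/* Old entry point: same signature as before, now a thin wrapper. */
static int demo_iterate2(blk_t start, blk_t count,
			 demo_cb32 cb, void *priv)
{
	struct demo_shim s = { cb, priv };

	return demo_iterate3(start, count, demo_shim_cb, &s);
}

/* Example callbacks that add each visited block number to *priv. */
static int sum_cb32(blk_t blk, void *priv)
{
	*(uint64_t *)priv += blk;
	return 0;
}

static int sum_cb64(blk64_t blk, void *priv)
{
	*(uint64_t *)priv += blk;
	return 0;
}
```

Existing callers keep calling demo_iterate2() with their 32-bit
callback and get an error rather than a silently truncated block
number if the walk crosses 2^32; new callers use demo_iterate3()
directly.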

> The reason I ask all this is because I'm willing to spend some time
> patching and testing. A single >16TiB filesystem instead of multiple
> smaller ones would be a great benefit for us.

Jose Santos has been working on some patches, and I've been working on
the 64-bit bitmap support (when I have time, which means it's been
sporadic).  My primary priority for ext4 has been on getting the last
major bits of the patches into mainline and getting e2fsprogs 1.41 out
the door so that basic testing, bug fixing, and stabilization could
begin.  We still have some bugs that we need to squash, such as the
summary statistics and/or checksums in the block group descriptors
getting corrupted.  Nothing so far that can't be fixed with e2fsck,
but getting ext4 stable is just *much* higher priority for me right
now.

That being said, if you want to join the ext4 development efforts,
please subscribe to the linux-ext4@xxxxxxxxxxxxxxx mailing list
(standard majordomo subscription interface, like all of the kernel.org
lists).  The wiki at http://ext4.wiki.kernel.org has some good material,
though some of it is out of date.  The ext4 IRC channel is listed
there, and the "getting started" page is reasonably up to date.

Regards,

					- Ted