Re: How to handle >16TB devices on 32 bit hosts ??

On Jul 18, 2009  08:16 +0200, Andi Kleen wrote:
> Andreas Dilger <adilger@xxxxxxx> writes:
> > I think the point is that for those people who want to use > 16TB
> > devices on 32-bit platforms (e.g. embedded/appliance systems) the
> > choice is between "completely non-functional" and "uses a bit more
> > memory per page", and the answer is pretty obvious.
> 
> It's not just more memory per page, but also worse code all over the
> VM. long long 32bit code is generally rather bad, especially on
> register constrained x86.

If you aren't running a 32-bit system with this config, you shouldn't
really care.  Those who do need to run in this mode would rather have
it work a few percent slower than not work at all.
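
As a rough illustration of where the 16TB figure comes from: the
sketch below assumes 4 KiB pages and a page index held in a 32-bit
unsigned integer. Neither value is stated in this thread, and the
function name is made up for the example.

```python
# Assumptions (not from the thread): 4 KiB pages, and a page index
# stored in an unsigned integer of the given width.
PAGE_SIZE = 4096
TIB = 1 << 40

def max_addressable_bytes(index_bits, page_size=PAGE_SIZE):
    """Largest device size reachable with a page index of index_bits bits."""
    return (1 << index_bits) * page_size

limit_32 = max_addressable_bytes(32)
limit_64 = max_addressable_bytes(64)

print(limit_32 // TIB, "TiB")  # prints "16 TiB"
print(limit_64 > limit_32)     # prints "True"
```

Under those assumptions a 32-bit index tops out at exactly 16 TiB, so
going past it means widening the index to 64 bits, which is the
per-page memory and 64-bit arithmetic cost being argued about here.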

> But I think the fsck problem is a show stopper here anyways.
> Enabling a setup that cannot handle IO errors wouldn't 
> be really a good idea.
> 
> In fact this problem already hits before 16TB on 32bit.

e2fsck is only just starting to get > 16TB support, and while the
initial implementation is naive, we definitely plan to reduce the
memory needed to check very large devices.

The last test numbers I saw were 5GB of RAM for a 20TB filesystem,
but since the bitmaps used are fully-allocated arrays, that isn't
surprising.  We are planning to replace these with a tree, since the
majority of bitmaps used by e2fsck have large contiguous ranges of
set or unset bits and can be represented much more efficiently.
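
To make the saving concrete, here is a minimal sketch, not the
e2fsprogs code. It assumes 4 KiB blocks, and it uses a sorted list of
runs searched with bisect rather than a real tree. The names
ExtentBitmap and flat_bitmap_bytes are hypothetical.

```python
import bisect

BLOCK_SIZE = 4096  # assumed block size, not stated in the thread

def flat_bitmap_bytes(fs_bytes, block_size=BLOCK_SIZE):
    """Bytes needed for one fully-allocated bitmap (one bit per block)."""
    return (fs_bytes // block_size) // 8

class ExtentBitmap:
    """Set bits stored as sorted, non-overlapping [start, end) runs."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def set_range(self, start, length):
        if length <= 0:
            return
        end = start + length
        # First run that overlaps or touches the new range on the left.
        lo = bisect.bisect_left(self.ends, start)
        # First run that begins strictly after the new range.
        hi = bisect.bisect_right(self.starts, end)
        if lo < hi:
            # Merge every overlapping or adjacent run into one.
            start = min(start, self.starts[lo])
            end = max(end, self.ends[hi - 1])
        self.starts[lo:hi] = [start]
        self.ends[lo:hi] = [end]

    def test(self, bit):
        i = bisect.bisect_right(self.starts, bit) - 1
        return i >= 0 and bit < self.ends[i]

# One flat bitmap for a 20 TiB filesystem:
print(flat_bitmap_bytes(20 * 2**40) // 2**20, "MiB")  # prints "640 MiB"

bm = ExtentBitmap()
bm.set_range(0, 1_000_000_000)    # one long run of in-use blocks
bm.set_range(1_000_000_000, 500)  # adjacent, so it merges with the first
bm.set_range(4_000_000_000, 10)   # separate run
print(len(bm.starts))             # prints "2"
```

Here a billion set bits cost two (start, end) pairs instead of roughly
120 MB of array. The saving depends entirely on the bits being
clustered; a bitmap with alternating set and unset bits would be
larger in this form than as a flat array.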

> Unless people rewrite fsck to use /dev/shm >4GB swapping
> (or perhaps use JFS which iirc had a way to use the file system
> itself as fsck scratch space)

I'm guessing that such systems won't have a 20TB boot device, but
rather a small flash boot/swap device (a few GB is cheap) and then
they could swap, if strictly necessary.

Also, for filesystems like btrfs or ZFS the checking can be done
online and incrementally without storing a full representation of
the state in memory.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

