Re: Max theoretical XFS filesystem size in review

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 8 Feb 2024 10:54:08 +1100

On Wed, Feb 07, 2024 at 02:26:53PM -0800, Luis Chamberlain wrote:
> I'd like to review the max theoretical XFS filesystem size and
> if block size used may affect this. At first I thought that the limit which
> seems to be documented on a few pages online of 16 EiB might reflect the
> current limitations [0], however I suspect it's an artifact of the
> BLKGETSIZE64 limitation. There might be others so I welcome your feedback
> on other things as well.

The actual limit is 8EiB, not 16EiB. mkfs.xfs won't allow a
filesystem over 8EiB to be made.

> 
> As I see it the max filesystem size should be an artifact of:
> 
> max_num_ags * max_ag_blocks * block_size
> 
> Does that seem right?

Not really. Max filesystem size is also determined by compiler,
architecture, OS, tool and support constraints.

Max sector size, not max block size, is the ultimate limitation.

> This is because the allocation group stores the max number of addressable
> blocks in an allocation group, and this is in units of the block size.  If
> we consider the max possible value for max_num_ags in light of the max
> number of addressable blocks which Linux can support, this is capped at
> the limit of blkdev_ioctl() BLKGETSIZE64, which gives us a 64-bit
> integer, so (2^64)-1, we do -1 as we start counting the first block at
> block 0.  That's 16 EiB (Exbibytes) and so we're capped at that in Linux
> regardless of filesystem.
> 
> Is that right?

We could actually support the full 64 bit device sector_t range (so
2^73 bytes), and we support file sizes up to 2^54 FSBs, so with 64kB
block sizes we are at 2^70 bytes per file. IOWs, we -could- go
larger than 8EiB, but....
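
As a back-of-envelope check of those two figures, here is a minimal
sketch. It assumes 512-byte sectors (a shift of 9) for the sector_t
range; the constant names are made up for illustration and are not
kernel identifiers.

```python
# Rough arithmetic only; names are illustrative, not kernel symbols.
SECTOR_SHIFT = 9          # assumption: 512-byte sectors
SECTOR_T_BITS = 64        # full 64-bit sector_t range

# Bytes addressable through a 64-bit sector count.
device_limit_bytes = 1 << (SECTOR_T_BITS + SECTOR_SHIFT)

MAX_FILE_FSBS_LOG2 = 54   # file sizes up to 2^54 filesystem blocks
BLOCK_LOG2_64K = 16       # 64kB filesystem blocks

# Bytes per file with 64kB blocks.
file_limit_bytes = 1 << (MAX_FILE_FSBS_LOG2 + BLOCK_LOG2_64K)

EIB = 1 << 60
print("device limit: 2^%d bytes = %d EiB"
      % (device_limit_bytes.bit_length() - 1, device_limit_bytes // EIB))
print("file limit:   2^%d bytes = %d EiB"
      % (file_limit_bytes.bit_length() - 1, file_limit_bytes // EIB))
```

Both values sit well above 8EiB (2^63 bytes), which is the point being
made: the on-disk ranges are not what stops us first.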

> If we didn't have that limitation though, let's consider what else would
> be our cap.
> 
> max_num_ags depends on the actual max value possibly reported by the
> device divided by the maximum size of an AG in bytes. We have
> XFS_AG_MAX_BYTES which represents the maximum size of an AG in bytes.
> This is always defined statically as (long long)BBSIZE << 31 and since
> BBSIZE is 512 (1 << 9) this is 1 TiB. So we cap one AG to have max 1 TiB.
> To get max_num_ags we divide the total capacity of the drive by
> this 1 TiB, so in Linux effectively today that max value should be
> 18,874,368.
>
> Is that right?

No.  It's (2^64 / 2^40) = 2^24 AGs (16.7 million), not (2^64 /
10^12) AGs.

Also, inode numbers only go up to 2^56, so once the AG count goes
above 2^24 we'd have to introduce a new allocator to handle
inode/data locality in such large filesystems.
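
The AG count arithmetic can be sketched as below. The binary vs
decimal comparison just illustrates the 2^40 vs 10^12 distinction, and
the "bits left per AG" line is a simple subtraction from the 2^56 and
2^24 figures above, not a statement about the exact inode number
layout. All names are hypothetical.

```python
# Illustrative arithmetic; names are not XFS identifiers.
ADDRESSABLE_BYTES = 1 << 64
AG_MAX_BYTES = 1 << 40            # 1 TiB, binary

ag_count = ADDRESSABLE_BYTES // AG_MAX_BYTES
decimal_count = ADDRESSABLE_BYTES // 10**12   # dividing by 1 TB (decimal)

INODE_NUMBER_BITS = 56            # inode numbers only go up to 2^56
ag_bits = ag_count.bit_length() - 1
bits_left_per_ag = INODE_NUMBER_BITS - ag_bits

print("AGs with 2^40 byte AGs: %d (2^%d)" % (ag_count, ag_bits))
print("AGs if divided by 10^12: %d" % decimal_count)
print("inode number bits left within an AG: %d" % bits_left_per_ag)
```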

> Although we're probably far from needing a single storage addressable
> array needing more than 16 EiB for a single XFS filesystem, if the above was
> correct I was curious if anyone has more details about the baked-in limit
> of 1 TiB per AG.

AGs are indexed by short btrees, i.e. they have 4 byte pointers to
minimise indexing space, so they are limited to indexing 2^31 blocks.
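
A small sketch tying this to the 1 TiB figure quoted above. It assumes
the 2^31 units are 512-byte basic blocks, which is how I read the
quoted BBSIZE << 31 definition; the Python names simply mirror the
macros mentioned in the thread.

```python
# Mirrors the quoted definition; 512-byte basic blocks are assumed.
BBSHIFT = 9
BBSIZE = 1 << BBSHIFT                 # 512 bytes

XFS_AG_MAX_BYTES = BBSIZE << 31       # as quoted: (long long)BBSIZE << 31

TIB = 1 << 40
SHORT_PTR_BITS = 4 * 8                # 4 byte btree pointers

print("short btree pointer width: %d bits" % SHORT_PTR_BITS)
print("max AG size: %d bytes = %d TiB"
      % (XFS_AG_MAX_BYTES, XFS_AG_MAX_BYTES // TIB))
```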

> Datatype wise though max_num_ags is the agcount in the superblock, we have
> xfs_agnumber_t sb_agcount and the xfs_agnumber_t is a uint32_t, so in theory
> we should be able to get this to 2^32 if we were OK to squeeze more data into
> one AG. And then the number of blocks in the ag is agf_length, another
> 32-bit value. With 4 KiB block size that's 65536 EiB, and on 16 KiB
> block size that's 262,144 Exbibytes (EiB) and so on.

Sure, in theory the XFS format *could* handle 2^80 bytes when we
have 64kB filesystem blocks. But we can't do that without massive
changes to the OS and filesystem implementation, so there's no point
in even talking about XFS support beyond 2^64 bytes until 128 bit
integer support is brought to the Linux kernel and all our block
device and syscall interfaces are 128bit file offset capable....
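
For reference, the purely theoretical format numbers discussed above
work out as in this sketch. It takes the 32-bit sb_agcount and 32-bit
agf_length at face value and ignores every implementation limit; the
function name is made up for illustration.

```python
# Theoretical on-disk format limit only; ignores all real-world limits.
MAX_AG_COUNT = 1 << 32     # sb_agcount is a 32-bit value
MAX_AG_BLOCKS = 1 << 32    # agf_length is a 32-bit value
EIB = 1 << 60

def format_limit_bytes(block_size):
    """Hypothetical helper: max bytes if both 32-bit fields were full."""
    return MAX_AG_COUNT * MAX_AG_BLOCKS * block_size

for block_size in (4096, 16384, 65536):
    limit = format_limit_bytes(block_size)
    print("%6d byte blocks: 2^%d bytes = %d EiB"
          % (block_size, limit.bit_length() - 1, limit // EIB))

# Anything above this does not fit in a 64-bit byte offset.
print("fits in 64 bits:", format_limit_bytes(4096) < (1 << 64))
```

Even the 4kB case is 2^76 bytes, so none of these fit in a 64-bit
byte offset.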

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx