On Wed, Jul 24, 2024 at 05:38:46PM -0500, Eric Sandeen wrote: > On 7/24/24 5:12 PM, Darrick J. Wong wrote: > > On Wed, Jul 24, 2024 at 05:06:58PM -0500, Eric Sandeen wrote: > >> On 7/21/24 6:01 PM, Dave Chinner wrote: > >>> +The solution should already be obvious: we can exploit the sparseness of FSBNO > >>> +addressing to allow AGs to grow to 1TB (maximum size) simply by configuring > >>> +sb_agblklog appropriately at mkfs.xfs time. Hence if we have 16MB AGs (minimum > >>> +size) and sb_agblklog = 30 (1TB max AG size), we can expand the AG size up > >>> +to their maximum size before we start appending new AGs. > >> > >> there's a check in xfs_validate_sb_common() that tests whether sg_agblklog is > >> really the next power of two from sb_agblklog: > >> > >> sbp->sb_agblklog != xfs_highbit32(sbp->sb_agblocks - 1) + 1 > >> > >> so I think the proposed idea would require a feature flag, right? > >> > >> That might make it a little trickier as a drop-in replacement for cloud > >> providers because these new expandable filesystem images would only work on > >> kernels that understand the (trivial) new feature, unless I'm missing > >> something. Yes, I think that's the case. I don't see this as a showstopper - golden images that VMs get cloned from tend to be specific to the distro release they contain, so as long as both the orchestration nodes and the distro kernel within the image support the feature bit it will just work, yes? This essentially makes it an image build time decision to support the expand feature. i.e. if the fs in the image has the feature bit set, deployment can use expand, otherwise skip it and go straight to grow. > > agblklog and agblocks would both be set correctly if you did mkfs.xfs -d > > agsize=1T on a small image; afaict it's only mkfs that cares that > > dblocks >= agblocks. I don't think we can change sb_agblocks like this Fundamentally, sb_agblocks is the physical length of AGs, not the theoretical maximum size. Setting it to the maximum possible size fails in all sorts of ways for multiple AG filesystems (__xfs_ag_block_count() is the simplest example to point out), and even on single AG filesystems it will be problematic. sb->sb_agblklog is not a physical size - it is decoupled from the physical size of the AG so it can be used to calculate the address space locations of inodes and FSBNOs. Hence we can change sb_agblklog without changing sb_agblocks and other physical lengths because it doesn't change anything to do with the physical layout of the filesystem. > Yes, in a single AG image like you've suggested. I'm not so sure about that. We're not so lucky with code that validate against the AG header lengths or just use sb_agblocks blindly. e.g. for single AG fs's to work, they need to use __xfs_ag_block_count() everywhere to get the size of the AG and have it return sb_dblocks instead. We have code that assumes that sb_agblocks is the highest AGBNO possible in an AG and doesn't take into account that the size of the runt AG (e.g. xfs_alloc_ag_max_usable() is one of these). I've always understood there to be an implicit requirement for sb_agblocks and agf->agf_length/agi->agi_length to be the same and define the actual maximum valid block number in the AG. I don't think we can change that, and I don't think we should try to change this even for single AG filesystems. We don't even support single AG filesystems. And I don't think we should try to change such a fundamental physical layout definition for what would be a filesystem format that only physically exists temporarily in a cloud deployment context. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx