On Wed, Jul 24, 2024 at 05:38:46PM -0500, Eric Sandeen wrote: > On 7/24/24 5:12 PM, Darrick J. Wong wrote: > > On Wed, Jul 24, 2024 at 05:06:58PM -0500, Eric Sandeen wrote: > >> On 7/21/24 6:01 PM, Dave Chinner wrote: > >>> +The solution should already be obvious: we can exploit the sparseness of FSBNO > >>> +addressing to allow AGs to grow to 1TB (maximum size) simply by configuring > >>> +sb_agblklog appropriately at mkfs.xfs time. Hence if we have 16MB AGs (minimum > >>> +size) and sb_agblklog = 30 (1TB max AG size), we can expand the AG size up > >>> +to their maximum size before we start appending new AGs. > >> > >> there's a check in xfs_validate_sb_common() that tests whether sg_agblklog is > >> really the next power of two from sb_agblklog: > >> > >> sbp->sb_agblklog != xfs_highbit32(sbp->sb_agblocks - 1) + 1 > >> > >> so I think the proposed idea would require a feature flag, right? > >> > >> That might make it a little trickier as a drop-in replacement for cloud > >> providers because these new expandable filesystem images would only work on > >> kernels that understand the (trivial) new feature, unless I'm missing > >> something. > > > > agblklog and agblocks would both be set correctly if you did mkfs.xfs -d > > agsize=1T on a small image; afaict it's only mkfs that cares that > > dblocks >= agblocks. > > Yes, in a single AG image like you've suggested. > > In Dave's proposal, with multiple AGs, I think it would need to be handled. So in looking into this in more detail, there is one thing I've overlooked that will absolutely require a change of on-disk format: we encode the physical location (daddr) of some types of metadata into the metadata. This was added so that we could detect misdirected reads or writes; the metadata knows it's physical location and so checks that it matches the physical location the buffer was read from or is being written to. This includes the btree block headers, symlink blocks and directory/attr blocks, and I'm pretty sure that the locations recorded by the rmapbt's are physical locations as well. To allow expansion to physically relocate metadata, I think we're going to have to change all of these to record fsbno rather than daddr so that we can change the physical location of the metadata without needing to change all the metadata location references. We calculate daddr from FSBNO or AGBNO everywhere, so I think all the info we need to do this already available in the code. Daddrs and fsbnos are also the same size on disk (__be64) so there's no change to metadata layouts or anything like that. It's just a change of what that those fields store from physical to internal sparse addressing. Hence I don't think this is a showstopper issue - it will just require verifiers and metadata header initialisation to use the correct encoding at runtime based on the superblock feature bit. It's not really that complex, just a bit more work that I originally thought it would be. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx