Re: [PATCH] [RFC] xfs: filesystem expansion design documentation

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 29 Jul 2024 11:45:40 +1000

On Wed, Jul 24, 2024 at 05:38:46PM -0500, Eric Sandeen wrote:
> On 7/24/24 5:12 PM, Darrick J. Wong wrote:
> > On Wed, Jul 24, 2024 at 05:06:58PM -0500, Eric Sandeen wrote:
> >> On 7/21/24 6:01 PM, Dave Chinner wrote:
> >>> +The solution should already be obvious: we can exploit the sparseness of FSBNO
> >>> +addressing to allow AGs to grow to 1TB (maximum size) simply by configuring
> >>> +sb_agblklog appropriately at mkfs.xfs time. Hence if we have 16MB AGs (minimum
> >>> +size) and sb_agblklog = 30 (1TB max AG size), we can expand the AG size up
> >>> +to their maximum size before we start appending new AGs.
> >>
> >> there's a check in xfs_validate_sb_common() that tests whether sg_agblklog is
> >> really the next power of two from sb_agblklog:
> >>
> >> sbp->sb_agblklog != xfs_highbit32(sbp->sb_agblocks - 1) + 1
> >>
> >> so I think the proposed idea would require a feature flag, right?
> >>
> >> That might make it a little trickier as a drop-in replacement for cloud
> >> providers because these new expandable filesystem images would only work on
> >> kernels that understand the (trivial) new feature, unless I'm missing
> >> something.
> > 
> > agblklog and agblocks would both be set correctly if you did mkfs.xfs -d
> > agsize=1T on a small image; afaict it's only mkfs that cares that
> > dblocks >= agblocks.
> 
> Yes, in a single AG image like you've suggested.
> 
> In Dave's proposal, with multiple AGs, I think it would need to be handled.

So in looking into this in more detail, there is one thing I've
overlooked that will absolutely require a change of on-disk format:
we encode the physical location (daddr) of some types of metadata
into the metadata.

This was added so that we could detect misdirected reads or writes;
the metadata knows it's physical location and so checks that it
matches the physical location the buffer was read from or is being
written to.

This includes the btree block headers, symlink blocks and
directory/attr blocks, and I'm pretty sure that the locations
recorded by the rmapbt's are physical locations as well.

To allow expansion to physically relocate metadata, I think we're
going to have to change all of these to record fsbno rather than
daddr so that we can change the physical location of the metadata
without needing to change all the metadata location references. We
calculate daddr from FSBNO or AGBNO everywhere, so I think all the
info we need to do this already available in the code.

Daddrs and fsbnos are also the same size on disk (__be64) so there's
no change to metadata layouts or anything like that. It's just a
change of what that those fields store from physical to internal
sparse addressing.

Hence I don't think this is a showstopper issue - it will just
require verifiers and metadata header initialisation to use the
correct encoding at runtime based on the superblock feature bit.
It's not really that complex, just a bit more work that I originally
thought it would be.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx