Re: [PATCH 10/43] xfs: make metabtree reservations global

Christoph Hellwig <hch@xxxxxx> · Fri, 7 Feb 2025 05:18:48 +0100

On Thu, Feb 06, 2025 at 12:50:21PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 06, 2025 at 07:44:26AM +0100, Christoph Hellwig wrote:
> > Currently each metabtree inode has it's own space reservation to ensure
> > it can be expanded to the maximum size.  This is not very efficient as it
> > requires a large number of blocks to be set aside that can't be used at
> > all by other btrees.
> > 
> > Switch to a model that uses a global pool instead in preparation for
> > reducing the amount of reserved space.
> 
> ...because inodes draw from the global pool, so there's no reason to
> compartmentalize each rt rmap btree's reservation like we do for per-AG
> rmap btrees, right?

Yes.

> > -	to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks);
> > +	to_resv = mp->m_metafile_resv_target -
> > +		(mp->m_metafile_resv_used + mp->m_metafile_resv_avail);
> 
> This separates accounting of the usage of the metadata btree (e.g.
> rtrmapbt) from i_blocks, which is a convenient way to fix an accounting
> bug if ever there's a metadir inode with a big enough xattr structure
> that it blows out into a real ondisk block. So far I don't think that
> happens in practice, but it's not impossible.
> 

Heh.  I didn't think of that, but yes even if very theoretical at the
moment.

> Clearly, the mapping of before to after is:
> 
> i_meta_resv_asked	-> m_metafile_resv_target
> i_nblocks		-> m_metafile_resv_used
> i_delayed_blks		-> m_metafile_resv_avail
> 
> Makes sense.  And I guess we still consider the reservations as "delayed
> allocations" in the sense that we subtract from fdblocks and put the
> same quantity into m_delalloc_blks.

Yes.

> > @@ -932,13 +932,17 @@ xrep_reap_metadir_fsblocks(
> >  	xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK);

...

> > +	/*
> > +	 * Resize the reservations so that we don't fail to expand the btree.
> > +	 */
> > +	return xfs_metafile_resv_init(sc->mp);
> 
> I'm not sure this fixes the reservation correctly -- let's say we have a
> busted btree inode with a 100 fsblock reservation that was using 20 fsb.
> At mount, we'll have called xfs_dec_fdblocks for 100 - 20 = 80 fsb:
> 
> 	hidden_space = target - used;
> 	xfs_dec_fdblocks(mp, hidden_space, true);
> 
> Then we rebuild it with a more compact 18 fsb btree.  Now we run
> xfs_metafile_resv_init and it thinks it needs to call xfs_dec_fdblocks
> for 100 - 18 = 82 fsb.  But we already reserved 80 for this file, so the
> correct thing to do here is to xfs_dec_fdblocks 2fsb from fdblocks, not
> another 82.

Hmm, yes.  If only we had a good test for this :)