On Thu, Feb 06, 2025 at 12:50:21PM -0800, Darrick J. Wong wrote: > On Thu, Feb 06, 2025 at 07:44:26AM +0100, Christoph Hellwig wrote: > > Currently each metabtree inode has it's own space reservation to ensure > > it can be expanded to the maximum size. This is not very efficient as it > > requires a large number of blocks to be set aside that can't be used at > > all by other btrees. > > > > Switch to a model that uses a global pool instead in preparation for > > reducing the amount of reserved space. > > ...because inodes draw from the global pool, so there's no reason to > compartmentalize each rt rmap btree's reservation like we do for per-AG > rmap btrees, right? Yes. > > - to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks); > > + to_resv = mp->m_metafile_resv_target - > > + (mp->m_metafile_resv_used + mp->m_metafile_resv_avail); > > This separates accounting of the usage of the metadata btree (e.g. > rtrmapbt) from i_blocks, which is a convenient way to fix an accounting > bug if ever there's a metadir inode with a big enough xattr structure > that it blows out into a real ondisk block. So far I don't think that > happens in practice, but it's not impossible. > Heh. I didn't think of that, but yes even if very theoretical at the moment. > Clearly, the mapping of before to after is: > > i_meta_resv_asked -> m_metafile_resv_target > i_nblocks -> m_metafile_resv_used > i_delayed_blks -> m_metafile_resv_avail > > Makes sense. And I guess we still consider the reservations as "delayed > allocations" in the sense that we subtract from fdblocks and put the > same quantity into m_delalloc_blks. Yes. > > @@ -932,13 +932,17 @@ xrep_reap_metadir_fsblocks( > > xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); ... > > + /* > > + * Resize the reservations so that we don't fail to expand the btree. > > + */ > > + return xfs_metafile_resv_init(sc->mp); > > I'm not sure this fixes the reservation correctly -- let's say we have a > busted btree inode with a 100 fsblock reservation that was using 20 fsb. > At mount, we'll have called xfs_dec_fdblocks for 100 - 20 = 80 fsb: > > hidden_space = target - used; > xfs_dec_fdblocks(mp, hidden_space, true); > > Then we rebuild it with a more compact 18 fsb btree. Now we run > xfs_metafile_resv_init and it thinks it needs to call xfs_dec_fdblocks > for 100 - 18 = 82 fsb. But we already reserved 80 for this file, so the > correct thing to do here is to xfs_dec_fdblocks 2fsb from fdblocks, not > another 82. Hmm, yes. If only we had a good test for this :)