On Mon, Aug 26, 2024 at 12:40:28PM -0700, Darrick J. Wong wrote: > On Mon, Aug 26, 2024 at 02:56:37PM +1000, Dave Chinner wrote: > > On Thu, Aug 22, 2024 at 05:26:38PM -0700, Darrick J. Wong wrote: > > > From: Christoph Hellwig <hch@xxxxxx> > > > > > > Make the allocator rtgroup aware by either picking a specific group if > > > there is a hint, or loop over all groups otherwise. A simple rotor is > > > provided to pick the placement for initial allocations. > > > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_bmap.c | 13 +++++- > > > fs/xfs/libxfs/xfs_rtbitmap.c | 6 ++- > > > fs/xfs/xfs_mount.h | 1 > > > fs/xfs/xfs_rtalloc.c | 98 ++++++++++++++++++++++++++++++++++++++---- > > > 4 files changed, 105 insertions(+), 13 deletions(-) > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > > > index 126a0d253654a..88c62e1158ac7 100644 > > > --- a/fs/xfs/libxfs/xfs_bmap.c > > > +++ b/fs/xfs/libxfs/xfs_bmap.c > > > @@ -3151,8 +3151,17 @@ xfs_bmap_adjacent_valid( > > > struct xfs_mount *mp = ap->ip->i_mount; > > > > > > if (XFS_IS_REALTIME_INODE(ap->ip) && > > > - (ap->datatype & XFS_ALLOC_USERDATA)) > > > - return x < mp->m_sb.sb_rblocks; > > > + (ap->datatype & XFS_ALLOC_USERDATA)) { > > > + if (x >= mp->m_sb.sb_rblocks) > > > + return false; > > > + if (!xfs_has_rtgroups(mp)) > > > + return true; > > > + > > > + return xfs_rtb_to_rgno(mp, x) == xfs_rtb_to_rgno(mp, y) && > > > + xfs_rtb_to_rgno(mp, x) < mp->m_sb.sb_rgcount && > > > + xfs_rtb_to_rtx(mp, x) < mp->m_sb.sb_rgextents; > > > > WHy do we need the xfs_has_rtgroups() check here? The new rtg logic will > > return true for an old school rt device here, right? > > The incore sb_rgextents is zero on !rtg filesystems, so we need the > xfs_has_rtgroups. Hmmm. Could we initialise it in memory only for !rtg filesystems, and make sure we never write it back via a check in the xfs_sb_to_disk() formatter function? That would remove one of the problematic in-memory differences between old skool rtdev setups and the new rtg-based setups... > > > @@ -1835,9 +1908,16 @@ xfs_bmap_rtalloc( > > > if (xfs_bmap_adjacent(ap)) > > > bno_hint = ap->blkno; > > > > > > - error = xfs_rtallocate(ap->tp, bno_hint, raminlen, ralen, prod, > > > - ap->wasdel, initial_user_data, &rtlocked, > > > - &ap->blkno, &ap->length); > > > + if (xfs_has_rtgroups(ap->ip->i_mount)) { > > > + error = xfs_rtallocate_rtgs(ap->tp, bno_hint, raminlen, ralen, > > > + prod, ap->wasdel, initial_user_data, > > > + &ap->blkno, &ap->length); > > > + } else { > > > + error = xfs_rtallocate_rtg(ap->tp, 0, bno_hint, raminlen, ralen, > > > + prod, ap->wasdel, initial_user_data, > > > + &rtlocked, &ap->blkno, &ap->length); > > > + } > > > > The xfs_has_rtgroups() check is unnecessary. The iterator in > > xfs_rtallocate_rtgs() will do the right thing for the > > !xfs_has_rtgroups() case - it'll set start_rgno = 0 and break out > > after a single call to xfs_rtallocate_rtg() with rgno = 0. > > > > Another thing that probably should be done here is push all the > > constant value calculations a couple of functions down the stack to > > where they are used. Then we only need to pass two parameters down > > through the rg iterator here, not 11... > > ..and pass the ap itself too, to remove three of the parameters? Yeah, I was thinking that the iterator only needs the bno_hint to determine which group to start iterating. Everything else is derived from information in the ap structure and so doesn't need to be calculated above the iterator. Though we could just lift the xfs_rtalloc_args() up to this level and stuff all the parameters into that structure and pass it down instead (like we do with xfs_alloc_args for the btree allocator). Then we only need to pass args through xfs_rtallocate(), xfs_rtallocate_extent_near/size() and all the other helper functions, too. That's a much bigger sort of cleanup, though, but I think it would be worth doing a some point because it would bring the rtalloc code closer to how the btalloc code is structured. And, perhaps, allow us to potentially share group selection and iteration code between the bt and rt allocators in future... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx