On Thu, Mar 16, 2023 at 09:47:21AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag > want us to perform a non-blocking scan of the AGs for free space. There > are no ordering constraints for non-blocking AGF lock acquisition, so > the scan can freely start over at AG 0 even when minimum_agno > 0. > > This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent > pointer patchset applied and the realtime volume enabled. I observed > the following sequence as part of an xfs_dir_createname call: > > 0. Fragment the free space, then allocate nearly all the free space in > all AGs except AG 0. > > 1. Create a directory in AG 2 and let it grow for a while. > > 2. Try to allocate 2 blocks to expand the dirent part of a directory. > The space will be allocated out of AG 0, but the allocation will not > be contiguous. This (I think) activates the LOWMODE allocator. > > 3. The bmapi call decides to convert from extents to bmbt format and > tries to allocate 1 block. This allocation request calls > xfs_alloc_vextent_start_ag with the inode number, which starts the > scan at AG 2. We ignore AG 0 (with all its free space) and instead > scrape AG 2 and 3 for more space. We find one block, but this now > kicks t_highest_agno to 3. > > 4. The createname call decides it needs to split the dabtree. It tries > to allocate even more space with xfs_alloc_vextent_start_ag, but now > we're constrained to AG 3, and we don't find the space. The > createname returns ENOSPC and the filesystem shuts down. > > This change fixes the problem by making the trylock scan wrap around to > AG 0 if it doesn't like the AGs that it finds. Since the current > transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and > we take space from the AG that has plenty. > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_alloc.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 8999e38e1bed..bd7112d430b6 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -3326,11 +3326,14 @@ xfs_alloc_vextent_iterate_ags( > uint32_t flags) > { > struct xfs_mount *mp = args->mp; > + xfs_agnumber_t restart_agno = minimum_agno; > xfs_agnumber_t agno; > int error = 0; > > + if (flags & XFS_ALLOC_FLAG_TRYLOCK) > + restart_agno = 0; > restart: > - for_each_perag_wrap_range(mp, start_agno, minimum_agno, > + for_each_perag_wrap_range(mp, start_agno, restart_agno, > mp->m_sb.sb_agcount, agno, args->pag) { > args->agno = agno; > error = xfs_alloc_vextent_prepare_ag(args); > @@ -3369,6 +3372,7 @@ xfs_alloc_vextent_iterate_ags( > */ > if (flags) { > flags = 0; > + restart_agno = minimum_agno; > goto restart; > } Looks good. Pretty much what we talked about on #xfs. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx