Re: [PATCH] xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 16, 2023 at 09:47:21AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
> want us to perform a non-blocking scan of the AGs for free space.  There
> are no ordering constraints for non-blocking AGF lock acquisition, so
> the scan can freely start over at AG 0 even when minimum_agno > 0.
> 
> This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
> pointer patchset applied and the realtime volume enabled.  I observed
> the following sequence as part of an xfs_dir_createname call:
> 
> 0. Fragment the free space, then allocate nearly all the free space in
>    all AGs except AG 0.
> 
> 1. Create a directory in AG 2 and let it grow for a while.
> 
> 2. Try to allocate 2 blocks to expand the dirent part of a directory.
>    The space will be allocated out of AG 0, but the allocation will not
>    be contiguous.  This (I think) activates the LOWMODE allocator.
> 
> 3. The bmapi call decides to convert from extents to bmbt format and
>    tries to allocate 1 block.  This allocation request calls
>    xfs_alloc_vextent_start_ag with the inode number, which starts the
>    scan at AG 2.  We ignore AG 0 (with all its free space) and instead
>    scrape AG 2 and 3 for more space.  We find one block, but this now
>    kicks t_highest_agno to 3.
> 
> 4. The createname call decides it needs to split the dabtree.  It tries
>    to allocate even more space with xfs_alloc_vextent_start_ag, but now
>    we're constrained to AG 3, and we don't find the space.  The
>    createname returns ENOSPC and the filesystem shuts down.
> 
> This change fixes the problem by making the trylock scan wrap around to
> AG 0 if it doesn't like the AGs that it finds.  Since the current
> transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
> we take space from the AG that has plenty.
> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 8999e38e1bed..bd7112d430b6 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3326,11 +3326,14 @@ xfs_alloc_vextent_iterate_ags(
>  	uint32_t		flags)
>  {
>  	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		restart_agno = minimum_agno;
>  	xfs_agnumber_t		agno;
>  	int			error = 0;
>  
> +	if (flags & XFS_ALLOC_FLAG_TRYLOCK)
> +		restart_agno = 0;
>  restart:
> -	for_each_perag_wrap_range(mp, start_agno, minimum_agno,
> +	for_each_perag_wrap_range(mp, start_agno, restart_agno,
>  			mp->m_sb.sb_agcount, agno, args->pag) {
>  		args->agno = agno;
>  		error = xfs_alloc_vextent_prepare_ag(args);
> @@ -3369,6 +3372,7 @@ xfs_alloc_vextent_iterate_ags(
>  	 */
>  	if (flags) {
>  		flags = 0;
> +		restart_agno = minimum_agno;
>  		goto restart;
>  	}

Looks good. Pretty much what we talked about on #xfs.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux