Re: [PATCH] [RFC] xfs: prevent bmap btree from using alloc btree reserved blocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



ping?

On Tue, Aug 24, 2010 at 11:45:33PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Recently we've had WANT_CORRUPTED_GOTO filesystem shutdowns reported
> on filesystems with large numbers of small AGs. RedHat QA found a
> simple test case at:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=626244
> 
> Which was further refined to:
> 
> # mkfs.xfs -f -d agsize=16m,size=50g <dev>
> # mount <dev> /mnt
> # xfs_io -f -c 'resvsp 0 40G' /mnt/foo
> 
> The initial analysis is in the above bug. The fundamental problem is
> that the data extent allocation is using all the free blocks in the
> allocation group, and then the bmap btree block allocation is
> dipping into the reserved block pool that each AG holds for
> freespace btree manipulations. This results in failures when
> multiple bmap btree blocks are needed for a multilevel split of the
> bmap btree.
> 
> The reason this is occurring is that when we do a btree block
> allocation after a data extent allocation we run down the path that
> does not set up the minleft allocation parameter. That is, we allow
> the btree block allocation to use up all the blocks in the current
> AG if that is what will make the current allocation succeed. This
> what we are supposed to only allow the low space allocation
> algorithm to do, not a normal allocation. The result is that we can
> allocate the first block from the AG freelist, and then the second
> block allocation in the split will fail in the same AG because we do
> not have a minleft parameter set and hence will not pass the checks
> in xfs_alloc_fix_freelist() and hence the allocation will fail.
> Further, because no minleft parameter is set, the btree block
> allocation will not retry the allocation with different parameters
> and (potentially) enable the low space algorithm.
> 
> The result is that we fail a btree block allocation and hence fail
> the extent allocation with a dirty btree and transaction. The dirty
> btree causes the WANT_CORRUPTED_GOTO warning, and cancelling the
> dirty transaction triggers the shutdown.
> 
> The fix appears to be to ensure that we set the minleft parameter
> correctly to reflect the potential number of btree blocks we still
> need to allocate from the same AG if we are doing a worst case
> split. By doing so, the particular case results in the first btree
> block allocation setting minleft to the number of blocks needed for
> a split. Hence the AG that we just allocated all the free blocks out
> of for the data extent will fail because there are not enough free
> blocks for all the blocks in the split in the AG. It will fail this
> without dirtying anything, and because minleft is now > 0, will
> trigger the retry algorithm.
> 
> The fallback algorithm also needs a slight modification. It
> currently assumes that if minleft is set, no allocation has been
> done yet, so it can scan from AG 0 to find a free block. If it is
> left like this, it can trigger deadlocks from the new case as we
> have a prior allocation and hence cannot allocate from an AG less
> than the current AG. Hence it is modified to use a START_AG
> allocation to scan upwards from the current AG, hence avoiding the
> known AG locking deadlocks.
> 
> As far as I can tell, the bmap btree code has never handled this
> case properly - I checked as far back as 1995 and minleft has never
> been set to avoid selecting an AG that cannot be allocated out of...
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_bmap_btree.c |   51 ++++++++++++++++++++++++++++++++++++++--------
>  1 files changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_btree.c b/fs/xfs/xfs_bmap_btree.c
> index 87d3c10..d5ef4e3 100644
> --- a/fs/xfs/xfs_bmap_btree.c
> +++ b/fs/xfs/xfs_bmap_btree.c
> @@ -538,6 +538,25 @@ xfs_bmbt_alloc_block(
>  		args.type = XFS_ALLOCTYPE_START_BNO;
>  	} else {
>  		args.type = XFS_ALLOCTYPE_NEAR_BNO;
> +
> +		/*
> +		 * we've come in here because this is the second or subsequent
> +		 * btree block we need to allocate for the bmap btree
> +		 * modification. If we've just emptied the AG and there are
> +		 * only free list blocks left, we need to make sure that we
> +		 * take into account the minleft value that was reserved on the
> +		 * first allocation through here (the NULLFSBLOCK branch
> +		 * above). In that case, minleft will already take into account
> +		 * the maximum number of blocks needed for a btree split, and
> +		 * the number of blocks already allocated is recorded in the
> +		 * cursor. From that, we can work out exactly how much smaller
> +		 * the minleft should be so that we don't select an AG that
> +		 * does not have enough blocks available to continue the entire
> +		 * btree split.
> +		 */
> +		args.minleft = XFS_BM_MAXLEVELS(args.mp,
> +					(int)cur->bc_private.b.whichfork) - 1 -
> +				cur->bc_private.b.allocated;
>  	}
>  
>  	args.minlen = args.maxlen = args.prod = 1;
> @@ -550,15 +569,29 @@ xfs_bmbt_alloc_block(
>  	if (error)
>  		goto error0;
>  
> -	if (args.fsbno == NULLFSBLOCK && args.minleft) {
> -		/*
> -		 * Could not find an AG with enough free space to satisfy
> -		 * a full btree split.  Try again without minleft and if
> -		 * successful activate the lowspace algorithm.
> -		 */
> -		args.fsbno = 0;
> -		args.type = XFS_ALLOCTYPE_FIRST_AG;
> -		args.minleft = 0;
> +	while (args.fsbno == NULLFSBLOCK && args.minleft) {
> +		if (cur->bc_private.b.firstblock == NULLFSBLOCK) {
> +			/*
> +			 * Could not find an AG with enough free space to satisfy
> +			 * a full btree split.  Try again without minleft and if
> +			 * successful activate the lowspace algorithm.
> +			 */
> +			args.type = XFS_ALLOCTYPE_FIRST_AG;
> +			args.fsbno = 0;
> +			args.minleft = 0;
> +		} else {
> +			/*
> +			 * Failed to find enough space for a btree block after
> +			 * a extent allocation has already occurred. Continue
> +			 * searching other AGs that can hold the remaining
> +			 * blocks. If we fail with minleft set, then clear it
> +			 * and try again.
> +			 */
> +			args.type = XFS_ALLOCTYPE_START_AG;
> +			args.fsbno = cur->bc_private.b.firstblock;
> +			if (cur->bc_private.b.flist->xbf_low)
> +				args.minleft = 0;
> +		}
>  		error = xfs_alloc_vextent(&args);
>  		if (error)
>  			goto error0;
> -- 
> 1.7.1
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
> 

-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux