Re: [PATCH 2/2] repair: AGFL rebuild fails if btree split required

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 29, 2014 at 02:09:04PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> In phase 5 we rebuild the freelist btrees, the AGF and the AGFL from
> all the free space information we have gathered and resolved during
> pahses 3 and 4. If the freespace information is laid out just right,
> we end up having to allocate free space for the AGFL blocks.
> 
> If the size of the free space we allocate from is larger than the
> space we need, then we have to insert the remainder back into the
> freespace btree. For the by-size tree, this means we are likely to
> be removing a record from one leaf, and then inserting the remainder
> - a smaller size - into another leaf.
> 
> The issue is that the leaf blocks to the left of the original leaf
> block we removed the extent record from are full and hence require a
> split to insert the new record. That, of course, requires a free
> block in the AGFL to allocate from, and now we have a chicken and
> egg situation: there are no free blocks in the AGFL because we are
> setting it up.
> 
> As a result, setting up the free list silently fails, leaving the
> freespace btrees in an inconsistent state and the AGFL in question
> empty. When the filesystem is next mounted, the first allocation
> from that AGF results in attempting to fix the AGFL, and it then
> does exactly the same thing as the repair code, fails to allocate a
> block during the split and fails. This results in an immediate
> shutdown because the transaction doing the allocation is dirty by
> this stage.
> 
> The fix for the problem is to make repair handle rebulding the btree
> differently. If we leave ispace for a couple of records in each
> btree leaf and node, there is never a need for a split to occur when
> initially setting up the AGFL. This results in repair doing the
> right thing, and hence the runtime problems after mount don't occur.
> Further, add error checking the the AGFL setup code and abort repair
> if we have a failure to correctly set up the AGFL so we catch this
> problem at repair time, not mount time...
> 

Interesting problem, thanks for the breakdown. I suppose the only
interesting side effect is that the alloc btrees might require a bit
more space after repair than before. I wonder if we need to take that
into consideration here for certain cases. E.g., does this have
ramifications for running repair on a clean, but completely full fs, or
should that be generally handled by the existence of reserved blocks?

A couple minor nits below...

> Reported-by: Barkley Vowk <bvowk@xxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  repair/phase5.c | 44 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/repair/phase5.c b/repair/phase5.c
> index d6d3c6d..3d58936 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -335,11 +335,22 @@ finish_cursor(bt_status_t *curs)
>  }
>  
>  /*
> + * We need to leave some free records in the tree for the corner case of
> + * setting up the AGFL. This may require allocation of blocks, and as
> + * such can require insertion of new records into the tree (e.g. moving
> + * a record in the by-count tree when a long extent is shortened). If we
> + * pack the records into the leaves with no slack space, this requires a
> + * leaf split to occur and a block to be allocated from the free list.
> + * If we don't have any blocks on the free list (because we are setting
> + * it up!), then we fail, and the filesystem will fail with the same
> + * failure at runtime. Hence leave a couple of records slack space in
> + * each block to allow immediate modification of the tree without
> + * requiring splits to be done.
> + *
>   * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?

I guess we could kill this comment now. ;)

>   */
>  #define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
> -			xfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, \
> -						(level) == 0)
> +	(xfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
>  
>  /*
>   * this calculates a freespace cursor for an ag.
> @@ -361,10 +372,6 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
>  	bt_stat_level_t		*p_lptr;
>  	extent_tree_node_t	*ext_ptr;
>  	int			level;
> -#ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr,
> -		"in init_freespace_cursor, agno = %d\n", agno);
> -#endif
>  
>  	num_extents = *extents;
>  	extents_used = 0;
> @@ -385,6 +392,13 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
>  	lptr->num_recs_tot = num_extents;
>  	level = 1;
>  
> +#ifdef XR_BLD_FREE_TRACE
> +	fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,

What's the 0 for?

Brian

> +			lptr->num_blocks,
> +			lptr->num_recs_pb,
> +			lptr->modulo,
> +			lptr->num_recs_tot);
> +#endif
>  	/*
>  	 * if we need more levels, set them up.  # of records
>  	 * per level is the # of blocks in the level below it
> @@ -402,6 +416,14 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
>  			lptr->num_recs_pb = p_lptr->num_blocks
>  					/ lptr->num_blocks;
>  			lptr->num_recs_tot = p_lptr->num_blocks;
> +#ifdef XR_BLD_FREE_TRACE
> +			fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
> +					level,
> +					lptr->num_blocks,
> +					lptr->num_recs_pb,
> +					lptr->modulo,
> +					lptr->num_recs_tot);
> +#endif
>  		}
>  	}
>  
> @@ -546,8 +568,7 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
>  				lptr = &btree_curs->level[level];
>  				p_lptr = &btree_curs->level[level-1];
>  				lptr->num_blocks = howmany(p_lptr->num_blocks,
> -						XR_ALLOC_BLOCK_MAXRECS(mp,
> -								level));
> +					XR_ALLOC_BLOCK_MAXRECS(mp, level));
>  				lptr->modulo = p_lptr->num_blocks
>  						% lptr->num_blocks;
>  				lptr->num_recs_pb = p_lptr->num_blocks
> @@ -1434,6 +1455,7 @@ build_agf_agfl(xfs_mount_t	*mp,
>  		xfs_alloc_arg_t	args;
>  		xfs_trans_t	*tp;
>  		struct xfs_trans_res tres = {0};
> +		int		error;
>  
>  		memset(&args, 0, sizeof(args));
>  		args.tp = tp = libxfs_trans_alloc(mp, 0);
> @@ -1442,8 +1464,12 @@ build_agf_agfl(xfs_mount_t	*mp,
>  		args.alignment = 1;
>  		args.pag = xfs_perag_get(mp,agno);
>  		libxfs_trans_reserve(tp, &tres, XFS_MIN_FREELIST(agf, mp), 0);
> -		libxfs_alloc_fix_freelist(&args, 0);
> +		error = libxfs_alloc_fix_freelist(&args, 0);
>  		xfs_perag_put(args.pag);
> +		if (error) {
> +			do_error(_("failed to fix AGFL on AG %d, error %d\n"),
> +					agno, error);
> +		}
>  		libxfs_trans_commit(tp, 0);
>  	}
>  
> -- 
> 2.0.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs




[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux