Re: [PATCH 12/63] xfs: adjust refcount of an extent of blocks in refcount btree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 27, 2016 at 07:54:46PM -0700, Darrick J. Wong wrote:
> Provide functions to adjust the reference counts for an extent of
> physical blocks stored in the refcount btree.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
> v2: Refactor the left/right split code into a single function.  Track
> the number of btree shape changes and record updates during a refcount
> update so that we can decide if we need to get a fresh transaction to
> continue.
> ---
>  fs/xfs/libxfs/xfs_refcount.c |  783 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_error.h           |    4 
>  2 files changed, 786 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index de13406..4f82651 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -37,6 +37,12 @@
>  #include "xfs_bit.h"
>  #include "xfs_refcount.h"
>  
> +/* Allowable refcount adjustment amounts. */
> +enum xfs_refc_adjust_op {
> +	XFS_REFCOUNT_ADJUST_INCREASE	= 1,
> +	XFS_REFCOUNT_ADJUST_DECREASE	= -1,
> +};
> +
>  /*
>   * Look up the first record less than or equal to [bno, len] in the btree
>   * given by cur.
> @@ -175,3 +181,780 @@ xfs_refcount_delete(
>  				cur->bc_private.a.agno, error, _RET_IP_);
>  	return error;
>  }
> +
> +/*
> + * Adjusting the Reference Count
> + *
> + * As stated elsewhere, the reference count btree (refcbt) stores
> + * >1 reference counts for extents of physical blocks.  In this
> + * operation, we're either raising or lowering the reference count of
> + * some subrange stored in the tree:
> + *
> + *      <------ adjustment range ------>
> + * ----+   +---+-----+ +--+--------+---------
> + *  2  |   | 3 |  4  | |17|   55   |   10
> + * ----+   +---+-----+ +--+--------+---------
> + * X axis is physical blocks number;
> + * reference counts are the numbers inside the rectangles
> + *
> + * The first thing we need to do is to ensure that there are no
> + * refcount extents crossing either boundary of the range to be
> + * adjusted.  For any extent that does cross a boundary, split it into
> + * two extents so that we can increment the refcount of one of the
> + * pieces later:
> + *
> + *      <------ adjustment range ------>
> + * ----+   +---+-----+ +--+--------+----+----
> + *  2  |   | 3 |  2  | |17|   55   | 10 | 10
> + * ----+   +---+-----+ +--+--------+----+----
> + *
> + * For this next step, let's assume that all the physical blocks in
> + * the adjustment range are mapped to a file and are therefore in use
> + * at least once.  Therefore, we can infer that any gap in the
> + * refcount tree within the adjustment range represents a physical
> + * extent with refcount == 1:
> + *
> + *      <------ adjustment range ------>
> + * ----+---+---+-----+-+--+--------+----+----
> + *  2  |"1"| 3 |  2  |1|17|   55   | 10 | 10
> + * ----+---+---+-----+-+--+--------+----+----
> + *      ^
> + *
> + * For each extent that falls within the interval range, figure out
> + * which extent is to the left or the right of that extent.  Now we
> + * have a left, current, and right extent.  If the new reference count
> + * of the center extent enables us to merge left, center, and right
> + * into one record covering all three, do so.  If the center extent is
> + * at the left end of the range, abuts the left extent, and its new
> + * reference count matches the left extent's record, then merge them.
> + * If the center extent is at the right end of the range, abuts the
> + * right extent, and the reference counts match, merge those.  In the
> + * example, we can left merge (assuming an increment operation):
> + *
> + *      <------ adjustment range ------>
> + * --------+---+-----+-+--+--------+----+----
> + *    2    | 3 |  2  |1|17|   55   | 10 | 10
> + * --------+---+-----+-+--+--------+----+----
> + *          ^
> + *
> + * For all other extents within the range, adjust the reference count
> + * or delete it if the refcount falls below 2.  If we were
> + * incrementing, the end result looks like this:
> + *
> + *      <------ adjustment range ------>
> + * --------+---+-----+-+--+--------+----+----
> + *    2    | 4 |  3  |2|18|   56   | 11 | 10
> + * --------+---+-----+-+--+--------+----+----
> + *
> + * The result of a decrement operation looks as such:
> + *
> + *      <------ adjustment range ------>
> + * ----+   +---+       +--+--------+----+----
> + *  2  |   | 2 |       |16|   54   |  9 | 10
> + * ----+   +---+       +--+--------+----+----
> + *      DDDD    111111DD
> + *
> + * The blocks marked "D" are freed; the blocks marked "1" are only
> + * referenced once and therefore the record is removed from the
> + * refcount btree.
> + */
> +

Thanks for the big comment. It makes this much easier to follow.

> +#define RCNEXT(rc)	((rc).rc_startblock + (rc).rc_blockcount)
...
> +/*
> + * Merge the left, center, and right extents.
> + */
> +STATIC int
> +xfs_refcount_merge_center_extent(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_refcount_irec	*left,
> +	struct xfs_refcount_irec	*center,
> +	unsigned long long		extlen,
> +	xfs_agblock_t			*agbno,
> +	xfs_extlen_t			*aglen)
> +{
> +	int				error;
> +	int				found_rec;
> +
> +	error = xfs_refcount_lookup_ge(cur, center->rc_startblock,
> +			&found_rec);
> +	if (error)
> +		goto out_error;
> +	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> +	error = xfs_refcount_delete(cur, &found_rec);
> +	if (error)
> +		goto out_error;
> +	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> +	if (center->rc_refcount > 1) {

Comment please :) (what is special about refcount == 1?).

> +		error = xfs_refcount_delete(cur, &found_rec);
> +		if (error)
> +			goto out_error;
> +		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> +				out_error);
> +	}
> +
> +	error = xfs_refcount_lookup_le(cur, left->rc_startblock,
> +			&found_rec);
> +	if (error)
> +		goto out_error;
> +	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> +	left->rc_blockcount = extlen;

Is extlen valid if the (refcount > 1) condition above fails? Digging
further, I suppose so as the refcount == 1 case appears to be a
filler/non-existent extent..?

> +	error = xfs_refcount_update(cur, left);
> +	if (error)
> +		goto out_error;
> +
> +	*aglen = 0;
> +	return error;
> +
> +out_error:
> +	trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
> +			cur->bc_private.a.agno, error, _RET_IP_);
> +	return error;
> +}
> +
...
> +/*
> + * Try to merge with any extents on the boundaries of the adjustment range.
> + */
> +STATIC int
> +xfs_refcount_merge_extents(
> +	struct xfs_btree_cur	*cur,
> +	xfs_agblock_t		*agbno,
> +	xfs_extlen_t		*aglen,
> +	enum xfs_refc_adjust_op adjust,
> +	bool			*shape_changed)
> +{
> +	struct xfs_refcount_irec	left = {0}, cleft = {0};
> +	struct xfs_refcount_irec	cright = {0}, right = {0};
> +	int				error;
> +	unsigned long long		ulen;
> +	bool				cequal;
> +
> +	*shape_changed = false;
> +	/*
> +	 * Find the extent just below agbno [left], just above agbno [cleft],
> +	 * just below (agbno + aglen) [cright], and just above (agbno + aglen)
> +	 * [right].
> +	 */
> +	error = xfs_refcount_find_left_extents(cur, &left, &cleft, *agbno,
> +			*aglen);
> +	if (error)
> +		return error;
> +	error = xfs_refcount_find_right_extents(cur, &right, &cright, *agbno,
> +			*aglen);
> +	if (error)
> +		return error;
> +
> +	/* No left or right extent to merge; exit. */
> +	if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
> +		return 0;

Can we use start block and NULLAGBLOCK in the find helpers to identify
the no extent case rather than blockcount == 0?

> +
> +	*shape_changed = true;

Why is this true already if we haven't actually merged extents yet? This
could use some explanation...

> +	cequal = (cleft.rc_startblock == cright.rc_startblock) &&
> +		 (cleft.rc_blockcount == cright.rc_blockcount);
> +
> +	/* Try to merge left, cleft, and right.  cleft must == cright. */
> +	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
> +			right.rc_blockcount;
> +	if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
> +	    cleft.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
> +	    cequal &&
> +	    left.rc_refcount == cleft.rc_refcount + adjust &&
> +	    right.rc_refcount == cleft.rc_refcount + adjust &&
> +	    ulen < MAXREFCEXTLEN) {
> +		trace_xfs_refcount_merge_center_extents(cur->bc_mp,
> +				cur->bc_private.a.agno, &left, &cleft, &right);

I'd suggest to bury tracepoints such as this (generally speaking, those
that match a function name) down into the start of the associated
function rather than at the call site. That seems more consistent with
the rest of XFS.

> +		return xfs_refcount_merge_center_extent(cur, &left, &cleft,
> +				ulen, agbno, aglen);
> +	}
> +
> +	/* Try to merge left and cleft. */
> +	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
> +	if (left.rc_blockcount != 0 && cleft.rc_blockcount != 0 &&
> +	    left.rc_refcount == cleft.rc_refcount + adjust &&
> +	    ulen < MAXREFCEXTLEN) {
> +		trace_xfs_refcount_merge_left_extent(cur->bc_mp,
> +				cur->bc_private.a.agno, &left, &cleft);
> +		error = xfs_refcount_merge_left_extent(cur, &left, &cleft,
> +				agbno, aglen);
> +		if (error)
> +			return error;
> +
> +		/*
> +		 * If we just merged left + cleft and cleft == cright,
> +		 * we no longer have a cright to merge with right.  We're done.
> +		 */
> +		if (cequal)
> +			return 0;
> +	}
> +
> +	/* Try to merge cright and right. */
> +	ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
> +	if (right.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
> +	    right.rc_refcount == cright.rc_refcount + adjust &&
> +	    ulen < MAXREFCEXTLEN) {
> +		trace_xfs_refcount_merge_right_extent(cur->bc_mp,
> +				cur->bc_private.a.agno, &cright, &right);
> +		return xfs_refcount_merge_right_extent(cur, &right, &cright,
> +				agbno, aglen);
> +	}
> +
> +	return error;
> +}
> +
...
> +/*
> + * Adjust the refcounts of middle extents.  At this point we should have
> + * split extents that crossed the adjustment range; merged with adjacent
> + * extents; and updated agbno/aglen to reflect the merges.  Therefore,
> + * all we have to do is update the extents inside [agbno, agbno + aglen].
> + */
> +STATIC int
> +xfs_refcount_adjust_extents(
> +	struct xfs_btree_cur	*cur,
> +	xfs_agblock_t		agbno,
> +	xfs_extlen_t		aglen,
> +	xfs_extlen_t		*adjusted,
> +	enum xfs_refc_adjust_op	adj,
> +	struct xfs_defer_ops	*dfops,
> +	struct xfs_owner_info	*oinfo)
> +{
> +	struct xfs_refcount_irec	ext, tmp;
> +	int				error;
> +	int				found_rec, found_tmp;
> +	xfs_fsblock_t			fsbno;
> +
> +	/* Merging did all the work already. */
> +	if (aglen == 0)
> +		return 0;
> +
> +	error = xfs_refcount_lookup_ge(cur, agbno, &found_rec);
> +	if (error)
> +		goto out_error;
> +
> +	while (aglen > 0 && xfs_refcount_still_have_space(cur)) {
> +		error = xfs_refcount_get_rec(cur, &ext, &found_rec);
> +		if (error)
> +			goto out_error;
> +		if (!found_rec) {
> +			ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
> +			ext.rc_blockcount = 0;
> +			ext.rc_refcount = 0;
> +		}
> +
> +		/*
> +		 * Deal with a hole in the refcount tree; if a file maps to
> +		 * these blocks and there's no refcountbt recourd, pretend that

Typo:							  record

> +		 * there is one with refcount == 1.
> +		 */

Also, what actually confirms that the blocks are file mapped? Couldn't
blocks absent from the refcbt also reside in the allocbt? (I guess the
broader question is what are the rules/expectations for handling sparse
files..?).

> +		if (ext.rc_startblock != agbno) {
> +			tmp.rc_startblock = agbno;
> +			tmp.rc_blockcount = min(aglen,
> +					ext.rc_startblock - agbno);
> +			tmp.rc_refcount = 1 + adj;
> +			trace_xfs_refcount_modify_extent(cur->bc_mp,
> +					cur->bc_private.a.agno, &tmp);
> +
> +			/*
> +			 * Either cover the hole (increment) or
> +			 * delete the range (decrement).
> +			 */
> +			if (tmp.rc_refcount) {
> +				error = xfs_refcount_insert(cur, &tmp,
> +						&found_tmp);
> +				if (error)
> +					goto out_error;
> +				XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> +						found_tmp == 1, out_error);
> +				cur->bc_private.a.priv.refc.nr_ops++;
> +			} else {
> +				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> +						cur->bc_private.a.agno,
> +						tmp.rc_startblock);
> +				xfs_bmap_add_free(cur->bc_mp, dfops, fsbno,
> +						tmp.rc_blockcount, oinfo);
> +			}
> +
> +			(*adjusted) += tmp.rc_blockcount;
> +			agbno += tmp.rc_blockcount;
> +			aglen -= tmp.rc_blockcount;
> +
> +			error = xfs_refcount_lookup_ge(cur, agbno,
> +					&found_rec);
> +			if (error)
> +				goto out_error;
> +		}
> +
> +		/* Stop if there's nothing left to modify */
> +		if (aglen == 0 || !xfs_refcount_still_have_space(cur))
> +			break;
> +
> +		/*
> +		 * Adjust the reference count and either update the tree
> +		 * (incr) or free the blocks (decr).
> +		 */
> +		if (ext.rc_refcount == MAXREFCOUNT)
> +			goto skip;
> +		ext.rc_refcount += adj;
> +		trace_xfs_refcount_modify_extent(cur->bc_mp,
> +				cur->bc_private.a.agno, &ext);
> +		if (ext.rc_refcount > 1) {
> +			error = xfs_refcount_update(cur, &ext);
> +			if (error)
> +				goto out_error;
> +			cur->bc_private.a.priv.refc.nr_ops++;
> +		} else if (ext.rc_refcount == 1) {
> +			error = xfs_refcount_delete(cur, &found_rec);
> +			if (error)
> +				goto out_error;
> +			XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> +					found_rec == 1, out_error);
> +			cur->bc_private.a.priv.refc.nr_ops++;
> +			goto advloop;
> +		} else {
> +			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> +					cur->bc_private.a.agno,
> +					ext.rc_startblock);
> +			xfs_bmap_add_free(cur->bc_mp, dfops, fsbno,
> +					ext.rc_blockcount, oinfo);
> +		}
> +
> +skip:
> +		error = xfs_btree_increment(cur, 0, &found_rec);
> +		if (error)
> +			goto out_error;
> +
> +advloop:
> +		(*adjusted) += ext.rc_blockcount;
> +		agbno += ext.rc_blockcount;
> +		aglen -= ext.rc_blockcount;
> +	}
> +
> +	return error;
> +out_error:
> +	trace_xfs_refcount_modify_extent_error(cur->bc_mp,
> +			cur->bc_private.a.agno, error, _RET_IP_);
> +	return error;
> +}
> +
> +/* Adjust the reference count of a range of AG blocks. */
> +STATIC int
> +xfs_refcount_adjust(
> +	struct xfs_btree_cur	*cur,
> +	xfs_agblock_t		agbno,
> +	xfs_extlen_t		aglen,
> +	xfs_extlen_t		*adjusted,
> +	enum xfs_refc_adjust_op	adj,
> +	struct xfs_defer_ops	*dfops,
> +	struct xfs_owner_info	*oinfo)
> +{
> +	xfs_extlen_t		orig_aglen;
> +	bool			shape_changed;
> +	int			shape_changes = 0;
> +	int			error;
> +
> +	*adjusted = 0;
> +	switch (adj) {
> +	case XFS_REFCOUNT_ADJUST_INCREASE:
> +		trace_xfs_refcount_increase(cur->bc_mp, cur->bc_private.a.agno,
> +				agbno, aglen);
> +		break;
> +	case XFS_REFCOUNT_ADJUST_DECREASE:
> +		trace_xfs_refcount_decrease(cur->bc_mp, cur->bc_private.a.agno,
> +				agbno, aglen);
> +		break;
> +	default:
> +		ASSERT(0);
> +	}
> +

Seems like slight overkill... how about a single
trace_xfs_refcount_adjust() call that also takes adj as a parameter? We
should be able to even print an "inc" or "dec" string or some such if we
want for easier filtering.

Brian

> +	/*
> +	 * Ensure that no rcextents cross the boundary of the adjustment range.
> +	 */
> +	error = xfs_refcount_split_extent(cur, agbno, &shape_changed);
> +	if (error)
> +		goto out_error;
> +	if (shape_changed)
> +		shape_changes++;
> +
> +	error = xfs_refcount_split_extent(cur, agbno + aglen, &shape_changed);
> +	if (error)
> +		goto out_error;
> +	if (shape_changed)
> +		shape_changes++;
> +
> +	/*
> +	 * Try to merge with the left or right extents of the range.
> +	 */
> +	orig_aglen = aglen;
> +	error = xfs_refcount_merge_extents(cur, &agbno, &aglen, adj,
> +			&shape_changed);
> +	if (error)
> +		goto out_error;
> +	if (shape_changed)
> +		shape_changes++;
> +	(*adjusted) += orig_aglen - aglen;
> +	if (shape_changes)
> +		cur->bc_private.a.priv.refc.shape_changes++;
> +
> +	/* Now that we've taken care of the ends, adjust the middle extents */
> +	error = xfs_refcount_adjust_extents(cur, agbno, aglen, adjusted, adj,
> +			dfops, oinfo);
> +	if (error)
> +		goto out_error;
> +
> +	return 0;
> +
> +out_error:
> +	trace_xfs_refcount_adjust_error(cur->bc_mp, cur->bc_private.a.agno,
> +			error, _RET_IP_);
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_error.h b/fs/xfs/xfs_error.h
> index 3d22470..d9675c64 100644
> --- a/fs/xfs/xfs_error.h
> +++ b/fs/xfs/xfs_error.h
> @@ -92,7 +92,8 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
>  #define XFS_ERRTAG_BMAPIFORMAT				21
>  #define XFS_ERRTAG_FREE_EXTENT				22
>  #define XFS_ERRTAG_RMAP_FINISH_ONE			23
> -#define XFS_ERRTAG_MAX					24
> +#define XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE		24
> +#define XFS_ERRTAG_MAX					25
>  
>  /*
>   * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> @@ -121,6 +122,7 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
>  #define	XFS_RANDOM_BMAPIFORMAT				XFS_RANDOM_DEFAULT
>  #define XFS_RANDOM_FREE_EXTENT				1
>  #define XFS_RANDOM_RMAP_FINISH_ONE			1
> +#define XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE		1
>  
>  #ifdef DEBUG
>  extern int xfs_error_test_active;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux