Re: [PATCH 3/6] xfs: do not immediately reuse busy extent ranges

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 21, 2011 at 04:22:30AM -0500, Christoph Hellwig wrote:
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent.  This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
> 
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list.  If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
> 
> [hch: merged two earlier patches from Dave and fixed various bugs]
> 
> Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
....
>  	ASSERT(args->agbno % args->alignment == 0);
>  
>  	if (!args->wasfromfl) {
> -		xfs_alloc_update_counters(args->tp, args->pag, args->agbp,
> -					  -((long)(args->len)));
> +		error = xfs_alloc_update_counters(args->tp, args->pag,
> +						  args->agbp,
> +						  -((long)(args->len)));
>  
> -		/*
> -		 * Search the busylist for these blocks and mark the
> -		 * transaction as synchronous if blocks are found. This
> -		 * avoids the need to block due to a synchronous log
> -		 * force to ensure correct ordering as the synchronous
> -		 * transaction will guarantee that for us.
> -		 */
> -		if (xfs_alloc_busy_search(args->mp, args->agno,
> -					args->agbno, args->len))
> -			xfs_trans_set_sync(args->tp);
> +		ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
> +					      args->agbno, args->len));
>  	}
>  
>  	if (!args->isfl) {
> @@ -559,7 +554,7 @@ xfs_alloc_ag_vextent(
>  
>  	XFS_STATS_INC(xs_allocx);
>  	XFS_STATS_ADD(xs_allocb, args->len);
> -	return 0;
> +	return error;
>  }

These error handling changes could go in the previous patch.

> @@ -2612,7 +2688,7 @@ restart:
>   * will require a synchronous transaction, but it can still be
>   * used to distinguish between a partial or exact match.
>   */
> -int
> +STATIC int
>  xfs_alloc_busy_search(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> @@ -2654,6 +2730,71 @@ xfs_alloc_busy_search(
>  	return match;
>  }
>  
> +/*
> + * For a given extent [fbno, flen], search the busy extent list
> + * to find a subset of the extent that is not busy.
> + */

How does this function return an indication that the extent selected
is entirrely unsuitable? e.g. the extent lies wholly within
the busy range and gets trimmed to zero length? perhaps it woul dbe
good to return an error in this case?

> +void
> +xfs_alloc_busy_search_trim(
> +	struct xfs_mount	*mp,
> +	struct xfs_perag	*pag,
> +	xfs_agblock_t		fbno,
> +	xfs_extlen_t		flen,
> +	xfs_agblock_t		*rbno,
> +	xfs_extlen_t		*rlen)
> +{
> +	struct rb_node		*rbp;
> +	xfs_agblock_t           bno = fbno;
> +	xfs_extlen_t            len = flen;
> +
> +	spin_lock(&pag->pagb_lock);
> +	rbp = pag->pagb_tree.rb_node;
> +	while (rbp) {
> +		struct xfs_busy_extent *busyp =
> +			rb_entry(rbp, struct xfs_busy_extent, rb_node);
> +		xfs_agblock_t	end = bno + len;
> +		xfs_agblock_t	bend = busyp->bno + busyp->length;
> +
> +		if (bno + len <= busyp->bno) {
> +			rbp = rbp->rb_left;
> +			continue;
> +		} else if (bno >= busyp->bno + busyp->length) {
> +			rbp = rbp->rb_right;
> +			continue;
> +		}

		if (end <= bbno)
			left;
		else if (bno > bend)
			right;

		/* overlap */

> +
> +		if (busyp->bno < bno) {
> +			/* start overlap */
> +			ASSERT(bend >= bno);
> +			ASSERT(bend <= end);
> +			len -= bno - bend;
> +			bno = bend;

		if (bbno < bno) {

		bbno           bend
		+-----------------+
Case 1:
		   +---------+
		   bno     end

		   No unbusy region in extent, return failure

Case 2:
		   +------------------------+
		   bno                    end

		   Needs to be trimmed to:
		                    +-------+
		                    bno   end
		   bno = bend;
		   len = end - bno;

> +		} else if (bend > end) {
> +			/* end overlap */
> +			ASSERT(busyp->bno >= bno);
> +			ASSERT(busyp->bno < end);
> +			len -= bend - end;

		} else if (bend > end) {

bno must be <= bbno in this case, so:

		bbno           bend
		+-----------------+

Case 3: bno == bbno:
		+-------------+
		bno         end

		No unbusy region in extent, return failure.

Case 4:
        +------------------+
        bno              end

   Needs to be trimmed to:
        +-------+
        bno   end

		end = bbno;
		len = end - bno;



> +		} else {
> +			/* middle overlap - return larger segment */
> +			ASSERT(busyp->bno >= bno);
> +			ASSERT(bend <= end);
> +			len = busyp->bno - bno;
> +			if (len >= end - bend) {
> +				/* use first segment */
> +				len = len;
> +			} else {
> +				/* use last segment */
> +				bno = bend;
> +				len = end - bend;
> +			}

(bno <= bbno && bend <= end) in this case, so:

		bbno           bend
		+-----------------+

Case 5: Exact match: (bno == bbno && bend == end)
		+-----------------+
		bno             end

		No unbusy region in extent, return failure.

Case 6: (bno == bbno && bend < end)

		+-------------------------+
		bno                     end

		Needs to be trimmed to:
		                  +-------+
		                  bno   end

		Same as Case 2.
			- case 2 can be extend to bbno <= bno.

Case 7: (bno < bbno && bend == end)

        +-------------------------+
        bno                     end

   Needs to be trimmed to:
        +-------+
        bno   end

		Same as case 4.
			- Case 4 can be extented to bend >= end

Case 8: (bno < bbno && bend < end)

        +---------------------------------+
        bno                             end

can be trimmed to:
        +-------+        OR       +-------+
        bno   end                 bno   end

	- Should always want to choose the lower bno extent:
		- next allocation in file will use "end" as
		  the target for first block
		- if busy segment has cleared, will get contiguous
		  allocation
		- if busy segment has not cleared, get allocation at
		  bend, which is forwards allocation.

		- if we choose segment at bend, and this remains the
		  best extent for the next allocation (e.g.
		  NEAR_BNO allocation) we'll next allocate at bno,
		  which will give us backwards allocation.

		- always chose the option that produces forwards
		  allocation patterns so that sequential reads and
		  writes only ever seek in one direction.

		- we already know that backwards allocation
		  direction (due to NEAR_BNO second algorithm)
		  causes significant fragmentation of directories
		  and degradataion of directory performance, so I
		  think we should avoid introducing a new allocation
		  algorithm that can lead to backwards allocation
		  around frequently reused extents.

	- only choose right hand edge if the remainin unused extent
	  length is much larger than the current allocation request.

	- otherwise return failure if we can't use lower bno due to
	  minlen restrictions so a new extent is chosen by the
	  higher level algorithm.


So, it looks to me like the "overlap found" algorithm shoul dbe
something like:

		if (bbno <= bno) {
			if (end <= bend) {
				/* case 1, 3, 5 */
				return failure;
			}
			/* case 2, 6 */
			bno = bend;
			len = end - bno;
		} else if (bend >= end) {
			ASSERT(bbno > bno);
			/* case 4, 7 */
			end = bbno;
			len = end - bno;
		} else {
			ASSERT(bbno > bno);
			ASSERT(bend < end);
			/* case 8 */
			if (bbno - bno >= args->minlen) {
				/* left candidate OK */
				left = 1;
			}
			if (end - bend >= args->maxlen * 4) {
				/* right candidate OK */
				right = 1;
			}
			if (left && right) {
				/* take right if left is not a
				 * maximal allocation */
				if (bbno - bno < args->maxlen)
					left = 0;
			}
			if (left) {
				end = bbno;
				len = end - bno;
			} else if (right) {
				bno = bend;
				len = end - bno;
			} else {
				return failure;
			}
		}

> @@ -109,19 +109,16 @@ xfs_trim_extents(
>  		 * If any blocks in the range are still busy, skip the
>  		 * discard and try again the next time.
>  		 */
> -		if (xfs_alloc_busy_search(mp, agno, fbno, flen)) {
> -			trace_xfs_discard_busy(mp, agno, fbno, flen);
> -			goto next_extent;
> -		}
> +		xfs_alloc_busy_search_trim(mp, pag, fbno, flen, &tbno, &tlen);
>  
> -		trace_xfs_discard_extent(mp, agno, fbno, flen);
> +		trace_xfs_discard_extent(mp, agno, tbno, tlen);
>  		error = -blkdev_issue_discard(bdev,
> -				XFS_AGB_TO_DADDR(mp, agno, fbno),
> -				XFS_FSB_TO_BB(mp, flen),
> +				XFS_AGB_TO_DADDR(mp, agno, tbno),
> +				XFS_FSB_TO_BB(mp, tlen),
>  				GFP_NOFS, 0);
>  		if (error)
>  			goto out_del_cursor;
> -		*blocks_trimmed += flen;
> +		*blocks_trimmed += tlen;

Hmmm - that means if we get a case 8 overlap, we'll only trim one
side of the extent. That's probably not a big deal. However, perhaps
this should check the size of the trimmed extent before issuing the
discard against it in case we've reduced it to something smaller
thanthe minimum requested trim size....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux