On Fri, Jan 21, 2011 at 04:22:30AM -0500, Christoph Hellwig wrote: > Every time we reallocate a busy extent, we cause a synchronous log force > to occur to ensure the freeing transaction is on disk before we continue > and use the newly allocated extent. This is extremely sub-optimal as we > have to mark every transaction with blocks that get reused as synchronous. > > Instead of searching the busy extent list after deciding on the extent to > allocate, check each candidate extent during the allocation decisions as > to whether they are in the busy list. If they are in the busy list, we > trim the busy range out of the extent we have found and determine if that > trimmed range is still OK for allocation. In many cases, this check can > be incorporated into the allocation extent alignment code which already > does trimming of the found extent before determining if it is a valid > candidate for allocation. > > [hch: merged two earlier patches from Dave and fixed various bugs] > > Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Christoph Hellwig <hch@xxxxxx> .... > ASSERT(args->agbno % args->alignment == 0); > > if (!args->wasfromfl) { > - xfs_alloc_update_counters(args->tp, args->pag, args->agbp, > - -((long)(args->len))); > + error = xfs_alloc_update_counters(args->tp, args->pag, > + args->agbp, > + -((long)(args->len))); > > - /* > - * Search the busylist for these blocks and mark the > - * transaction as synchronous if blocks are found. This > - * avoids the need to block due to a synchronous log > - * force to ensure correct ordering as the synchronous > - * transaction will guarantee that for us. > - */ > - if (xfs_alloc_busy_search(args->mp, args->agno, > - args->agbno, args->len)) > - xfs_trans_set_sync(args->tp); > + ASSERT(!xfs_alloc_busy_search(args->mp, args->agno, > + args->agbno, args->len)); > } > > if (!args->isfl) { > @@ -559,7 +554,7 @@ xfs_alloc_ag_vextent( > > XFS_STATS_INC(xs_allocx); > XFS_STATS_ADD(xs_allocb, args->len); > - return 0; > + return error; > } These error handling changes could go in the previous patch. > @@ -2612,7 +2688,7 @@ restart: > * will require a synchronous transaction, but it can still be > * used to distinguish between a partial or exact match. > */ > -int > +STATIC int > xfs_alloc_busy_search( > struct xfs_mount *mp, > xfs_agnumber_t agno, > @@ -2654,6 +2730,71 @@ xfs_alloc_busy_search( > return match; > } > > +/* > + * For a given extent [fbno, flen], search the busy extent list > + * to find a subset of the extent that is not busy. > + */ How does this function return an indication that the extent selected is entirrely unsuitable? e.g. the extent lies wholly within the busy range and gets trimmed to zero length? perhaps it woul dbe good to return an error in this case? > +void > +xfs_alloc_busy_search_trim( > + struct xfs_mount *mp, > + struct xfs_perag *pag, > + xfs_agblock_t fbno, > + xfs_extlen_t flen, > + xfs_agblock_t *rbno, > + xfs_extlen_t *rlen) > +{ > + struct rb_node *rbp; > + xfs_agblock_t bno = fbno; > + xfs_extlen_t len = flen; > + > + spin_lock(&pag->pagb_lock); > + rbp = pag->pagb_tree.rb_node; > + while (rbp) { > + struct xfs_busy_extent *busyp = > + rb_entry(rbp, struct xfs_busy_extent, rb_node); > + xfs_agblock_t end = bno + len; > + xfs_agblock_t bend = busyp->bno + busyp->length; > + > + if (bno + len <= busyp->bno) { > + rbp = rbp->rb_left; > + continue; > + } else if (bno >= busyp->bno + busyp->length) { > + rbp = rbp->rb_right; > + continue; > + } if (end <= bbno) left; else if (bno > bend) right; /* overlap */ > + > + if (busyp->bno < bno) { > + /* start overlap */ > + ASSERT(bend >= bno); > + ASSERT(bend <= end); > + len -= bno - bend; > + bno = bend; if (bbno < bno) { bbno bend +-----------------+ Case 1: +---------+ bno end No unbusy region in extent, return failure Case 2: +------------------------+ bno end Needs to be trimmed to: +-------+ bno end bno = bend; len = end - bno; > + } else if (bend > end) { > + /* end overlap */ > + ASSERT(busyp->bno >= bno); > + ASSERT(busyp->bno < end); > + len -= bend - end; } else if (bend > end) { bno must be <= bbno in this case, so: bbno bend +-----------------+ Case 3: bno == bbno: +-------------+ bno end No unbusy region in extent, return failure. Case 4: +------------------+ bno end Needs to be trimmed to: +-------+ bno end end = bbno; len = end - bno; > + } else { > + /* middle overlap - return larger segment */ > + ASSERT(busyp->bno >= bno); > + ASSERT(bend <= end); > + len = busyp->bno - bno; > + if (len >= end - bend) { > + /* use first segment */ > + len = len; > + } else { > + /* use last segment */ > + bno = bend; > + len = end - bend; > + } (bno <= bbno && bend <= end) in this case, so: bbno bend +-----------------+ Case 5: Exact match: (bno == bbno && bend == end) +-----------------+ bno end No unbusy region in extent, return failure. Case 6: (bno == bbno && bend < end) +-------------------------+ bno end Needs to be trimmed to: +-------+ bno end Same as Case 2. - case 2 can be extend to bbno <= bno. Case 7: (bno < bbno && bend == end) +-------------------------+ bno end Needs to be trimmed to: +-------+ bno end Same as case 4. - Case 4 can be extented to bend >= end Case 8: (bno < bbno && bend < end) +---------------------------------+ bno end can be trimmed to: +-------+ OR +-------+ bno end bno end - Should always want to choose the lower bno extent: - next allocation in file will use "end" as the target for first block - if busy segment has cleared, will get contiguous allocation - if busy segment has not cleared, get allocation at bend, which is forwards allocation. - if we choose segment at bend, and this remains the best extent for the next allocation (e.g. NEAR_BNO allocation) we'll next allocate at bno, which will give us backwards allocation. - always chose the option that produces forwards allocation patterns so that sequential reads and writes only ever seek in one direction. - we already know that backwards allocation direction (due to NEAR_BNO second algorithm) causes significant fragmentation of directories and degradataion of directory performance, so I think we should avoid introducing a new allocation algorithm that can lead to backwards allocation around frequently reused extents. - only choose right hand edge if the remainin unused extent length is much larger than the current allocation request. - otherwise return failure if we can't use lower bno due to minlen restrictions so a new extent is chosen by the higher level algorithm. So, it looks to me like the "overlap found" algorithm shoul dbe something like: if (bbno <= bno) { if (end <= bend) { /* case 1, 3, 5 */ return failure; } /* case 2, 6 */ bno = bend; len = end - bno; } else if (bend >= end) { ASSERT(bbno > bno); /* case 4, 7 */ end = bbno; len = end - bno; } else { ASSERT(bbno > bno); ASSERT(bend < end); /* case 8 */ if (bbno - bno >= args->minlen) { /* left candidate OK */ left = 1; } if (end - bend >= args->maxlen * 4) { /* right candidate OK */ right = 1; } if (left && right) { /* take right if left is not a * maximal allocation */ if (bbno - bno < args->maxlen) left = 0; } if (left) { end = bbno; len = end - bno; } else if (right) { bno = bend; len = end - bno; } else { return failure; } } > @@ -109,19 +109,16 @@ xfs_trim_extents( > * If any blocks in the range are still busy, skip the > * discard and try again the next time. > */ > - if (xfs_alloc_busy_search(mp, agno, fbno, flen)) { > - trace_xfs_discard_busy(mp, agno, fbno, flen); > - goto next_extent; > - } > + xfs_alloc_busy_search_trim(mp, pag, fbno, flen, &tbno, &tlen); > > - trace_xfs_discard_extent(mp, agno, fbno, flen); > + trace_xfs_discard_extent(mp, agno, tbno, tlen); > error = -blkdev_issue_discard(bdev, > - XFS_AGB_TO_DADDR(mp, agno, fbno), > - XFS_FSB_TO_BB(mp, flen), > + XFS_AGB_TO_DADDR(mp, agno, tbno), > + XFS_FSB_TO_BB(mp, tlen), > GFP_NOFS, 0); > if (error) > goto out_del_cursor; > - *blocks_trimmed += flen; > + *blocks_trimmed += tlen; Hmmm - that means if we get a case 8 overlap, we'll only trim one side of the extent. That's probably not a big deal. However, perhaps this should check the size of the trimmed extent before issuing the discard against it in case we've reduced it to something smaller thanthe minimum requested trim size.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs