On Fri, 2011-03-04 at 07:59 -0500, Christoph Hellwig wrote: > plain text document attachment (xfs-skip-busy-extents) > Every time we reallocate a busy extent, we cause a synchronous log force > to occur to ensure the freeing transaction is on disk before we continue > and use the newly allocated extent. This is extremely sub-optimal as we > have to mark every transaction with blocks that get reused as synchronous. > > Instead of searching the busy extent list after deciding on the extent to > allocate, check each candidate extent during the allocation decisions as > to whether they are in the busy list. If they are in the busy list, we > trim the busy range out of the extent we have found and determine if that > trimmed range is still OK for allocation. In many cases, this check can > be incorporated into the allocation extent alignment code which already > does trimming of the found extent before determining if it is a valid > candidate for allocation. This time I just scanned most of the change, since it appears it's almost the same except for the (renamed) xfs_alloc_busy_trim() function. It looks correct now, but I have a few things for you to consider. I'll wait for your response in case you want to change anything. After that I'll pull in the three patches (probably tomorrow). > [hch: merged two earlier patches from Dave and fixed various bugs] > > Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > Index: xfs/fs/xfs/xfs_alloc.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-02 12:18:01.599040095 -0500 > +++ xfs/fs/xfs/xfs_alloc.c 2011-03-02 12:19:10.599027233 -0500 . . . > @@ -2657,6 +2727,177 @@ xfs_alloc_busy_search( > return match; > } > > +/* > + * For a given extent [fbno, flen], search the busy extent list > + * to find a subset of the extent that is not busy. > + */ I agree that the notation (from Dave) that you use here is very helpful in visualizing what's going on. But the underlying code is pretty simple, and it gets somewhat lost in the comments I think. I therefore would kind of prefer to have the explanation moved up above the function. It clearly labels the cases being treated, and references to those can be put in the code, below. (This is a style thing, so if you feel strongly that it's better as you have it, so be it.) > +STATIC void > +xfs_alloc_busy_trim( > + struct xfs_alloc_arg *args, > + xfs_agblock_t fbno, > + xfs_extlen_t flen, > + xfs_agblock_t *rbno, > + xfs_extlen_t *rlen) > +{ > + struct rb_node *rbp; > + > + ASSERT(flen > 0); > + > + spin_lock(&args->pag->pagb_lock); > + rbp = args->pag->pagb_tree.rb_node; > + while (rbp && flen >= args->minlen) { > + struct xfs_busy_extent *busyp = > + rb_entry(rbp, struct xfs_busy_extent, rb_node); > + xfs_agblock_t fend = fbno + flen; All the nice diagrams refer to the variable "fbno" and "fend" using "bno" and "end. I think you should either drop the "f" in the variables or add it to the comments. > + xfs_agblock_t bbno = busyp->bno; > + xfs_agblock_t bend = bbno + busyp->length; > + > + if (fbno + flen <= bbno) { if (fend <= bbno) { > + rbp = rbp->rb_left; > + continue; > + } else if (fbno >= bend) { > + rbp = rbp->rb_right; > + continue; > + } > + > + if (bbno <= fbno) { > + /* start overlap */ > + ASSERT(bend > fbno); > + ASSERT(bend <= fend); This assertion is wrong (Case 1 is an example). The only things you know are: bbno <= fbno bbno < bend therefore (fbno < bend) and fbno < fend but you don't know the relationship between bend and fend. > + > + /* > + * Case 1: > + * bbno bend > + * +BBBBBBBBBBBBBBBBB+ > + * +---------+ > + * bno end > + * As long as you're enumerating all the cases, there's one that you don't mention (but which is covered in this block): * bbno bend * +BBBBBBBBBBBBBB+ * +---------+ * bno end * I think this should be added to the description for completeness. > + * Case 2: > + * bbno bend > + * +BBBBBBBBBBBBBBBBB+ > + * +-------------+ > + * bno end > + * > + . . . > + } else { > + /* middle overlap */ > + > + /* > + * Case 9: > + * bbno bend > + * +BBBBBBBBBBBBBBBBB+ > + * +-----------------------------------+ > + * bno end > + * > + * Can be trimmed to: > + * +-------+ OR +-------+ > + * bno end bno end > + * The following block of text explains things, but it might be clearer if it's rearranged a bit: - Backward allocation leads to significant fragmentation of directories, which degrades directory performance. - Therefore we always want to choose the option that produces forward allocation patterns. - Preferring the lower bno extent will make the next request use "end" as the start of the next allocation. If the segment is no longer busy at that point, we'll then get a contiguous allocation, but even if it is still busy, we'll get a forward allocation. - We try to avoid choosing the segment at "bend", because that can lead to the next allocation taking the segment at "bno"--which would be a backward allocation. - We only use the segment at "bno" if it is much larger than the current requested size, because in that case there's a good chance subsequent allocations will be contiguous. (Something like that anyway, I just wanted to provide an example rather than just saying "it's wrong.") > + * We prefer the lower bno extent because the next > + * allocation for this inode will use "end" as the > + * target for first block. If the busy segment has > + * cleared, this will get a contiguous allocation next > + * time around; if thebusy segment has not cleared, > + * it will get an allocation at bend, which is a forward > + * allocation. > + * > + * If we choose segment at bend, and this remains the > + * best extent for the next allocation (e.g. NEAR_BNO > + * allocation) we'll next allocate at bno, which will > + * give us backwards allocation. We already know that > + * backwards allocation direction causes significant > + * fragmentation of directories and degradataion of > + * directory performance. > + * > + * Always chose the option that produces forward > + * allocation patterns so that sequential reads and > + * writes only ever seek in one direction. Only choose > + * the higher bno extent if the remainin unused extent > + * length is much larger than the current allocation > + * request, promising us a contiguous allocation in > + * the following free space. > + */ > + > + if (bbno - fbno >= args->maxlen) { > + /* left candidate fits perfect */ > + fend = bbno; > + } else if (fend - bend >= args->maxlen * 4) { This magic value 4 ought to be justified, or experimented with, or possibly set as a tunable (for the time being). _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs