On Thu 11-04-13 22:09:56, Jan Kara wrote:
> Writing a large file using direct IO in 16 MB chunks sometimes results
> in a pathological allocation pattern where 16 MB chunks of large free
> extent are allocated to a file in a reversed order. So extents of a file
> look for example as:
>
>  ext  logical  physical  expected   length flags
>    0        0        13             4550656
>    1  4550656 188136807   4550668 12562432
>    2 17113088 200699240 200699238   622592
>    3 17735680 182046055 201321831     4096
>    4 17739776 182041959 182050150     4096
>    5 17743872 182037863 182046054     4096
>    6 17747968 182033767 182041958     4096
>    7 17752064 182029671 182037862     4096
> ...
> 6757 45400064 154381644 154389835     4096
> 6758 45404160 154377548 154385739     4096
> 6759 45408256 252951571 154381643    73728 eof
>
> This happens because XFS_ALLOCTYPE_THIS_BNO allocation fails (the last
> extent in the file cannot be further extended) so we fall back to
> XFS_ALLOCTYPE_NEAR_BNO allocation which picks end of a large free
> extent as the best place to continue the file. Since the chunk at the
> end of the free extent again cannot be further extended, this behavior
> repeats until the whole free extent is consumed in a reversed order.
>
> For data allocations this backward allocation isn't beneficial so make
> xfs_alloc_compute_diff() pick start of a free extent instead of its end
> for them. That avoids the backward allocation pattern.
>
> See thread at http://oss.sgi.com/archives/xfs/2013-03/msg00144.html for
> more details about the reproduction case and why this solution was
> chosen.
>
> Based on idea by Dave Chinner <dchinner@xxxxxxxxxx>.
>
> CC: Dave Chinner <dchinner@xxxxxxxxxx>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/xfs/xfs_alloc.c |   24 ++++++++++++++++++------
>  1 files changed, 18 insertions(+), 6 deletions(-)
>
> v2: Updated comment and commit description.

Could anybody pull this patch into XFS tree? I don't see it there...

								Honza
>
> diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
> index 0ad2325..f99113d 100644
> --- a/fs/xfs/xfs_alloc.c
> +++ b/fs/xfs/xfs_alloc.c
> @@ -173,6 +173,7 @@ xfs_alloc_compute_diff(
>  	xfs_agblock_t	wantbno,	/* target starting block */
>  	xfs_extlen_t	wantlen,	/* target length */
>  	xfs_extlen_t	alignment,	/* target alignment */
> +	char		userdata,	/* are we allocating data? */
>  	xfs_agblock_t	freebno,	/* freespace's starting block */
>  	xfs_extlen_t	freelen,	/* freespace's length */
>  	xfs_agblock_t	*newbnop)	/* result: best start block from free */
> @@ -187,7 +188,14 @@ xfs_alloc_compute_diff(
>  	ASSERT(freelen >= wantlen);
>  	freeend = freebno + freelen;
>  	wantend = wantbno + wantlen;
> -	if (freebno >= wantbno) {
> +	/*
> +	 * We want to allocate from the start of a free extent if it is past
> +	 * the desired block or if we are allocating user data and the free
> +	 * extent is before desired block. The second case is there to allow
> +	 * for contiguous allocation from the remaining free space if the file
> +	 * grows in the short term.
> +	 */
> +	if (freebno >= wantbno || (userdata && freeend < wantend)) {
>  		if ((newbno1 = roundup(freebno, alignment)) >= freeend)
>  			newbno1 = NULLAGBLOCK;
>  	} else if (freeend >= wantend && alignment > 1) {
> @@ -772,7 +780,8 @@ xfs_alloc_find_best_extent(
>  			xfs_alloc_fix_len(args);
>
>  			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -						       args->alignment, *sbnoa,
> +						       args->alignment,
> +						       args->userdata, *sbnoa,
>  						       *slena, &new);
>
>  			/*
> @@ -943,7 +952,8 @@ restart:
>  			if (args->len < blen)
>  				continue;
>  			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, ltbnoa, ltlena, &ltnew);
> +				args->alignment, args->userdata, ltbnoa,
> +				ltlena, &ltnew);
>  			if (ltnew != NULLAGBLOCK &&
>  			    (args->len > blen || ltdiff < bdiff)) {
>  				bdiff = ltdiff;
> @@ -1095,7 +1105,8 @@ restart:
>  			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
>  			xfs_alloc_fix_len(args);
>  			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, ltbnoa, ltlena, &ltnew);
> +				args->alignment, args->userdata, ltbnoa,
> +				ltlena, &ltnew);
>
>  			error = xfs_alloc_find_best_extent(args,
>  						&bno_cur_lt, &bno_cur_gt,
> @@ -1111,7 +1122,8 @@ restart:
>  			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
>  			xfs_alloc_fix_len(args);
>  			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, gtbnoa, gtlena, &gtnew);
> +				args->alignment, args->userdata, gtbnoa,
> +				gtlena, &gtnew);
>
>  			error = xfs_alloc_find_best_extent(args,
>  						&bno_cur_gt, &bno_cur_lt,
> @@ -1170,7 +1182,7 @@ restart:
>  	}
>  	rlen = args->len;
>  	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> -				     ltbnoa, ltlena, &ltnew);
> +		args->userdata, ltbnoa, ltlena, &ltnew);
>  	ASSERT(ltnew >= ltbno);
>  	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
>  	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> --
> 1.7.1
>
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
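
For readers following the thread, the start-block choice described in the commit message can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: pick_start() and blk_t are hypothetical names, alignment is assumed to be 1, and the NULLAGBLOCK handling and diff computation of the real xfs_alloc_compute_diff() are left out. Only the branch condition is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block-number type for this model (not a kernel type). */
typedef uint32_t blk_t;

/*
 * Simplified model of the start-block choice, assuming alignment == 1.
 * The free extent is [freebno, freebno + freelen); the caller would
 * ideally like the allocation to start at wantbno.
 */
static blk_t pick_start(blk_t wantbno, blk_t wantlen, bool userdata,
			blk_t freebno, blk_t freelen)
{
	blk_t freeend = freebno + freelen;
	blk_t wantend = wantbno + wantlen;

	/* Same precondition as the ASSERT in the patched function. */
	assert(freelen >= wantlen);

	/*
	 * Condition from the patch: take the start of the free extent if
	 * it lies at or past the target, or if this is user data and the
	 * free extent ends before the end of the wanted range.
	 */
	if (freebno >= wantbno || (userdata && freeend < wantend))
		return freebno;

	/* The free extent covers the wanted range: use the target itself. */
	if (freeend >= wantend)
		return wantbno;

	/* Non-data allocation below the target: take the tail, nearest to it. */
	return freeend - wantlen;
}
```

With a free extent covering blocks 100..199 and a 16-block request targeted at block 300, pick_start(300, 16, false, 100, 100) picks the tail of the extent, which is the choice that produced the backward pattern for data, while pick_start(300, 16, true, 100, 100) picks block 100 so that later chunks can extend the file forward.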