On Sun, Sep 15, 2019 at 08:00:35AM +1000, Dave Chinner wrote: > On Fri, Sep 13, 2019 at 10:58:02AM -0400, Brian Foster wrote: > > On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote: > > > On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote: > > > > The bmap block allocation code issues a sequence of retries to > > > > perform an optimal allocation, gradually loosening constraints as > > > > allocations fail. For example, the first attempt might begin at a > > > > particular bno, with maxlen == minlen and alignment incorporated. As > > > > allocations fail, the parameters fall back to different modes, drop > > > > alignment requirements and reduce the minlen and total block > > > > requirements. > > > > > > > > For large extent allocations with an args.total value that exceeds > > > > the allocation length (i.e., non-delalloc), the total value tends to > > > > dominate despite these fallbacks. For example, an aligned extent > > > > allocation request of tens to hundreds of MB that cannot be > > > > satisfied from a particular AG will not succeed after dropping > > > > alignment or minlen because xfs_alloc_space_available() never > > > > selects an AG that can't satisfy args.total. The retry sequence > > > > eventually reduces total and ultimately succeeds if a minlen extent > > > > is available somewhere, but the first several retries are > > > > effectively pointless in this scenario. > > > > > > > > Beyond simply being inefficient, another side effect of this > > > > behavior is that we drop alignment requirements too aggressively. > > > > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe > > > > unit: > > > > > > > > # xfs_io -c "falloc 0 1g" /mnt/file > > > > # <xfstests>/src/t_stripealign /mnt/file 32 > > > > /mnt/file: Start block 347176 not multiple of sunit 32 > > > > > > Ok, so what Carlos and I found last night was an issue with the > > > the agresv code leading to the maximum free extent calculated > > > by xfs_alloc_longest_free_extent() being longer than the largest > > > allowable extent allocation (mp->m_ag_max_usable) resulting in the > > > situation where blen > args->maxlen, and so in the case of initial > > > allocation here, we never run this: > > > > > > /* > > > * Adjust for alignment > > > */ > > > if (blen > args.alignment && blen <= args.maxlen) > > > args.minlen = blen - args.alignment; > > > args.minalignslop = 0; > > > > .... > > > > As a step towards addressing this problem, insert a new retry in the > > > > bmap allocation sequence to drop minlen (from maxlen) before tossing > > > > alignment. This should still result in as large of an extent as > > > > possible as the block allocator prioritizes extent size in all but > > > > exact allocation modes. By itself, this does not change the behavior > > > > of the command above because the preallocation code still specifies > > > > total based on maxlen. Instead, this facilitates preservation of > > > > alignment once extra reservation is separated from the extent length > > > > portion of the total block requirement. > > > > > > AFAICT this is not necessary. The prototypoe patch I wrote last > > > night while working through this with Carlos is attached below. I > > > updated with a variant of your patch 2 to demonstrate that it does > > > actually solve the problem of full AG allocation failing to be > > > aligned. > > > > > > > I agree that this addresses the reported issue, but I can reproduce > > other corner cases affected by the original patch that aren't affected > > by this one. For example, if the allocation request happens to be > > slightly less than blen but not enough to allow for alignment, minlen > > isn't dropped and we can run through the same allocation retry sequence > > that kills off alignment before success. > > But isn't that just another variation of the initial conditions > (minlen/maxlen) not being set up correctly for alignment when the AG > is empty? > Perhaps, though I don't think it's exclusive to an empty AG. > i.e. Take the above condition and change it like this: > > /* > * Adjust for alignment > */ > - if (blen > args.alignment && blen <= args.maxlen) > + if (blen > args.alignment && blen <= args.maxlen + args.alignment) > args.minlen = blen - args.alignment; > args.minalignslop = 0; > > and now we cover all the cases when blen covers an aligned maxlen > allocation... > Do we want to consider whether minlen goes to 1? Otherwise that looks reasonable to me. What I was trying to get at is just that we should consider whether there are any other corner cases (that we might care about) where this particular allocation might not behave as expected vs. just the example used in the original commit log. If somebody wants to send a finalized patch or two with these fixes along with the bma.total one (or I can tack it on in reply..?), I'll think about it further on review as well.. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx