On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote:
> On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > The bmap block allocation code issues a sequence of retries to
> > perform an optimal allocation, gradually loosening constraints as
> > allocations fail. For example, the first attempt might begin at a
> > particular bno, with maxlen == minlen and alignment incorporated. As
> > allocations fail, the parameters fall back to different modes, drop
> > alignment requirements and reduce the minlen and total block
> > requirements.
> >
> > For large extent allocations with an args.total value that exceeds
> > the allocation length (i.e., non-delalloc), the total value tends to
> > dominate despite these fallbacks. For example, an aligned extent
> > allocation request of tens to hundreds of MB that cannot be
> > satisfied from a particular AG will not succeed after dropping
> > alignment or minlen because xfs_alloc_space_available() never
> > selects an AG that can't satisfy args.total. The retry sequence
> > eventually reduces total and ultimately succeeds if a minlen extent
> > is available somewhere, but the first several retries are
> > effectively pointless in this scenario.
> >
> > Beyond simply being inefficient, another side effect of this
> > behavior is that we drop alignment requirements too aggressively.
> > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > unit:
> >
> >   # xfs_io -c "falloc 0 1g" /mnt/file
> >   # <xfstests>/src/t_stripealign /mnt/file 32
> >   /mnt/file: Start block 347176 not multiple of sunit 32
>
> Ok, so what Carlos and I found last night was an issue with the
> agresv code leading to the maximum free extent calculated
> by xfs_alloc_longest_free_extent() being longer than the largest
> allowable extent allocation (mp->m_ag_max_usable) resulting in the
> situation where blen > args->maxlen, and so in the case of initial
> allocation here, we never run this:

Just to make it clear: Carlos did all the hard work of narrowing it
down and isolating the accounting discrepancy in the allocation code.
All I did was put 2 and 2 together - the agresv discrepancy - wrote a
quick patch and did a trace to make sure I didn't get 5...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx