Hi,

One of the teams that I work with hits WARNs in
xfs_bmap_extents_to_btree() on a database workload of theirs. The last
time the subject came up on linux-xfs, it was suggested[1] to try
building an AG reserve pool for the AGFL.

I managed to work out a reproducer for the problem. While debugging it,
the steps Gao outlined turned out to be essentially what was necessary
to get the problem to happen repeatably.

1. Allocate almost all of the space in an AG.
2. Free and reallocate that space to fragment it so the freespace
   b-trees are just about to split.
3. Allocate blocks in a file such that the next extent allocated for
   that file will cause its bmbt to get converted from an inline extent
   to a b-tree.
4. Free space such that the free-space btrees have a contiguous extent
   with a busy portion on either end.
5. Allocate the portion in the middle, splitting the extent and
   triggering a b-tree split.

On older kernels this is all it takes. After the AG-aware allocator
changes I also need to start the allocation in the highest numbered AG
available while inducing lock contention in the lower numbered AGs.

In order to ensure that AGs have enough space to complete transactions
with multiple allocations, I've taken a stab at implementing an AGFL
reserve pool.

This patchset passes fstests without any regressions and also does not
trigger the reproducers I wrote for the case above. I've also run those
with tracing enabled to validate that it's got the accounting correct,
is rejecting allocations when there's no space in the reserve, and is
tapping the reserve when appropriate.

The first patch is the plumbing that re-establishes the reserve for the
AGFL. I'm happy to break this into something smaller, if it's too large.
The remaining patches add additional pieces needed to check how much
space the AGFL might need on a refill, and then to actually use the
reserve to permit or deny allocation requests, as the case may be.

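As a rough illustration of the permit-or-deny behaviour described above,
here is a toy userspace model of a per-AG reserve. It is not code from
the patches: struct toy_ag, toy_ag_unreserved(), toy_ag_alloc() and the
may_use_reserve flag are all hypothetical names, and deciding which
allocations are allowed to set that flag is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one AG's free space; not kernel code. */
struct toy_ag {
	uint64_t free_blocks;	/* total free blocks in the AG */
	uint64_t agfl_reserved;	/* free blocks set aside for AGFL refills */
};

/* Free blocks visible to callers that may not touch the reserve. */
static uint64_t toy_ag_unreserved(const struct toy_ag *ag)
{
	if (ag->free_blocks <= ag->agfl_reserved)
		return 0;
	return ag->free_blocks - ag->agfl_reserved;
}

/*
 * Try to take len blocks from the AG. Callers without may_use_reserve
 * only see the unreserved space and are refused if it is too small;
 * callers with may_use_reserve may dip into the reserve as well.
 */
static bool toy_ag_alloc(struct toy_ag *ag, uint64_t len,
			 bool may_use_reserve)
{
	uint64_t avail = may_use_reserve ? ag->free_blocks
					 : toy_ag_unreserved(ag);

	if (len > avail)
		return false;

	ag->free_blocks -= len;
	/* If the reserve was tapped, it shrinks to what is left. */
	if (ag->agfl_reserved > ag->free_blocks)
		ag->agfl_reserved = ag->free_blocks;
	return true;
}
```

With 10 free blocks and 4 reserved, a request for 8 blocks is refused
without the flag and succeeds with it, leaving 2 free blocks that all
belong to the reserve.
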
I'm sending this as an RFC, since I still have a few outstanding
questions and would appreciate feedback. Some of those questions are:

Patch 1 includes all freespace that is not allocated to the rmapbt in
its used / reserved accounting. It also borrows the heuristics from
rmapbt in terms of picking the initial size of the reservation. The
numbers I'm getting seem a bit large. Any suggestions about how to
improve this further?

Patches 3 and 4 use the allocation args structure to attempt to decide
whether an allocation is the first in a transaction, or if it's a
subsequent allocation. Are there any recommendations about a better way
to do this?

Thanks,

-K

[1] https://lore.kernel.org/linux-xfs/20221116025106.GB3600936@xxxxxxxxxxxxxxxxxxx/

Krister Johansen (4):
  xfs: resurrect the AGFL reservation
  xfs: modify xfs_alloc_min_freelist to take an increment
  xfs: let allocations tap the AGFL reserve
  xfs: refuse allocations without agfl refill space

 fs/xfs/libxfs/xfs_ag.h          |  2 +
 fs/xfs/libxfs/xfs_ag_resv.c     | 54 ++++++++++++++-----
 fs/xfs/libxfs/xfs_ag_resv.h     |  4 ++
 fs/xfs/libxfs/xfs_alloc.c       | 94 +++++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_alloc.h       |  5 +-
 fs/xfs/libxfs/xfs_alloc_btree.c | 59 +++++++++++++++++++++
 fs/xfs/libxfs/xfs_alloc_btree.h |  5 ++
 fs/xfs/libxfs/xfs_bmap.c        |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c      |  2 +-
 fs/xfs/libxfs/xfs_rmap_btree.c  |  5 ++
 fs/xfs/scrub/fscounters.c       |  1 +
 11 files changed, 207 insertions(+), 26 deletions(-)

base-commit: 58f880711f2ba53fd5e959875aff5b3bf6d5c32e
-- 
2.25.1