On Wed, Oct 25, 2023 at 08:21:02PM -0700, Darrick J. Wong wrote: > On Thu, Oct 26, 2023 at 10:37:39AM +1100, Dave Chinner wrote: > > On Wed, Oct 25, 2023 at 03:39:09PM -0700, Darrick J. Wong wrote: > > > On Wed, Oct 25, 2023 at 01:03:00PM -0700, Omar Sandoval wrote: > > > > On Wed, Oct 25, 2023 at 08:50:46AM -0700, Darrick J. Wong wrote: > > > > > The absolute maximum height of a free space btree is 7 blocks, according > > > > > to xfs_db: > > > > > > > > > > # xfs_db /dev/sda -c 'btheight -w absmax all' > > > > > bnobt: 7 > > > > > cntbt: 7 > > > > > inobt: 6 > > > > > finobt: 6 > > > > > bmapbt: 14 > > > > > refcountbt: 6 > > > > > rmapbt: 10 > > > > > > > > > > The smallest AGFL is 62 records long (V4 fs, 512b blocks) so I don't > > > > > think it's /that/ drastic. For a filesystem with rmapbt (V5, 1k blocks) > > > > > that minimum jumps to 121 blocks. > > > > Yup. keep in mind this comment in xfs_alloc_fix_freelist(): > > > > * XXX (dgc): When we have lots of free space, does this buy us > > * anything other than extra overhead when we need to put more blocks > > * back on the free list? Maybe we should only do this when space is > > * getting low or the AGFL is more than half full? > > > > IOWs, I've previously mused about keeping the AGFL much fuller than > > the absolute minimum. There doesn't seem to be any real gain to > > keeping it at just the minimum size when we are nowhere near ENOSPC, > > it just means we are doing lots of management operations like > > reducing it immediately after a transaction that has released blocks > > to the freelist, then only to have to extend it again when we do an > > operation that consumes blocks from the free list. > > > > i.e. should we add also hysteresis to limit the number of AGFL > > size manipulations we need to do in mixed workloads? i.e. we don't free > > blocks from the AGFL until it might get overfilled by an operation > > (i.e. max size - max merge free blocks) or we are nearing ENOSPC, > > and we don't fill till we are at the minimum. In either case, the > > target for empty or fill operations is "half full"... > > Hmm. What if we kept the AGFL completely full? It can't be kept that way - we have to have space to store blocks that are released by tree merges. i.e. like a tree split requires a minimum number blocks in the free list for it to succeed, a tree merge needs a minimum number of free slots in the AGFL for it to succeed. > On that 512b-fsblock V4 filesystem, that'd use up 128 blocks instead of 7 > blocks. 128 blocks == 64K per AG. > > On a 64K-fsblock V5 filesystem, that'd use up 16370 64K blocks per AG, or > ~1GB per AG. Ok, that might be too much. > > On a 4K-fsblock V5 fs, that's 1010 4K blocks per AG, or ~4MB per AG. > > A gigabyte is a bit ridiculous, but what if we always did the maximum > bnobt/cntbt/rmapbt height instead of current_height + 1? Yes, largely full AGFLs all the time doesn't make a whole lot of sense, but just a max height reservation still doesn't cover the case exposed by this fix. It would need to be 2 * max height to cover the double split case we are talking about here. Also, if we change from the current height to always use the maximum, it re-introduces a bunch of ENOSPC issues resulting from overcommitting user data space. That is, we must ensure that global free space does not exhaust the free space that we need to refill the AGFLs to enable an extent free at ENOSPC to proceed. i.e. the mp->m_alloc_set_aside value (i.e. XFS_ALLOCBT_AGFL_RESERVE) needs to reflect the fact this would change the minimum AGFL fill at ENOSPC. Further, changing to max reservation for the minimum fill fails to recognoise that at ENOSPC, the free space trees are empty and so will never split, and so a large minimum AGFL fill is completely unnecessary - at most all we need is to have enough blocks in the AGFL for the single level root split to occur (i.e. 2 blocks per free space btree). That's waht the current XFS_ALLOCBT_AGFL_RESERVE value is set to. So, really, I don't see moving away from the variable minimum reservation we currently have is a good idea. Adding a bit of hysteresis to the AGFL fill/drain operations doesn't require changing the min fill requirements, just picking a sensible target above the current minimum fill... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx