On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Currently, we use this undocumented macro to encode the minimum number > of blocks needed to replenish a completely empty AGFL when an AG is > nearly full. This has lead to confusion on the part of the maintainers, > so let's document what the value actually means, and move it to > xfs_alloc.c since it's not used outside of that module. Code change looks fine, but.... > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_alloc.c | 23 ++++++++++++++++++----- > fs/xfs/libxfs/xfs_alloc.h | 1 - > 2 files changed, 18 insertions(+), 6 deletions(-) > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 353e53b892e6..b0678e96ce61 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -82,6 +82,19 @@ xfs_prealloc_blocks( > } > > /* > + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to > + * guarantee that we can refill the AGFL prior to allocating space in a nearly > + * full AG. We require two blocks per free space btree because free space > + * btrees shrink to a single block as the AG fills up, and any allocation can > + * cause a btree split. The rmap btree uses a per-AG reservation to withhold > + * space from xfs_mod_fdblocks, so we do not account for that here. > + */ > +#define XFS_ALLOCBT_AGFL_RESERVE 4 .... that comment is not correct. this number had nothing to do with btree split reservations and is all about preventing oversubscription of the AG when the free space trees are completely empty. By the time there is enough free space records in the AG for the bno/cnt trees to be at risk of a single level root split (several hundred free extent records), there is enough free space to fully allocate the 4 blocks that the AGFL needs for that split. That is, the set aside was designed to prevent AG selection in xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an AG to fix up the AGFL and then fail to fully fill the AGFL because, say, there were only 2 blocks free in the AG and the AGFL needed 3. Then we try all other AGs and get ENOSPC from all of them, and we end up cancelling a dirty transaction and shutting down instead of propagating ENOSPC up to the user. This "not enough blocks to populate the AGFL" problem arose because we can allocate extents directly from the AGFL if the free space btree is empty, resulting in depletion of the AGFL and no free space to re-populate it. Freeing a block will then go back into the btree, and so the next allocation attempt might need 2 blocks for the AGFL, have one block in the free space tree, and then we fail to fill the AGFL completely because we still need one block for the AGFL and there's no space available anymore. If all other AGs are like this or ENOSPC, then kaboom. IOWs, I originally added this per-ag set aside so that when the AG was almost completely empty and we were down to allocating the last blocks from the AG, users couldn't oversubscribe global free space by consuming the blocks the AGs required to fill the AGFLs to allow the last blocks that users could allocate to be allocated safely. Hence the set aside of 4 blocks per AG was not to ensure the freespace trees could be split, but to ensure every last possible block could be allocated from the AG without causing the AG selection algorithms to select and modify AGs that could not have their AGFL fully fixed up to allocate the blocks that the caller needed when near ENOSPC... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx