On Wed, Mar 23, 2022 at 10:15:42PM -0700, Darrick J. Wong wrote: > On Thu, Mar 24, 2022 at 07:39:16AM +1100, Dave Chinner wrote: > > On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > Currently, we use this undocumented macro to encode the minimum number > > > of blocks needed to replenish a completely empty AGFL when an AG is > > > nearly full. This has lead to confusion on the part of the maintainers, > > > so let's document what the value actually means, and move it to > > > xfs_alloc.c since it's not used outside of that module. > > > > Code change looks fine, but.... > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_alloc.c | 23 ++++++++++++++++++----- > > > fs/xfs/libxfs/xfs_alloc.h | 1 - > > > 2 files changed, 18 insertions(+), 6 deletions(-) > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > > index 353e53b892e6..b0678e96ce61 100644 > > > --- a/fs/xfs/libxfs/xfs_alloc.c > > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > > @@ -82,6 +82,19 @@ xfs_prealloc_blocks( > > > } > > > > > > /* > > > + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to > > > + * guarantee that we can refill the AGFL prior to allocating space in a nearly > > > + * full AG. We require two blocks per free space btree because free space > > > + * btrees shrink to a single block as the AG fills up, and any allocation can > > > + * cause a btree split. The rmap btree uses a per-AG reservation to withhold > > > + * space from xfs_mod_fdblocks, so we do not account for that here. > > > + */ > > > +#define XFS_ALLOCBT_AGFL_RESERVE 4 > > > > .... that comment is not correct. this number had nothing to do > > with btree split reservations and is all about preventing > > oversubscription of the AG when the free space trees are completely > > empty. By the time there is enough free space records in the AG for > > the bno/cnt trees to be at risk of a single level root split > > (several hundred free extent records), there is enough free space to > > fully allocate the 4 blocks that the AGFL needs for that split. > > > > That is, the set aside was designed to prevent AG selection in > > xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an > > AG to fix up the AGFL and then fail to fully fill the AGFL because, > > say, there were only 2 blocks free in the AG and the AGFL needed 3. > > Then we try all other AGs and get ENOSPC from all of them, and we > > end up cancelling a dirty transaction and shutting down instead of > > propagating ENOSPC up to the user. > > > > This "not enough blocks to populate the AGFL" problem arose because > > we can allocate extents directly from the AGFL if the free space > > btree is empty, resulting in depletion of the AGFL and no free space > > to re-populate it. Freeing a block will then go back into the btree, > > and so the next allocation attempt might need 2 blocks for the AGFL, > > have one block in the free space tree, and then we fail to fill > > the AGFL completely because we still need one block for the AGFL and > > there's no space available anymore. If all other AGs are like this > > or ENOSPC, then kaboom. > > > > IOWs, I originally added this per-ag set aside so that when the AG > > was almost completely empty and we were down to allocating the last > > blocks from the AG, users couldn't oversubscribe global free space by > > consuming the blocks the AGs required to fill the AGFLs to allow the > > last blocks that users could allocate to be allocated safely. > > > > Hence the set aside of 4 blocks per AG was not to ensure the > > freespace trees could be split, but to ensure every last possible > > block could be allocated from the AG without causing the AG > > selection algorithms to select and modify AGs that could not have > > their AGFL fully fixed up to allocate the blocks that the caller > > needed when near ENOSPC... > > Hmmm, thanks for the context! What do you think about this revised > comment? > > /* > * The number of blocks per AG that we withhold from xfs_mod_fdblocks to > * guarantee that we can refill the AGFL prior to allocating space in a > * nearly full AG. Although the the space described in the free space > * btrees, the freesp btrees, and the blocks owned by the agfl are all > * added to the ondisk fdblocks, it's a mistake to let the ondisk free > * space in the AG drop so low that the free space btrees cannot refill > * an empty AGFL up to the minimum level. Rather than grind through > * empty AGs until the fs goes down, we subtract this many AG blocks > * from the incore fdblocks to ENOSPC early. s/ENOSPC early./ensure user allocation does not overcommit the space the filesystem needs for the AGFLs./ And it's all good by me. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx