Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 24, 2022 at 07:39:16AM +1100, Dave Chinner wrote:
> On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Currently, we use this undocumented macro to encode the minimum number
> > of blocks needed to replenish a completely empty AGFL when an AG is
> > nearly full.  This has lead to confusion on the part of the maintainers,
> > so let's document what the value actually means, and move it to
> > xfs_alloc.c since it's not used outside of that module.
> 
> Code change looks fine, but....
> 
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c |   23 ++++++++++++++++++-----
> >  fs/xfs/libxfs/xfs_alloc.h |    1 -
> >  2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 353e53b892e6..b0678e96ce61 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -82,6 +82,19 @@ xfs_prealloc_blocks(
> >  }
> >  
> >  /*
> > + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> > + * guarantee that we can refill the AGFL prior to allocating space in a nearly
> > + * full AG.  We require two blocks per free space btree because free space
> > + * btrees shrink to a single block as the AG fills up, and any allocation can
> > + * cause a btree split.  The rmap btree uses a per-AG reservation to withhold
> > + * space from xfs_mod_fdblocks, so we do not account for that here.
> > + */
> > +#define XFS_ALLOCBT_AGFL_RESERVE	4
> 
> .... that comment is not correct.  this number had nothing to do
> with btree split reservations and is all about preventing
> oversubscription of the AG when the free space trees are completely
> empty.  By the time there is enough free space records in the AG for
> the bno/cnt trees to be at risk of a single level root split
> (several hundred free extent records), there is enough free space to
> fully allocate the 4 blocks that the AGFL needs for that split.
> 
> That is, the set aside was designed to prevent AG selection in
> xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an
> AG to fix up the AGFL and then fail to fully fill the AGFL because,
> say, there were only 2 blocks free in the AG and the AGFL needed 3.
> Then we try all other AGs and get ENOSPC from all of them, and we
> end up cancelling a dirty transaction and shutting down instead of
> propagating ENOSPC up to the user.
> 
> This "not enough blocks to populate the AGFL" problem arose because
> we can allocate extents directly from the AGFL if the free space
> btree is empty, resulting in depletion of the AGFL and no free space
> to re-populate it. Freeing a block will then go back into the btree,
> and so the next allocation attempt might need 2 blocks for the AGFL,
> have one block in the free space tree, and then we fail to fill
> the AGFL completely because we still need one block for the AGFL and
> there's no space available anymore. If all other AGs are like this
> or ENOSPC, then kaboom.
> 
> IOWs, I originally added this per-ag set aside so that when the AG
> was almost completely empty and we were down to allocating the last
> blocks from the AG, users couldn't oversubscribe global free space by
> consuming the blocks the AGs required to fill the AGFLs to allow the
> last blocks that users could allocate to be allocated safely.
> 
> Hence the set aside of 4 blocks per AG was not to ensure the
> freespace trees could be split, but to ensure every last possible
> block could be allocated from the AG without causing the AG
> selection algorithms to select and modify AGs that could not have
> their AGFL fully fixed up to allocate the blocks that the caller
> needed when near ENOSPC...

Hmmm, thanks for the context!  What do you think about this revised
comment?

/*
 * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
 * guarantee that we can refill the AGFL prior to allocating space in a
 * nearly full AG.  Although the the space described in the free space
 * btrees, the freesp btrees, and the blocks owned by the agfl are all
 * added to the ondisk fdblocks, it's a mistake to let the ondisk free
 * space in the AG drop so low that the free space btrees cannot refill
 * an empty AGFL up to the minimum level.  Rather than grind through
 * empty AGs until the fs goes down, we subtract this many AG blocks
 * from the incore fdblocks to ENOSPC early.
 */
#define XFS_ALLOCBT_AGFL_RESERVE	4

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux