Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 23, 2022 at 10:15:42PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 24, 2022 at 07:39:16AM +1100, Dave Chinner wrote:
> > On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > Currently, we use this undocumented macro to encode the minimum number
> > > of blocks needed to replenish a completely empty AGFL when an AG is
> > > nearly full.  This has lead to confusion on the part of the maintainers,
> > > so let's document what the value actually means, and move it to
> > > xfs_alloc.c since it's not used outside of that module.
> > 
> > Code change looks fine, but....
> > 
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c |   23 ++++++++++++++++++-----
> > >  fs/xfs/libxfs/xfs_alloc.h |    1 -
> > >  2 files changed, 18 insertions(+), 6 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 353e53b892e6..b0678e96ce61 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -82,6 +82,19 @@ xfs_prealloc_blocks(
> > >  }
> > >  
> > >  /*
> > > + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> > > + * guarantee that we can refill the AGFL prior to allocating space in a nearly
> > > + * full AG.  We require two blocks per free space btree because free space
> > > + * btrees shrink to a single block as the AG fills up, and any allocation can
> > > + * cause a btree split.  The rmap btree uses a per-AG reservation to withhold
> > > + * space from xfs_mod_fdblocks, so we do not account for that here.
> > > + */
> > > +#define XFS_ALLOCBT_AGFL_RESERVE	4
> > 
> > .... that comment is not correct.  this number had nothing to do
> > with btree split reservations and is all about preventing
> > oversubscription of the AG when the free space trees are completely
> > empty.  By the time there is enough free space records in the AG for
> > the bno/cnt trees to be at risk of a single level root split
> > (several hundred free extent records), there is enough free space to
> > fully allocate the 4 blocks that the AGFL needs for that split.
> > 
> > That is, the set aside was designed to prevent AG selection in
> > xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an
> > AG to fix up the AGFL and then fail to fully fill the AGFL because,
> > say, there were only 2 blocks free in the AG and the AGFL needed 3.
> > Then we try all other AGs and get ENOSPC from all of them, and we
> > end up cancelling a dirty transaction and shutting down instead of
> > propagating ENOSPC up to the user.
> > 
> > This "not enough blocks to populate the AGFL" problem arose because
> > we can allocate extents directly from the AGFL if the free space
> > btree is empty, resulting in depletion of the AGFL and no free space
> > to re-populate it. Freeing a block will then go back into the btree,
> > and so the next allocation attempt might need 2 blocks for the AGFL,
> > have one block in the free space tree, and then we fail to fill
> > the AGFL completely because we still need one block for the AGFL and
> > there's no space available anymore. If all other AGs are like this
> > or ENOSPC, then kaboom.
> > 
> > IOWs, I originally added this per-ag set aside so that when the AG
> > was almost completely empty and we were down to allocating the last
> > blocks from the AG, users couldn't oversubscribe global free space by
> > consuming the blocks the AGs required to fill the AGFLs to allow the
> > last blocks that users could allocate to be allocated safely.
> > 
> > Hence the set aside of 4 blocks per AG was not to ensure the
> > freespace trees could be split, but to ensure every last possible
> > block could be allocated from the AG without causing the AG
> > selection algorithms to select and modify AGs that could not have
> > their AGFL fully fixed up to allocate the blocks that the caller
> > needed when near ENOSPC...
> 
> Hmmm, thanks for the context!  What do you think about this revised
> comment?
> 
> /*
>  * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
>  * guarantee that we can refill the AGFL prior to allocating space in a
>  * nearly full AG.  Although the the space described in the free space
>  * btrees, the freesp btrees, and the blocks owned by the agfl are all
>  * added to the ondisk fdblocks, it's a mistake to let the ondisk free
>  * space in the AG drop so low that the free space btrees cannot refill
>  * an empty AGFL up to the minimum level.  Rather than grind through
>  * empty AGs until the fs goes down, we subtract this many AG blocks
>  * from the incore fdblocks to ENOSPC early.

s/ENOSPC early./ensure user allocation does not overcommit the space
the filesystem needs for the AGFLs./

And it's all good by me.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux