Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Currently, we use this undocumented macro to encode the minimum number
> of blocks needed to replenish a completely empty AGFL when an AG is
> nearly full.  This has lead to confusion on the part of the maintainers,
> so let's document what the value actually means, and move it to
> xfs_alloc.c since it's not used outside of that module.

Code change looks fine, but....

> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |   23 ++++++++++++++++++-----
>  fs/xfs/libxfs/xfs_alloc.h |    1 -
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 353e53b892e6..b0678e96ce61 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -82,6 +82,19 @@ xfs_prealloc_blocks(
>  }
>  
>  /*
> + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> + * guarantee that we can refill the AGFL prior to allocating space in a nearly
> + * full AG.  We require two blocks per free space btree because free space
> + * btrees shrink to a single block as the AG fills up, and any allocation can
> + * cause a btree split.  The rmap btree uses a per-AG reservation to withhold
> + * space from xfs_mod_fdblocks, so we do not account for that here.
> + */
> +#define XFS_ALLOCBT_AGFL_RESERVE	4

.... that comment is not correct.  this number had nothing to do
with btree split reservations and is all about preventing
oversubscription of the AG when the free space trees are completely
empty.  By the time there is enough free space records in the AG for
the bno/cnt trees to be at risk of a single level root split
(several hundred free extent records), there is enough free space to
fully allocate the 4 blocks that the AGFL needs for that split.

That is, the set aside was designed to prevent AG selection in
xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an
AG to fix up the AGFL and then fail to fully fill the AGFL because,
say, there were only 2 blocks free in the AG and the AGFL needed 3.
Then we try all other AGs and get ENOSPC from all of them, and we
end up cancelling a dirty transaction and shutting down instead of
propagating ENOSPC up to the user.

This "not enough blocks to populate the AGFL" problem arose because
we can allocate extents directly from the AGFL if the free space
btree is empty, resulting in depletion of the AGFL and no free space
to re-populate it. Freeing a block will then go back into the btree,
and so the next allocation attempt might need 2 blocks for the AGFL,
have one block in the free space tree, and then we fail to fill
the AGFL completely because we still need one block for the AGFL and
there's no space available anymore. If all other AGs are like this
or ENOSPC, then kaboom.

IOWs, I originally added this per-ag set aside so that when the AG
was almost completely empty and we were down to allocating the last
blocks from the AG, users couldn't oversubscribe global free space by
consuming the blocks the AGs required to fill the AGFLs to allow the
last blocks that users could allocate to be allocated safely.

Hence the set aside of 4 blocks per AG was not to ensure the
freespace trees could be split, but to ensure every last possible
block could be allocated from the AG without causing the AG
selection algorithms to select and modify AGs that could not have
their AGFL fully fixed up to allocate the blocks that the caller
needed when near ENOSPC...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux