Re: [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 23, 2022 at 10:26:14PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 24, 2022 at 07:48:56AM +1100, Dave Chinner wrote:
> > On Sun, Mar 20, 2022 at 09:43:38AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > The comment for xfs_alloc_set_aside indicates that we want to set aside
> > > enough space to handle a bmap btree split.  The code, unfortunately,
> > > hardcodes this to 4.
> > > 
> > > This is incorrect, since file bmap btrees can be taller than that:
> > > 
> > > xfs_db> btheight bmapbt -n 4294967295 -b 512
> > > bmapbt: worst case per 512-byte block: 13 records (leaf) / 13 keyptrs (node)
> > > level 0: 4294967295 records, 330382100 blocks
> > > level 1: 330382100 records, 25414008 blocks
> > > level 2: 25414008 records, 1954924 blocks
> > > level 3: 1954924 records, 150379 blocks
> > > level 4: 150379 records, 11568 blocks
> > > level 5: 11568 records, 890 blocks
> > > level 6: 890 records, 69 blocks
> > > level 7: 69 records, 6 blocks
> > > level 8: 6 records, 1 block
> > > 9 levels, 357913945 blocks total
> > > 
> > > Or, for V5 filesystems:
> > > 
> > > xfs_db> btheight bmapbt -n 4294967295 -b 1024
> > > bmapbt: worst case per 1024-byte block: 29 records (leaf) / 29 keyptrs (node)
> > > level 0: 4294967295 records, 148102321 blocks
> > > level 1: 148102321 records, 5106977 blocks
> > > level 2: 5106977 records, 176103 blocks
> > > level 3: 176103 records, 6073 blocks
> > > level 4: 6073 records, 210 blocks
> > > level 5: 210 records, 8 blocks
> > > level 6: 8 records, 1 block
> > > 7 levels, 153391693 blocks total
> > > 
> > > Fix this by using the actual bmap btree maxlevel value for the
> > > set-aside.  We subtract one because the root is always in the inode and
> > > hence never splits.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c |    7 +++++--
> > >  fs/xfs/libxfs/xfs_sb.c    |    2 --
> > >  fs/xfs/xfs_mount.c        |    7 +++++++
> > >  3 files changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index b0678e96ce61..747b3e45303f 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -107,13 +107,16 @@ xfs_prealloc_blocks(
> > >   * aside a few blocks which will not be reserved in delayed allocation.
> > >   *
> > >   * For each AG, we need to reserve enough blocks to replenish a totally empty
> > > - * AGFL and 4 more to handle a potential split of the file's bmap btree.
> > > + * AGFL and enough to handle a potential split of a file's bmap btree.
> > >   */
> > >  unsigned int
> > >  xfs_alloc_set_aside(
> > >  	struct xfs_mount	*mp)
> > >  {
> > > -	return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4);
> > > +	unsigned int		bmbt_splits;
> > > +
> > > +	bmbt_splits = max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]) - 1;
> > > +	return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + bmbt_splits);
> > >  }
> > 
> > So right now I'm trying to understand why this global space set
> > aside ever needed to take into account the space used by a single
> > BMBT split. ISTR it was done back in 2006 because the ag selection
> > code, alloc args and/or xfs_alloc_space_available() didn't take into
> > account the BMBT space via args->minleft correctly to ensure that
> > the AGF we select had enough space in it for both the data extent
> > and the followup BMBT split. Hence the original SET ASIDE (which
> > wasn't per AG) was just 8 blocks - 4 for the AGFL, 4 for the BMBT.
> > 
> > The transaction reservation takes into account the space needed by
> > BMBT splits so we don't over-commit global space on user allocation
> > anymore, args->minleft takes it into account so we don't overcommit
> > AG space on extent allocation, and we have the reserved block pool
> > to handle situations were delalloc extents are fragmented more than
> > the initial indirect block reservation that is taken with the
> > delalloc extent reservation.
> > 
> > So where/why is this needed anymore?
> 
> Beats me.  I started by reading the comment and thought "Gee, that's not
> true of bmbts!" :(

Yeah. Time and knowledge are showing through here. I've been running
ENOSPC tests with the BMBT reservation part removed now for several
hours and I've haven't seen any issues as a result removing it. I'll
keep it running though to see if anything falls out, but my initial
impression is that we don't obviously need it anymore.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux