Re: [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 24, 2022 at 07:48:56AM +1100, Dave Chinner wrote:
> On Sun, Mar 20, 2022 at 09:43:38AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > The comment for xfs_alloc_set_aside indicates that we want to set aside
> > enough space to handle a bmap btree split.  The code, unfortunately,
> > hardcodes this to 4.
> > 
> > This is incorrect, since file bmap btrees can be taller than that:
> > 
> > xfs_db> btheight bmapbt -n 4294967295 -b 512
> > bmapbt: worst case per 512-byte block: 13 records (leaf) / 13 keyptrs (node)
> > level 0: 4294967295 records, 330382100 blocks
> > level 1: 330382100 records, 25414008 blocks
> > level 2: 25414008 records, 1954924 blocks
> > level 3: 1954924 records, 150379 blocks
> > level 4: 150379 records, 11568 blocks
> > level 5: 11568 records, 890 blocks
> > level 6: 890 records, 69 blocks
> > level 7: 69 records, 6 blocks
> > level 8: 6 records, 1 block
> > 9 levels, 357913945 blocks total
> > 
> > Or, for V5 filesystems:
> > 
> > xfs_db> btheight bmapbt -n 4294967295 -b 1024
> > bmapbt: worst case per 1024-byte block: 29 records (leaf) / 29 keyptrs (node)
> > level 0: 4294967295 records, 148102321 blocks
> > level 1: 148102321 records, 5106977 blocks
> > level 2: 5106977 records, 176103 blocks
> > level 3: 176103 records, 6073 blocks
> > level 4: 6073 records, 210 blocks
> > level 5: 210 records, 8 blocks
> > level 6: 8 records, 1 block
> > 7 levels, 153391693 blocks total
> > 
> > Fix this by using the actual bmap btree maxlevel value for the
> > set-aside.  We subtract one because the root is always in the inode and
> > hence never splits.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c |    7 +++++--
> >  fs/xfs/libxfs/xfs_sb.c    |    2 --
> >  fs/xfs/xfs_mount.c        |    7 +++++++
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index b0678e96ce61..747b3e45303f 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -107,13 +107,16 @@ xfs_prealloc_blocks(
> >   * aside a few blocks which will not be reserved in delayed allocation.
> >   *
> >   * For each AG, we need to reserve enough blocks to replenish a totally empty
> > - * AGFL and 4 more to handle a potential split of the file's bmap btree.
> > + * AGFL and enough to handle a potential split of a file's bmap btree.
> >   */
> >  unsigned int
> >  xfs_alloc_set_aside(
> >  	struct xfs_mount	*mp)
> >  {
> > -	return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4);
> > +	unsigned int		bmbt_splits;
> > +
> > +	bmbt_splits = max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]) - 1;
> > +	return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + bmbt_splits);
> >  }
> 
> So right now I'm trying to understand why this global space set
> aside ever needed to take into account the space used by a single
> BMBT split. ISTR it was done back in 2006 because the ag selection
> code, alloc args and/or xfs_alloc_space_available() didn't take into
> account the BMBT space via args->minleft correctly to ensure that
> the AGF we select had enough space in it for both the data extent
> and the followup BMBT split. Hence the original SET ASIDE (which
> wasn't per AG) was just 8 blocks - 4 for the AGFL, 4 for the BMBT.
> 
> The transaction reservation takes into account the space needed by
> BMBT splits so we don't over-commit global space on user allocation
> anymore, args->minleft takes it into account so we don't overcommit
> AG space on extent allocation, and we have the reserved block pool
> to handle situations were delalloc extents are fragmented more than
> the initial indirect block reservation that is taken with the
> delalloc extent reservation.
> 
> So where/why is this needed anymore?

Beats me.  I started by reading the comment and thought "Gee, that's not
true of bmbts!" :(

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux