Re: [PATCH 4/4] xfs: fix off-by-one error in xfs_btree_space_to_height

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 19, 2022 at 04:05:19PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Lately I've been stress-testing extreme-sized rmap btrees by using the
> (new) xfs_db bmap_inflate command to clone bmbt mappings billions of
> times and then using xfs_repair to build new rmap and refcount btrees.
> This of course is /much/ faster than actually FICLONEing a file billions
> of times.
> 
> Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with
> EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not
> sufficiently large for the test scenario.  For a 1TB filesystem (~67
> million AG blocks, 4 AGs) the btheight command reports:
> 
> $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda
> rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node)
> level 0: 4400801200 records, 52390491 blocks
> level 1: 52390491 records, 1164234 blocks
> level 2: 1164234 records, 25872 blocks
> level 3: 25872 records, 575 blocks
> level 4: 575 records, 13 blocks
> level 5: 13 records, 1 block
> 6 levels, 53581186 blocks total
> 
> The AG is sufficiently large to build this rmap btree.  Unfortunately,
> m_rmap_maxlevels is 5.  Augmenting the loop in the space->height
> function to report height, node blocks, and blocks remaining produces
> this:
> 
> ht 1 node_blocks 45 blockleft 67108863
> ht 2 node_blocks 2025 blockleft 67108818
> ht 3 node_blocks 91125 blockleft 67106793
> ht 4 node_blocks 4100625 blockleft 67015668
> final height: 5
> 
> The goal of this function is to compute the maximum height btree that
> can be stored in the given number of ondisk fsblocks.  Starting with the
> top level of the tree, each iteration through the loop adds the fanout
> factor of the next level down until we run out of blocks.  IOWs, maximum
> height is achieved by using the smallest fanout factor that can apply
> to that level.
> 
> However, the loop setup is not correct.  Top level btree blocks are
> allowed to contain fewer than minrecs items, so the computation is
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ah, that's the critical piece of information I was looking for. I
couldn't work out from the code change below what was wrong with
limits[1]. So....

> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 4c16c8c31fcb..8d11d3f5e529 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height(
>  	const unsigned int	*limits,
>  	unsigned long long	leaf_blocks)
>  {
> -	unsigned long long	node_blocks = limits[1];
> +	/*
> +	 * The root btree block can have a fanout between 2 and maxrecs because
> +	 * the tree might not be big enough to fill it.
> +	 */

Can you change this comment to say something like:

	/*
	 * The root btree block can have less than minrecs pointers
	 * in it because the tree might not be big enough to require
	 * that amount of fanout. Hence it has a minimum size of
	 * 2 pointers, not limits[1].
	 */

Otherwise it looks good.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

> +	unsigned long long	node_blocks = 2;
>  	unsigned long long	blocks_left = leaf_blocks - 1;
>  	unsigned int		height = 1;

For future consideration, we don't use maxrecs in this calculation
at all - should we just pass minrecs into the function rather than
an array of limits?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux