Re: [RFC PATCH v3] xfs_repair: fix rebuilding btree block less than minrecs


 



Hi Darrick,

On Tue, Jun 16, 2020 at 09:11:43AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 10, 2020 at 01:26:24PM +0800, Gao Xiang wrote:
> > In production, we found that xfs_repair phase 5 sometimes
> > rebuilds a freespace node block with fewer pointers than
> > minrecs, and running xfs_repair again then reports a
> > message such as the following:
> > 
> > bad btree nrecs (39, min=40, max=80) in btbno block 0/7882
> > 
> > The background is that xfs_repair starts to rebuild the AGFL
> > after the freespace btree is settled in phase 5, so we may
> > need to leave the necessary room in advance in each btree
> > leaf in order to avoid a freespace btree split that would
> > cause the AGFL rebuild to fail. The old mathematics uses
> > ceil(num_extents / maxrecs) to decide the number of node
> > blocks. That would be fine without leaving extra space
> > since minrecs = maxrecs / 2, but if some slack is subtracted
> > from maxrecs, the result can be larger than what is
> > expected and make num_recs_pb less than minrecs, i.e.:
> > 
> > num_extents = 79, adj_maxrecs = 80 - 2 (slack) = 78
> > 
> > so we'd get
> > 
> > num_blocks = ceil(79 / 78) = 2,
> > num_recs_pb = 79 / 2 = 39, which is less than
> > minrecs = 80 / 2 = 40
> > 
> > OTOH, the btree bulk loading code behaves in a different way.
> > xfs_btree_bload_level_geometry computes
> > 
> > num_blocks = floor(num_extents / maxrecs)
> > 
> > which will never go below minrecs. And when it goes above
> > maxrecs, just increment num_blocks and recalculate so we
> > can get the reasonable results.
> > 
> > Later, btree bulk loader will replace the current repair code.
> > But we may still want to look for a backportable solution
> > for stable versions. Hence, keep the same logic to avoid
> > the freespace as well as rmap btree minrecs underflow for now.
> > 
> > Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
> > Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> > Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>
> > Fixes: 9851fd79bfb1 ("repair: AGFL rebuild fails if btree split required")
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> > ---
> > changes since v2:
> >  still some minor styling fix (ASSERT, args)..
> > 
> > changes since v1:
> >  - fix indentation, typedefs, etc code styling problem
> >    pointed out by Darrick;
> > 
> >  - adapt init_rmapbt_cursor to the new algorithm since
> >    it's similar pointed out by Darrick; thus the function
> >    name remains the origin compute_level_geometry...
> >    and hence, adjust the subject a bit as well.
> > 
> >  repair/phase5.c | 152 ++++++++++++++++++++----------------------------
> >  1 file changed, 63 insertions(+), 89 deletions(-)
> > 
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index abae8a08..d30d32b2 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> > @@ -348,11 +348,32 @@ finish_cursor(bt_status_t *curs)
> >   * failure at runtime. Hence leave a couple of records slack space in
> >   * each block to allow immediate modification of the tree without
> >   * requiring splits to be done.
> > - *
> > - * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
> >   */
> > -#define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
> > -	(libxfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
> > +static void
> > +compute_level_geometry(
> > +	struct xfs_mount	*mp,
> > +	struct bt_stat_level	*lptr,
> > +	uint64_t		nr_this_level,
> 
> Probably didn't need a u64 here, but <shrug> that's probably just my
> kernel-coloured glasses. :)

Yeah, I personally tend to use kernel type u64 in my own projects, but I'm not
sure what's preferred here...

> 
> > +	int			slack,
> > +	bool			leaf)
> > +{
> > +	unsigned int		maxrecs = mp->m_alloc_mxr[!leaf];
> > +	unsigned int		desired_npb;
> > +
> > +	desired_npb = max(mp->m_alloc_mnr[!leaf], maxrecs - slack);
> > +	lptr->num_recs_tot = nr_this_level;
> > +	lptr->num_blocks = max(1ULL, nr_this_level / desired_npb);
> > +
> > +	lptr->num_recs_pb = nr_this_level / lptr->num_blocks;
> > +	lptr->modulo = nr_this_level % lptr->num_blocks;
> > +	if (lptr->num_recs_pb > maxrecs ||
> > +	    (lptr->num_recs_pb == maxrecs && lptr->modulo)) {
> > +		lptr->num_blocks++;
> > +
> > +		lptr->num_recs_pb = nr_this_level / lptr->num_blocks;
> > +		lptr->modulo = nr_this_level % lptr->num_blocks;
> > +	}
> 
> Seems to be more or less the same solution that I (half unknowingly)
> coded into the btree bulkload geometry calculator, so:
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Thanks for the review... I also ran all the xfs_repair related fstests
and saw nothing noticeably strange...

> 
> (Still working on adapting the new phase5 code to try to fill the AGFL
> as part of rebuilding the free space btrees, fwiw.)

Good news... although I still have limited knowledge of XFS as a whole
(now struggling with reading the XFS logging system...)

Thanks,
Gao Xiang

> 
> --D
> 
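
For readers following the arithmetic in the commit message, here is a minimal
standalone sketch that compares the old ceil-based calculation with the one in
the quoted compute_level_geometry hunk. It is an illustration only, not the
xfs_repair code: struct level_geometry, old_geometry() and new_geometry() are
hypothetical names, minrecs and maxrecs are passed in as plain parameters
instead of being read from mp->m_alloc_mnr / mp->m_alloc_mxr, and the old path
is reconstructed from the description in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bt_stat_level fields used by the patch. */
struct level_geometry {
	uint64_t	num_blocks;
	uint64_t	num_recs_pb;	/* records per block */
	uint64_t	modulo;		/* leftover records to spread out */
};

/* Old approach as described: ceil(nr / (maxrecs - slack)). */
static struct level_geometry
old_geometry(uint64_t nr, unsigned int maxrecs, unsigned int slack)
{
	struct level_geometry	g;
	unsigned int		adj;

	assert(nr > 0 && slack < maxrecs);
	adj = maxrecs - slack;
	g.num_blocks = (nr + adj - 1) / adj;	/* round up */
	g.num_recs_pb = nr / g.num_blocks;
	g.modulo = nr % g.num_blocks;
	return g;
}

/* New approach: round down first, add a block only on overflow. */
static struct level_geometry
new_geometry(uint64_t nr, unsigned int minrecs, unsigned int maxrecs,
	     unsigned int slack)
{
	struct level_geometry	g;
	unsigned int		desired_npb;

	assert(nr > 0 && slack < maxrecs && minrecs > 0);
	/* Never aim for fewer records per block than minrecs. */
	desired_npb = maxrecs - slack;
	if (desired_npb < minrecs)
		desired_npb = minrecs;

	g.num_blocks = nr / desired_npb;
	if (g.num_blocks == 0)
		g.num_blocks = 1;
	g.num_recs_pb = nr / g.num_blocks;
	g.modulo = nr % g.num_blocks;

	/* Too full: one more block, then recalculate. */
	if (g.num_recs_pb > maxrecs ||
	    (g.num_recs_pb == maxrecs && g.modulo)) {
		g.num_blocks++;
		g.num_recs_pb = nr / g.num_blocks;
		g.modulo = nr % g.num_blocks;
	}
	return g;
}
```

With the numbers from the commit message (79 records, maxrecs 80, minrecs 40,
slack 2), old_geometry() yields 2 blocks of 39 records, which is below minrecs,
while new_geometry() yields a single block holding all 79 records.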



