On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > >
> > > Regenerate the AGF and AGFL from the rmap data.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > [...]
> >
> > > +/* Repair the AGF. */
> > > +int
> > > +xfs_repair_agf(
> > > +	struct xfs_scrub_context	*sc)
> > > +{
> > > +	struct xfs_repair_find_ag_btree fab[] = {
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.buf_ops = NULL,
> > > +		},
> > > +	};
> > > +	struct xfs_repair_agf_allocbt	raa;
> > > +	struct xfs_agf			old_agf;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_buf			*agf_bp;
> > > +	struct xfs_buf			*agfl_bp;
> > > +	struct xfs_agf			*agf;
> > > +	struct xfs_btree_cur		*cur = NULL;
> > > +	struct xfs_perag		*pag;
> > > +	xfs_agblock_t			blocks;
> > > +	xfs_agblock_t			freesp_blocks;
> > > +	int64_t				delta_fdblocks = 0;
> > > +	int				error;
> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	pag = sc->sa.pag;
> > > +	memset(&raa, 0, sizeof(raa));
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > +
> > > +	/*
> > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> > > +	 * are on the AGFL now; these blocks might have once been part
> > > +	 * of the bno/cnt/rmap btrees but are not now.
> > > +	 */
> > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > +	if (error)
> > > +		return error;
> > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > +			xfs_repair_agf_check_agfl_block, sc);
> > > +	if (error)
> > > +		return error;
> >
> > This is a bit of a chicken/egg situation, isn't it? We haven't
> > repaired the AGFL yet, so how do we know what is valid here?
>
> Yep.  The AGF is corrupt, so we have to trust the AGFL contents because
> we can't do any serious cross-referencing with any of the btrees rooted
> in the AGF.  If the AGFL contents are obviously bad then we'll bail out.

Can you add that as a comment here?

> > Can we factor this function along rebuild operation lines?
>
> Yes...
>
> > That will help document all the different pieces it is putting
> > together. E.g. move the AGF header init to before
> > xfs_repair_find_ag_btree_roots(), and then pass it into
> > xfs_repair_agf_rebuild_roots() which contains the above fab-specific
> > code.
>
> ...however, that's the second (and admittedly not well documented)
> chicken-and-egg: we find the AGF btree roots by probing the rmapbt,
> which is rooted in the AGF.  So xfs_repair_find_ag_btree_roots() has
> to be fed the old agf_bp buffer, and if that blows up then we bail
> out without changing anything.
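[Editorial sketch: the "screen out OWN_AG blocks that are on the AGFL"
comment in the hunk above can be modeled as a small standalone program.
Everything here - toy_extent, block_on_agfl(), count_reusable_blocks() -
is a hypothetical stand-in for the real rmap/AGFL structures, not kernel
code; it only illustrates why the AGFL must be read first: AGFL blocks
carry the OWN_AG owner too, so they must be subtracted before treating
OWN_AG space as old btree blocks.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy extent: AG blocks [start, start + len). */
struct toy_extent {
	unsigned int	start;
	unsigned int	len;
};

/* Is this AG block currently sitting on the (toy) AGFL? */
static bool
block_on_agfl(
	unsigned int		bno,
	const unsigned int	*agfl,
	size_t			agfl_len)
{
	size_t			i;

	for (i = 0; i < agfl_len; i++)
		if (agfl[i] == bno)
			return true;
	return false;
}

/*
 * Count the OWN_AG blocks that are *not* on the AGFL; in the repair
 * scheme above, only those are candidates for having once been
 * bno/cnt/rmap btree blocks.
 */
static size_t
count_old_btree_candidates(
	const struct toy_extent	*own_ag,
	size_t			nextents,
	const unsigned int	*agfl,
	size_t			agfl_len)
{
	size_t			n = 0, i;
	unsigned int		bno;

	for (i = 0; i < nextents; i++)
		for (bno = own_ag[i].start;
		     bno < own_ag[i].start + own_ag[i].len; bno++)
			if (!block_on_agfl(bno, agfl, agfl_len))
				n++;
	return n;
}
```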
Same again - factoring and adding comments to explain things like
this will make it much easier to understand.

> > > +/* Record all freespace information. */
> > > +STATIC int
> > > +xfs_repair_agfl_rmap_fn(
> > > +	struct xfs_btree_cur	*cur,
> > > +	struct xfs_rmap_irec	*rec,
> > > +	void			*priv)
> > > +{
> > > +	struct xfs_repair_agfl	*ra = priv;
> > > +	struct xfs_buf		*bp;
> > > +	xfs_fsblock_t		fsb;
> > > +	int			i;
> > > +	int			error = 0;
> > > +
> > > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > > +		return error;
> > > +
> > > +	/* Record all the OWN_AG blocks... */
> > > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > > +				rec->rm_startblock);
> > > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > > +				&ra->freesp_list, fsb, rec->rm_blockcount);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	/* ...and all the rmapbt blocks... */
> > > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> >
> > What is the significance of "cur->bc_ptrs[i] == 1"?
> >
> > This loop looks like it is walking the btree path to this leaf, but
> > bc_ptrs[] will only have a "1" in it if we are at the left-most edge
> > of the tree, right? So what about all the other btree blocks?
>
> Close.  We're walking up the tree from the leaf towards the root.  For
> each level, we assume that if bc_ptrs[level] == 1, then this is the
> first time we've seen the block at that level, so we remember that we
> saw this rmapbt block.  bc_ptrs is the offset within a block, not the
> offset for the entire level.
>
> So if our rmapbt tree is:
>
>         4
>      /  |  \
>     1   2   3
>
> Pretend for this example that each leaf block has 100 rmap records.
> For the first rmap record, we'll observe that bc_ptrs[0] == 1, so we
> record that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so
> we record block 4.  agmeta_list is [1, 4].
>
> For the second rmap record, we see that bc_ptrs[0] == 2, so we exit
> the loop.
> agmeta_list remains [1, 4].
>
> For the 101st rmap record, we've moved onto leaf block 2.  Now
> bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
> bc_ptrs[1] == 2, so we exit the loop.  agmeta_list = [1, 4, 2].
>
> For the 102nd rmap, bc_ptrs[0] == 2, so we exit.
>
> For the 201st rmap record, we've moved on to leaf block 3.
> bc_ptrs[0] == 1, so we add 3 to agmeta_list.  [1, 4, 2, 3].

And that is crying out for either an iterator macro or a helper
function with that explanation above it :P

> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	xfs_repair_init_extent_list(&ra.freesp_list);
> > > +	xfs_repair_init_extent_list(&ra.agmeta_list);
> > > +	ra.sc = sc;
> > > +
> > > +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> > > +	if (error)
> > > +		return error;
> > > +	if (!agf_bp)
> > > +		return -ENOMEM;
> > > +
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> >
> > Be nice to have a __xfs_alloc_read_agfl() function that didn't set
> > the ops, and have this and xfs_alloc_read_agfl() both call it.
>
> Huh?  xfs_alloc_read_agfl always reads the agfl buffer with
> &xfs_agfl_buf_ops, why would we want to call it without the verifier?

You wouldn't:

xfs_alloc_read_agfl()
{
	return __xfs_alloc_read_agfl(..., &xfs_agfl_buf_ops);
}

And then the above simply becomes:

	error = __xfs_alloc_read_agfl(..., NULL);
	if (error)
		return error;
	agfl_bp->b_ops = &xfs_agfl_buf_ops;

I'm more concerned about open coding of things we have currently
centralised in helpers, and trying to see if there's ways to keep
the functions centralised.
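[Editorial sketch: the bc_ptrs walk explained earlier - record a block
the first time the cursor sees it, i.e. while bc_ptrs[level] == 1 -
could be wrapped in a helper along the lines Dave suggests.  The toy
model below is standalone illustration only: toy_cursor and
record_path_blocks() are hypothetical stand-ins for xfs_btree_cur,
bc_nlevels, bc_ptrs[] and the per-level block numbers, not the real
kernel API.]

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVELS	8

/* Hypothetical cursor: path from leaf (level 0) up to the root. */
struct toy_cursor {
	int	levels;			/* like bc_nlevels */
	int	ptrs[TOY_MAX_LEVELS];	/* like bc_ptrs: 1-based offset
					   of the record within the
					   block at each level */
	int	blocks[TOY_MAX_LEVELS];	/* block number at each level */
};

/*
 * Walk up from the leaf, remembering the block at each level only if
 * this is the first record we've seen in it (ptrs[i] == 1).  Once a
 * level is past its first record, every level above it was already
 * recorded on an earlier visit, so stop.  Returns how many blocks
 * were newly recorded into out[].
 */
static size_t
record_path_blocks(
	const struct toy_cursor	*cur,
	int			*out,
	size_t			nout)
{
	size_t			n = 0;
	int			i;

	for (i = 0; i < cur->levels && cur->ptrs[i] == 1; i++) {
		if (n < nout)
			out[n++] = cur->blocks[i];
	}
	return n;
}
```

Running the walkthrough from the mail through this model - leaves 1,
2, 3 under root 4 - reproduces the agmeta_list accumulation
[1, 4], then [1, 4, 2], then [1, 4, 2, 3].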
Don't worry about it - it was really just a comment about "it would
be nice to have...".

> It's only scrub that gets to do screwy things like read buffers with
> no verifier.  libxfs functions should never do that.

I didn't know (well, I don't recall that) we have this rule. Can you
point me at the discussion so I can read up on it? IMO libxfs is for
centralising common operations, not for enforcing boundaries or
rules on how we access objects.

> > > +int
> > > +xfs_repair_mod_fdblocks(
> > > +	struct xfs_scrub_context	*sc,
> > > +	int64_t				delta_fdblocks)
> > > +{
> > > +	int				error;
> > > +
> > > +	if (delta_fdblocks == 0)
> > > +		return 0;
> > > +
> > > +	if (delta_fdblocks < 0) {
> > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> >
> > This seems a little hacky - it's working around a transaction
> > reservation overflow warning, right?
>
> More than that -- we're trying to avoid the situation where the incore
> free block counter goes negative.

Which will only happen if we overflow the transaction reservation,
yes?

> Things go south pretty quickly when that happens because transaction
> reservations succeed when there's not enough free space to
> accommodate them.  We'd rather error out to userspace and have the
> admin unmount and run xfs_repair than risk letting the fs really
> blow up.

Sure, but I really don't like retrospective modification of
transaction reservations. The repair code is already supposed to
have a reservation that is big enough to rebuild the AG trees, so
why should we need to reserve more space while rebuilding the AG
trees?

> Note that this function has to be called before repair dirties
> anything in the repair transaction so we're still at a place where
> we could back out with no harm done.

Still doesn't explain to me what the problem is that this code works
around.
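[Editorial sketch: the guard being debated above can be pictured with a
toy model - reserve a negative free-space delta up front so the incore
counter can fail cleanly with -ENOSPC instead of going negative at
commit.  toy_mount, toy_trans and the helpers are hypothetical
stand-ins, not the real xfs_trans_reserve_more()/xfs_trans_mod_sb()
implementations.]

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_mount {
	int64_t		fdblocks;	/* incore free block counter */
};

struct toy_trans {
	struct toy_mount *mp;
	int64_t		blk_res;	/* blocks reserved so far */
	int64_t		delta_fdblocks;	/* applied at commit */
};

/* Take more blocks out of the free pool; fail rather than go negative. */
static int
toy_trans_reserve_more(
	struct toy_trans	*tp,
	int64_t			blocks)
{
	if (tp->mp->fdblocks < blocks)
		return -ENOSPC;
	tp->mp->fdblocks -= blocks;
	tp->blk_res += blocks;
	return 0;
}

/* Model of the xfs_repair_mod_fdblocks() guard quoted above. */
static int
toy_repair_mod_fdblocks(
	struct toy_trans	*tp,
	int64_t			delta)
{
	int			error;

	if (delta == 0)
		return 0;

	/*
	 * A negative delta consumes free space.  Reserve it here, while
	 * nothing is dirty yet, so a shortfall surfaces as -ENOSPC now
	 * instead of driving fdblocks negative later.
	 */
	if (delta < 0) {
		error = toy_trans_reserve_more(tp, -delta);
		if (error)
			return error;
	}

	tp->delta_fdblocks += delta;
	return 0;
}
```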
And because I don't understand why it is necessary, this just seems
like a hack....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx