On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > >
> > > Regenerate the AGF and AGFL from the rmap data.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > [...]
> >
> > > +/* Repair the AGF. */
> > > +int
> > > +xfs_repair_agf(
> > > +	struct xfs_scrub_context	*sc)
> > > +{
> > > +	struct xfs_repair_find_ag_btree fab[] = {
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.buf_ops = NULL,
> > > +		},
> > > +	};
> > > +	struct xfs_repair_agf_allocbt	raa;
> > > +	struct xfs_agf			old_agf;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_buf			*agf_bp;
> > > +	struct xfs_buf			*agfl_bp;
> > > +	struct xfs_agf			*agf;
> > > +	struct xfs_btree_cur		*cur = NULL;
> > > +	struct xfs_perag		*pag;
> > > +	xfs_agblock_t			blocks;
> > > +	xfs_agblock_t			freesp_blocks;
> > > +	int64_t				delta_fdblocks = 0;
> > > +	int				error;
> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	pag = sc->sa.pag;
> > > +	memset(&raa, 0, sizeof(raa));
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > +
> > > +	/*
> > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> > > +	 * are on the AGFL now; these blocks might have once been part
> > > +	 * of the bno/cnt/rmap btrees but are not now.
> > > +	 */
> > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > +	if (error)
> > > +		return error;
> > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > +			xfs_repair_agf_check_agfl_block, sc);
> > > +	if (error)
> > > +		return error;
> >
> > This is a bit of a chicken/egg situation, isn't it? We haven't
> > repaired the AGFL yet, so how do we know what is valid here?
>
> Yep.  The AGF is corrupt, so we have to trust the AGFL contents because
> we can't do any serious cross-referencing with any of the btrees rooted
> in the AGF.  If the AGFL contents are obviously bad then we'll bail out.

Can you add that as a comment here?

> > Can we factor this function along rebuild operation lines?
>
> Yes...
>
> > That will help document all the different pieces it is putting
> > together. E.g. move the AGF header init to before
> > xfs_repair_find_ag_btree_roots(), and then pass it into
> > xfs_repair_agf_rebuild_roots() which contains the above fab-specific
> > code.
>
> ...however, that's the second (and admittedly not well documented)
> chicken-and-egg: we find the AGF btree roots by probing the rmapbt,
> which is rooted in the AGF.  So xfs_repair_find_ag_btree_roots() has
> to be fed the old agf_bp buffer, and if that blows up then we bail
> out without changing anything.
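[Editorial sketch: the "screen out OWN_AG blocks that are on the AGFL"
comment in the hunk above can be modeled as a small standalone program.
Everything here - toy_extent, block_on_agfl(), count_reusable_blocks() -
is a hypothetical stand-in for the real rmap/AGFL structures, not kernel
code; it only illustrates why the AGFL must be read first: AGFL blocks
carry the OWN_AG owner too, so they must be subtracted before treating
OWN_AG space as old btree blocks.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy extent: AG blocks [start, start + len). */
struct toy_extent {
	unsigned int	start;
	unsigned int	len;
};

/* Is this AG block currently sitting on the (toy) AGFL? */
static bool
block_on_agfl(
	unsigned int		bno,
	const unsigned int	*agfl,
	size_t			agfl_len)
{
	size_t			i;

	for (i = 0; i < agfl_len; i++)
		if (agfl[i] == bno)
			return true;
	return false;
}

/*
 * Count the OWN_AG blocks that are *not* on the AGFL; in the repair
 * scheme above, only those are candidates for having once been
 * bno/cnt/rmap btree blocks.
 */
static size_t
count_old_btree_candidates(
	const struct toy_extent	*own_ag,
	size_t			nextents,
	const unsigned int	*agfl,
	size_t			agfl_len)
{
	size_t			n = 0, i;
	unsigned int		bno;

	for (i = 0; i < nextents; i++)
		for (bno = own_ag[i].start;
		     bno < own_ag[i].start + own_ag[i].len; bno++)
			if (!block_on_agfl(bno, agfl, agfl_len))
				n++;
	return n;
}
```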
Same again - factoring and adding comments to explain things like
this will make it much easier to understand.

> > > +/* Record all freespace information. */
> > > +STATIC int
> > > +xfs_repair_agfl_rmap_fn(
> > > +	struct xfs_btree_cur	*cur,
> > > +	struct xfs_rmap_irec	*rec,
> > > +	void			*priv)
> > > +{
> > > +	struct xfs_repair_agfl	*ra = priv;
> > > +	struct xfs_buf		*bp;
> > > +	xfs_fsblock_t		fsb;
> > > +	int			i;
> > > +	int			error = 0;
> > > +
> > > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > > +		return error;
> > > +
> > > +	/* Record all the OWN_AG blocks... */
> > > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > > +				rec->rm_startblock);
> > > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > > +				&ra->freesp_list, fsb, rec->rm_blockcount);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	/* ...and all the rmapbt blocks... */
> > > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> >
> > What is the significance of "cur->bc_ptrs[i] == 1"?
> >
> > This loop looks like it is walking the btree path to this leaf, but
> > bc_ptrs[] will only have a "1" in it if we are at the left-most edge
> > of the tree, right? So what about all the other btree blocks?
>
> Close.  We're walking up the tree from the leaf towards the root.  For
> each level, we assume that if bc_ptrs[level] == 1, then this is the
> first time we've seen the block at that level, so we remember that we
> saw this rmapbt block.  bc_ptrs is the offset within a block, not the
> offset for the entire level.
>
> So if our rmapbt tree is:
>
>         4
>      /  |  \
>     1   2   3
>
> Pretend for this example that each leaf block has 100 rmap records.
> For the first rmap record, we'll observe that bc_ptrs[0] == 1, so we
> record that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so
> we record block 4.  agmeta_list is [1, 4].
>
> For the second rmap record, we see that bc_ptrs[0] == 2, so we exit
> the loop.
> agmeta_list remains [1, 4].
>
> For the 101st rmap record, we've moved onto leaf block 2.  Now
> bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
> bc_ptrs[1] == 2, so we exit the loop.  agmeta_list = [1, 4, 2].
>
> For the 102nd rmap, bc_ptrs[0] == 2, so we exit.
>
> For the 201st rmap record, we've moved on to leaf block 3.
> bc_ptrs[0] == 1, so we add 3 to agmeta_list.  [1, 4, 2, 3].

And that is crying out for either an iterator macro or a helper
function with that explanation above it :P

> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	xfs_repair_init_extent_list(&ra.freesp_list);
> > > +	xfs_repair_init_extent_list(&ra.agmeta_list);
> > > +	ra.sc = sc;
> > > +
> > > +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> > > +	if (error)
> > > +		return error;
> > > +	if (!agf_bp)
> > > +		return -ENOMEM;
> > > +
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> >
> > Be nice to have a __xfs_alloc_read_agfl() function that didn't set
> > the ops, and have this and xfs_alloc_read_agfl() both call it.
>
> Huh?  xfs_alloc_read_agfl always reads the agfl buffer with
> &xfs_agfl_buf_ops, why would we want to call it without the verifier?

You wouldn't:

xfs_alloc_read_agfl()
{
	return __xfs_alloc_read_agfl(..., &xfs_agfl_buf_ops);
}

And then the above simply becomes:

	error = __xfs_alloc_read_agfl(..., NULL);
	if (error)
		return error;
	agfl_bp->b_ops = &xfs_agfl_buf_ops;

I'm more concerned about open coding of things we have currently
centralised in helpers, and trying to see if there's ways to keep
the functions centralised.
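[Editorial sketch: the bc_ptrs walk explained earlier - record a block
the first time the cursor sees it, i.e. while bc_ptrs[level] == 1 -
could be wrapped in a helper along the lines Dave suggests.  The toy
model below is standalone illustration only: toy_cursor and
record_path_blocks() are hypothetical stand-ins for xfs_btree_cur,
bc_nlevels, bc_ptrs[] and the per-level block numbers, not the real
kernel API.]

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVELS	8

/* Hypothetical cursor: path from leaf (level 0) up to the root. */
struct toy_cursor {
	int	levels;			/* like bc_nlevels */
	int	ptrs[TOY_MAX_LEVELS];	/* like bc_ptrs: 1-based offset
					   of the record within the
					   block at each level */
	int	blocks[TOY_MAX_LEVELS];	/* block number at each level */
};

/*
 * Walk up from the leaf, remembering the block at each level only if
 * this is the first record we've seen in it (ptrs[i] == 1).  Once a
 * level is past its first record, every level above it was already
 * recorded on an earlier visit, so stop.  Returns how many blocks
 * were newly recorded into out[].
 */
static size_t
record_path_blocks(
	const struct toy_cursor	*cur,
	int			*out,
	size_t			nout)
{
	size_t			n = 0;
	int			i;

	for (i = 0; i < cur->levels && cur->ptrs[i] == 1; i++) {
		if (n < nout)
			out[n++] = cur->blocks[i];
	}
	return n;
}
```

Running the walkthrough from the mail through this model - leaves 1,
2, 3 under root 4 - reproduces the agmeta_list accumulation
[1, 4], then [1, 4, 2], then [1, 4, 2, 3].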
Don't worry about it - it was really just a comment about "it would
be nice to have...".

> It's only scrub that gets to do screwy things like read buffers with
> no verifier.  libxfs functions should never do that.

I didn't know (well, I don't recall that) we have this rule. Can you
point me at the discussion so I can read up on it? IMO libxfs is for
centralising common operations, not for enforcing boundaries or
rules on how we access objects.

> > > +int
> > > +xfs_repair_mod_fdblocks(
> > > +	struct xfs_scrub_context	*sc,
> > > +	int64_t				delta_fdblocks)
> > > +{
> > > +	int				error;
> > > +
> > > +	if (delta_fdblocks == 0)
> > > +		return 0;
> > > +
> > > +	if (delta_fdblocks < 0) {
> > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> >
> > This seems a little hacky - it's working around a transaction
> > reservation overflow warning, right?
>
> More than that -- we're trying to avoid the situation where the incore
> free block counter goes negative.

Which will only happen if we overflow the transaction reservation,
yes?

> Things go south pretty quickly when that happens because transaction
> reservations succeed when there's not enough free space to
> accommodate them.  We'd rather error out to userspace and have the
> admin unmount and run xfs_repair than risk letting the fs really
> blow up.

Sure, but I really don't like retrospective modification of
transaction reservations. The repair code is already supposed to
have a reservation that is big enough to rebuild the AG trees, so
why should we need to reserve more space while rebuilding the AG
trees?

> Note that this function has to be called before repair dirties
> anything in the repair transaction so we're still at a place where
> we could back out with no harm done.

Still doesn't explain to me what the problem is that this code works
around.
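[Editorial sketch: the guard being debated above can be pictured with a
toy model - reserve a negative free-space delta up front so the incore
counter can fail cleanly with -ENOSPC instead of going negative at
commit.  toy_mount, toy_trans and the helpers are hypothetical
stand-ins, not the real xfs_trans_reserve_more()/xfs_trans_mod_sb()
implementations.]

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_mount {
	int64_t		fdblocks;	/* incore free block counter */
};

struct toy_trans {
	struct toy_mount *mp;
	int64_t		blk_res;	/* blocks reserved so far */
	int64_t		delta_fdblocks;	/* applied at commit */
};

/* Take more blocks out of the free pool; fail rather than go negative. */
static int
toy_trans_reserve_more(
	struct toy_trans	*tp,
	int64_t			blocks)
{
	if (tp->mp->fdblocks < blocks)
		return -ENOSPC;
	tp->mp->fdblocks -= blocks;
	tp->blk_res += blocks;
	return 0;
}

/* Model of the xfs_repair_mod_fdblocks() guard quoted above. */
static int
toy_repair_mod_fdblocks(
	struct toy_trans	*tp,
	int64_t			delta)
{
	int			error;

	if (delta == 0)
		return 0;

	/*
	 * A negative delta consumes free space.  Reserve it here, while
	 * nothing is dirty yet, so a shortfall surfaces as -ENOSPC now
	 * instead of driving fdblocks negative later.
	 */
	if (delta < 0) {
		error = toy_trans_reserve_more(tp, -delta);
		if (error)
			return error;
	}

	tp->delta_fdblocks += delta;
	return 0;
}
```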
And because I don't understand why it is necessary, this just seems
like a hack....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx