On Tue, Jan 09, 2018 at 01:25:17PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> Cross reference the refcount data with the rmap data to check that the
> number of rmaps for a given block match the refcount of that block, and
> that CoW blocks (which are owned entirely by the refcountbt) are tracked
> as well.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
> v2: streamline scrubber arguments, remove stack allocated objects
> ---
>  fs/xfs/scrub/refcount.c |  318 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 316 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> index 700f8f1..df18e47 100644
> --- a/fs/xfs/scrub/refcount.c
> +++ b/fs/xfs/scrub/refcount.c
> @@ -50,6 +50,274 @@ xfs_scrub_setup_ag_refcountbt(
>  
>  /* Reference count btree scrubber. */
>  
> +/*
> + * Confirming Reference Counts via Reverse Mappings
> + *
> + * We want to count the reverse mappings overlapping a refcount record
> + * (bno, len, refcount), allowing for the possibility that some of the
> + * overlap may come from smaller adjoining reverse mappings, while some
> + * comes from single extents which overlap the range entirely.  The
> + * outer loop is as follows:
> + *
> + * 1. For all reverse mappings overlapping the refcount extent,
> + *    a. If a given rmap completely overlaps, mark it as seen.
> + *    b. Otherwise, record the fragment for later processing.
> + *
> + * Once we've seen all the rmaps, we know that for all blocks in the
> + * refcount record we want to find $refcount owners and we've already
> + * visited $seen extents that overlap all the blocks.  Therefore, we
> + * need to find ($refcount - $seen) owners for every block in the
> + * extent; call that quantity $target_nr.  Proceed as follows:
> + *
> + * 2. Pull the first $target_nr fragments from the list; all of them
> + *    should start at or before the start of the extent.
> + *    Call this subset of fragments the working set.
> + * 3. Until there are no more unprocessed fragments,
> + *    a. Find the shortest fragments in the set and remove them.
> + *    b. Note the block number of the end of these fragments.
> + *    c. Pull the same number of fragments from the list.  All of these
> + *       fragments should start at the block number recorded in the
> + *       previous step.
> + *    d. Put those fragments in the set.
> + * 4. Check that there are $target_nr fragments remaining in the list,
> + *    and that they all end at or beyond the end of the refcount extent.
> + *
> + * If the refcount is correct, all the check conditions in the algorithm
> + * should always hold true.  If not, the refcount is incorrect.

This needs a comment somewhere in here describing the order of the
records on the fragment list. AFAICT, it's ordered by start bno, but
I'm not 100% sure, and it seems the code depends on strict ordering
of the records on the frag list....

.....

> +	} else {
> +		/*
> +		 * This rmap covers only part of the refcount record, so
> +		 * save the fragment for later processing.
> +		 */
> +		frag = kmem_alloc(sizeof(struct xfs_scrub_refcnt_frag),
> +				KM_MAYFAIL | KM_NOFS);
> +		if (!frag)
> +			return -ENOMEM;
> +		memcpy(&frag->rm, rec, sizeof(frag->rm));
> +		list_add_tail(&frag->list, &refchk->fragments);

I'm making the assumption here that we're seeing records in the order
they are in the rmap tree, and that it's in increasing startblock
order, hence the list is ordered that way....

> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Given a bunch of rmap fragments, iterate through them, keeping
> + * a running tally of the refcount.  If this ever deviates from
> + * what we expect (which is the refcountbt's refcount minus the
> + * number of extents that totally covered the refcountbt extent),
> + * we have a refcountbt error.
> + */
> +STATIC void
> +xfs_scrub_refcountbt_process_rmap_fragments(
> +	struct xfs_scrub_refcnt_check	*refchk)
> +{
> +	struct list_head		worklist;
> +	struct xfs_scrub_refcnt_frag	*frag;
> +	struct xfs_scrub_refcnt_frag	*n;
> +	xfs_agblock_t			bno;
> +	xfs_agblock_t			rbno;
> +	xfs_agblock_t			next_rbno;
> +	xfs_nlink_t			nr;
> +	xfs_nlink_t			target_nr;
> +
> +	target_nr = refchk->refcount - refchk->seen;
> +	if (target_nr == 0)
> +		return;
> +
> +	/*
> +	 * There are (refchk->rc.rc_refcount - refchk->nr refcount)
> +	 * references we haven't found yet.  Pull that many off the
> +	 * fragment list and figure out where the smallest rmap ends
> +	 * (and therefore the next rmap should start).  All the rmaps
> +	 * we pull off should start at or before the beginning of the
> +	 * refcount record's range.
> +	 */
> +	INIT_LIST_HEAD(&worklist);
> +	rbno = NULLAGBLOCK;
> +	nr = 1;
> +
> +	/* Find all the rmaps that start at or before the refc extent. */
> +	list_for_each_entry_safe(frag, n, &refchk->fragments, list) {
> +		if (frag->rm.rm_startblock > refchk->bno)
> +			goto done;

.... and this is where the code implies lowest-to-highest startblock
ordering on the frag list.

> +		bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> +		if (rbno > bno)
> +			rbno = bno;

Can we put that check the other way around? We're looking for the
shortest/smallest end block, so:

		if (bno < rbno)
			rbno = bno;

makes a lot more sense to me.

> +		list_del(&frag->list);
> +		list_add_tail(&frag->list, &worklist);

list_move_tail()?

Ok, so we are moving fragments that start before the refcount bno to
the work list.

> +		if (nr == target_nr)
> +			break;
> +		nr++;
> +	}
> +
> +	/*
> +	 * We should have found exactly $target_nr rmap fragments starting
> +	 * at or before the refcount extent.
> +	 */
> +	if (nr != target_nr)
> +		goto done;

Ok, so on error we clean up and free the frag list and work list....

> +
> +	while (!list_empty(&refchk->fragments)) {
> +		/* Discard any fragments ending at rbno.
 */
> +		nr = 0;
> +		next_rbno = NULLAGBLOCK;
> +		list_for_each_entry_safe(frag, n, &worklist, list) {

Ok, this needs to be clearer that it's walking the working set of
fragments. I had to read this code several times before I worked out
that this is where the "working set" is being processed....

> +			bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> +			if (bno != rbno) {
> +				if (next_rbno > bno)
> +					next_rbno = bno;

Same comment here about next_rbno being a "smallest bno" variable.

> +				continue;
> +			}
> +			list_del(&frag->list);
> +			kmem_free(frag);
> +			nr++;
> +		}
> +
> +		/* Empty list? We're done. */
> +		if (list_empty(&refchk->fragments))
> +			break;
> +
> +		/* Try to add nr rmaps starting at rbno to the worklist. */
> +		list_for_each_entry_safe(frag, n, &refchk->fragments, list) {
> +			bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> +			if (frag->rm.rm_startblock != rbno)
> +				goto done;

Definitely assuming the frag list is ordered here :P

> +			list_del(&frag->list);
> +			list_add_tail(&frag->list, &worklist);
> +			if (next_rbno > bno)
> +				next_rbno = bno;
> +			nr--;
> +			if (nr == 0)
> +				break;
> +		}

Ok, so if we get here with nr > 0, then we must have emptied the
fragment list onto the work list, right? At this point, the outer
loop will terminate. Don't we need to run the worklist processing
loop one last time?

> +		rbno = next_rbno;
> +	}

.....

> +/* Use the rmap entries covering this extent to verify the refcount.
 */
> +STATIC void
> +xfs_scrub_refcountbt_xref_rmap(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agblock_t			bno,
> +	xfs_extlen_t			len,
> +	xfs_nlink_t			refcount)
> +{
> +	struct xfs_scrub_refcnt_check	refchk = {
> +		.sc = sc,
> +		.bno = bno,
> +		.len = len,
> +		.refcount = refcount,
> +		.seen = 0,
> +	};
> +	struct xfs_rmap_irec		low;
> +	struct xfs_rmap_irec		high;
> +	struct xfs_scrub_refcnt_frag	*frag;
> +	struct xfs_scrub_refcnt_frag	*n;
> +	int				error;
> +
> +	if (!sc->sa.rmap_cur)
> +		return;
> +
> +	/* Cross-reference with the rmapbt to confirm the refcount. */
> +	memset(&low, 0, sizeof(low));
> +	low.rm_startblock = bno;
> +	memset(&high, 0xFF, sizeof(high));
> +	high.rm_startblock = bno + len - 1;

This range-query init feels like a familiar pattern now. Helper
function (separate patch)?

....

> +/* Make sure we have as many refc blocks as the rmap says. */
> +STATIC void
> +xfs_scrub_refcount_xref_rmap(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_owner_info		*oinfo,
> +	xfs_filblks_t			cow_blocks)
> +{
> +	xfs_extlen_t			refcbt_blocks = 0;
> +	xfs_filblks_t			blocks;
> +	int				error;
> +
> +	/* Check that we saw as many refcbt blocks as the rmap knows about. */
> +	error = xfs_btree_count_blocks(sc->sa.refc_cur, &refcbt_blocks);
> +	if (!xfs_scrub_btree_process_error(sc, sc->sa.refc_cur, 0, &error))
> +		return;
> +	error = xfs_scrub_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, oinfo,
> +			&blocks);
> +	if (xfs_scrub_should_check_xref(sc, &error, &sc->sa.rmap_cur) &&
> +	    blocks != refcbt_blocks)
> +		xfs_scrub_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
> +
> +	if (!sc->sa.rmap_cur)
> +		return;
> +
> +	/* Check that we saw as many cow blocks as the rmap knows about.
 */
> +	xfs_rmap_ag_owner(oinfo, XFS_RMAP_OWN_COW);
> +	error = xfs_scrub_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, oinfo,
> +			&blocks);
> +	if (xfs_scrub_should_check_xref(sc, &error, &sc->sa.rmap_cur) &&
> +	    blocks != cow_blocks)
> +		xfs_scrub_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);

Bit of a landmine that this code changes the owner info structure
that was passed in....

> +}
> +
>  /* Scrub the refcount btree for some AG. */
>  int
>  xfs_scrub_refcountbt(
>  	struct xfs_scrub_context	*sc)
>  {
>  	struct xfs_owner_info		oinfo;
> +	xfs_agblock_t			cow_blocks = 0;
> +	int				error;
>  
>  	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
> -	return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_rec,
> -			&oinfo, NULL);
> +	error = xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_rec,
> +			&oinfo, &cow_blocks);
> +	if (error)
> +		return error;
> +
> +	if (sc->sa.rmap_cur)
> +		xfs_scrub_refcount_xref_rmap(sc, &oinfo, cow_blocks);

.... because that's not obvious in this code here if we add any more
code after this call.

> +
> +	return error;

error is zero here, so "return 0" instead?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx