On Thu, Apr 15, 2021 at 10:00:39AM -0700, Darrick J. Wong wrote: > On Thu, Apr 15, 2021 at 06:33:12PM +1000, Dave Chinner wrote: > > + > > +void > > +xfs_perag_drop( > > Naming bikeshed: should this be xfs_perag_rele to match igrab/irele? Sure, easy enough to do. > > + struct xfs_perag *pag) > > +{ > > + if (atomic_dec_and_test(&pag->pag_active_ref)) > > + wake_up(&pag->pag_active_wq); > > +} > > + > > /* Check all the superblock fields we care about when reading one in. */ > > STATIC int > > xfs_validate_sb_read( > > diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h > > index f79f9dc632b6..bd3a0b910395 100644 > > --- a/fs/xfs/libxfs/xfs_sb.h > > +++ b/fs/xfs/libxfs/xfs_sb.h > > @@ -16,11 +16,16 @@ struct xfs_perag; > > /* > > * perag get/put wrappers for ref counting > > */ > > -extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t); > > -extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t, > > - int tag); > > -extern void xfs_perag_put(struct xfs_perag *pag); > > -extern int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t); > > +int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t); > > +struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t); > > +struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t, > > + int tag); > > +void xfs_perag_put(struct xfs_perag *pag); > > + > > +struct xfs_perag *xfs_perag_grab(struct xfs_mount *, xfs_agnumber_t); > > +struct xfs_perag *xfs_perag_grab_tag(struct xfs_mount *, xfs_agnumber_t, > > + int tag); > > +void xfs_perag_drop(struct xfs_perag *pag); > > > > extern void xfs_log_sb(struct xfs_trans *tp); > > extern int xfs_sync_sb(struct xfs_mount *mp, bool wait); > > diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c > > index 749faa17f8e2..a08f4253d5da 100644 > > --- a/fs/xfs/scrub/agheader.c > > +++ b/fs/xfs/scrub/agheader.c > > @@ -576,14 +576,18 @@ xchk_agf( > > xchk_block_set_corrupt(sc, sc->sa.agf_bp); > > > > /* Do the incore counters match? */ > > - pag = xfs_perag_get(mp, agno); > > + pag = xfs_perag_grab(mp, agno); > > + if (!pag) { > > + error = -ENOSPC; > > + goto out; > > This should be ENOENT, since that's the error code that scrub uses to > indicate that the resource doesn't exist and can't be checked. OK. I just used ENOSPC everywhere for this error as a starting point. I didn't think too hard about what the exact error should be for all situations, I just needed something that would stand out from the crowd... > > out: > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c > > index aa874607618a..d2e3cf63d237 100644 > > --- a/fs/xfs/scrub/common.c > > +++ b/fs/xfs/scrub/common.c > > @@ -554,8 +554,10 @@ xchk_ag_init( > > } > > > > /* > > - * Grab the per-ag structure if we haven't already gotten it. Teardown of the > > + * Get the per-ag structure if we haven't already gotten it. Teardown of the > > * xchk_ag will release it for us. > > + * > > + * XXX: does this need to be a grab? > > If owning the buffer lock on the AGI/AGF isn't sufficient to guarantee > that we can get an active reference to the perag structure, then yes. No, the buffer is just holding a passive reference to the perag, and that alone is not enough to protect/serialise against a shrink of offline AG operation. That's what the active references do. I was looking at this last night - there's a bunch of cases were we can use passive references from the buffer, but they all require a context where there's a higher level object that pins the AG against shrink. e.g. an inode will pin against shrink, because the existence of an inode in cache in that AG means the AG is not empty. Shrink cannot proceed until the inode cache for that AG has been fully reclaimed and new lookups cannot occur. The example I used was NFS filehandle lookups need to ESTALE without actually being able to access the perag, so we need an active reference for xfs_iget() to be able to populate the cache... That said, I think as many bp->b_pag use cases possible should be converted to active references before we take AG header buffer locks so that the mechanism can (eventually) be used as a general "make no modifications to the AG" mechanism, rather than just being specific to the needs of shrink. > > @@ -167,11 +168,13 @@ xchk_fscount_aggregate_agcounts( > > fsc->fdblocks = 0; > > > > for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { > > - pag = xfs_perag_get(mp, agno); > > + pag = xfs_perag_grab(mp, agno); > > + if (!pag) > > + return -ENOSPC; > > Hmmm, not sure what the proper errno is for this case. If !pag, we know > the AG is being torn down, but do we know if the free space counters > have been updated? Right, that's the question that lead me to just using ENOSPC everywhere. > IOWS, should we be serializing this scrubber against shrink operations? > The usual idiom in scrub is that you'd return EDEADLOCK here, and then > xfs_scrub_metadata will re-setup and re-run with the TRY_HARDER flag set > so that xchk_setup_fscounters can lock the world before trying again. We have to serialise it against shrink in some way, because the AG could be in the process of being torn down and the agcounts may be invalid at this point. I'd like to see these 0..sb_agcount loops go away and get replaced with perag iterators similar to the one you added to xfs_icache.c: #define for_each_perag_tag(mp, next_agno, pag, tag) \ for ((next_agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \ (pag) != NULL; \ (next_agno) = (pag)->pag_agno + 1, \ xfs_perag_put(pag), \ (pag) = xfs_perag_get_tag((mp), (next_agno), (tag))) Then all of these loops that immediately just get/grab a perag can hide all of this mess and operate solely on perag structures. Essentially, everywhere we pass an agno to indicate "operate on this AG" should pass an actively referenced perag, not a raw agno. We already do this in some places, but there's still a bunch of cleanup needed to make this work 100% reliably. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx