On Wed, Oct 04, 2017 at 04:59:56PM +1100, Dave Chinner wrote: > On Tue, Oct 03, 2017 at 08:58:53PM -0700, Darrick J. Wong wrote: > > On Wed, Oct 04, 2017 at 11:46:03AM +1100, Dave Chinner wrote: > > > On Tue, Oct 03, 2017 at 01:41:40PM -0700, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > > > > > Add some helpers to enable us to lock an AG's headers, create btree > > > > cursors for all btrees in that allocation group, and clean up > > > > afterwards. > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > --- > > > > fs/xfs/scrub/common.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++ > > > > fs/xfs/scrub/common.h | 10 +++ > > > > fs/xfs/scrub/scrub.c | 4 + > > > > fs/xfs/scrub/scrub.h | 21 ++++++ > > > > 4 files changed, 208 insertions(+) > > > > > > > > > > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c > > > > index a84ba19..b056c9d 100644 > > > > --- a/fs/xfs/scrub/common.c > > > > +++ b/fs/xfs/scrub/common.c > > > > @@ -44,6 +44,7 @@ > > > > #include "scrub/scrub.h" > > > > #include "scrub/common.h" > > > > #include "scrub/trace.h" > > > > +#include "scrub/btree.h" > > > > > > > > /* Common code for the metadata scrubbers. */ > > > > > > > > @@ -298,6 +299,178 @@ xfs_scrub_set_incomplete( > > > > trace_xfs_scrub_incomplete(sc, __return_address); > > > > } > > > > > > > > +/* > > > > + * AG scrubbing > > > > + * > > > > + * These helpers facilitate locking an allocation group's header > > > > + * buffers, setting up cursors for all btrees that are present, and > > > > + * cleaning everything up once we're through. > > > > + */ > > > > + > > > > +/* Grab all the headers for an AG. */ > > > > +int > > > > +xfs_scrub_ag_read_headers( > > > > + struct xfs_scrub_context *sc, > > > > + xfs_agnumber_t agno, > > > > + struct xfs_buf **agi, > > > > + struct xfs_buf **agf, > > > > + struct xfs_buf **agfl) > > > > +{ > > > > + struct xfs_mount *mp = sc->mp; > > > > + int error; > > > > + > > > > + error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi); > > > > + if (error) > > > > + goto out; > > > > + > > > > + error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf); > > > > + if (error) > > > > + goto out; > > > > + if (!*agf) { > > > > + error = -ENOMEM; > > > > + goto out; > > > > + } > > > > + > > > > + error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl); > > > > + if (error) > > > > + goto out; > > > > + > > > > +out: > > > > + return error; > > > > +} > > > > > > It's not immediately obvious what releases the buffers on error. > > > Maybe add a comment to say cleanup/release on error is unconditional > > > through xfs_scrub_ag_free()? > > > > > > Hmmm - now there's a question - is the reference we get here freed > > > through cancelling the fake transaction, or via the manual > > > xfs_trans_brelse() call in the free function? which one happens > > > first? add that to the comment? > > > > The AG headers /should/ always be released by the xfs_trans_brelse calls > > in the ag_free function, with a failsafe that the trans_cancel will dump > > anything else that we came across during our check, just in case all > > heck broke loose while we were checking. > > Ok, comments. :P Done. > > > And given this locks out the AG from allocation for an arbitrary > > > length of time, I'm wondering if we should add a flag into the pag > > > somewhere to say "being scrubbed" so the extent and inode allocation > > > code can skip over this AG and no block trying to lock it... > > > > That might be a good idea for a end-of-series enhancement. > > *nod* > > > > > Though it could use a little more engineering thought -- what about > > a more general ability to mark an AG offline? ISTR we discussed growing > > the ability to shut down an AG (rather than the whole FS) if scrub finds > > problems, and/or being able to control that from spaceman. The patch > > was "spaceman: AG state control". > > Well, only a small part of making an AG offline is preventing > allocation from blocking in it. What I suggested above is > completely internal functionality that users would never even know > about, so if we later want to add offline AG controls we can rework > the implementation scrub uses to fit into that model.... Yes, I think that sounds fairly straightforward. > > xfs_scrub has an -e option that allows the admin to specify what happens > > on an error. Right now it'll just shut down the filesystem, but > > presumably it could react to a per-ag metadata problem by shutting down > > the AG. > > Not that simple, I'm afraid. Think about modifying a directory that > has blocks that span multiple AGs. If we mark an AG as offline, then > what do we do with an attempt to modify that directory block? Even > if we can read it, do we allow the modification to proceed? How do > we even know ahead of time that a directory has blocks in an offline > AG? And if the AG is shut down, then the attempt to read the > directory block will get EIO, which will cause a dirty transaction > cancellation, which will cause a filesystem wide shutdown... > > Let's not complicate a simple optimisation specific to scrub by > trying to make it work wth blue-sky functionality that requires us > to solve a bunch of "OMFG HARD!" problems we haven't even thought > about yet, let alone have answers for.... Heh, ok, no more OMFG HARD than there already is. :) --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html