Re: [PATCH 08/25] xfs: create helpers to scan an allocation group

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 4 Oct 2017 16:59:56 +1100

On Tue, Oct 03, 2017 at 08:58:53PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 04, 2017 at 11:46:03AM +1100, Dave Chinner wrote:
> > On Tue, Oct 03, 2017 at 01:41:40PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > 
> > > Add some helpers to enable us to lock an AG's headers, create btree
> > > cursors for all btrees in that allocation group, and clean up
> > > afterwards.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/scrub/common.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/common.h |   10 +++
> > >  fs/xfs/scrub/scrub.c  |    4 +
> > >  fs/xfs/scrub/scrub.h  |   21 ++++++
> > >  4 files changed, 208 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index a84ba19..b056c9d 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -44,6 +44,7 @@
> > >  #include "scrub/scrub.h"
> > >  #include "scrub/common.h"
> > >  #include "scrub/trace.h"
> > > +#include "scrub/btree.h"
> > >  
> > >  /* Common code for the metadata scrubbers. */
> > >  
> > > @@ -298,6 +299,178 @@ xfs_scrub_set_incomplete(
> > >  	trace_xfs_scrub_incomplete(sc, __return_address);
> > >  }
> > >  
> > > +/*
> > > + * AG scrubbing
> > > + *
> > > + * These helpers facilitate locking an allocation group's header
> > > + * buffers, setting up cursors for all btrees that are present, and
> > > + * cleaning everything up once we're through.
> > > + */
> > > +
> > > +/* Grab all the headers for an AG. */
> > > +int
> > > +xfs_scrub_ag_read_headers(
> > > +	struct xfs_scrub_context	*sc,
> > > +	xfs_agnumber_t			agno,
> > > +	struct xfs_buf			**agi,
> > > +	struct xfs_buf			**agf,
> > > +	struct xfs_buf			**agfl)
> > > +{
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	int				error;
> > > +
> > > +	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
> > > +	if (error)
> > > +		goto out;
> > > +
> > > +	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
> > > +	if (error)
> > > +		goto out;
> > > +	if (!*agf) {
> > > +		error = -ENOMEM;
> > > +		goto out;
> > > +	}
> > > +
> > > +	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
> > > +	if (error)
> > > +		goto out;
> > > +
> > > +out:
> > > +	return error;
> > > +}
> > 
> > It's not immediately obvious what releases the buffers on error.
> > Maybe add a comment to say cleanup/release on error is unconditional
> > through xfs_scrub_ag_free()?
> > 
> > Hmmm - now there's a question - is the reference we get here freed
> > through cancelling the fake transaction, or via the manual
> > xfs_trans_brelse() call in the free function? which one happens
> > first? add that to the comment?
> 
> The AG headers /should/ always be released by the xfs_trans_brelse calls
> in the ag_free function, with a failsafe that the trans_cancel will dump
> anything else that we came across during our check, just in case all
> heck broke loose while we were checking.

Ok, comments. :P

> 
> > And given this locks out the AG from allocation for an arbitrary
> > length of time, I'm wondering if we should add a flag into the pag
> > somewhere to say "being scrubbed" so the extent and inode allocation
> > code can skip over this AG and no block trying to lock it...
> 
> That might be a good idea for a end-of-series enhancement.

*nod*

> 
> Though it could use a little more engineering thought -- what about
> a more general ability to mark an AG offline?  ISTR we discussed growing
> the ability to shut down an AG (rather than the whole FS) if scrub finds
> problems, and/or being able to control that from spaceman.  The patch
> was "spaceman: AG state control".

Well, only a small part of making an AG offline is preventing
allocation from blocking in it.  What I suggested above is
completely internal functionality that users would never even know
about, so if we later want to add offline AG controls we can rework
the implementation scrub uses to fit into that model....

> xfs_scrub has an -e option that allows the admin to specify what happens
> on an error.  Right now it'll just shut down the filesystem, but
> presumably it could react to a per-ag metadata problem by shutting down
> the AG.

Not that simple, I'm afraid. Think about modifying a directory that
has blocks that span multiple AGs. If we mark an AG as offline, then
what do we do with an attempt to modify that directory block? Even
if we can read it, do we allow the modification to proceed? How do
we even know ahead of time that a directory has blocks in an offline
AG?  And if the AG is shut down, then the attempt to read the
directory block will get EIO, which will cause a dirty transaction
cancellation, which will cause a filesystem wide shutdown...

Let's not complicate a simple optimisation specific to scrub by
trying to make it work wth blue-sky functionality that requires us
to solve a bunch of "OMFG HARD!" problems we haven't even thought
about yet, let alone have answers for....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html