Re: [PATCH 08/20] xfs: implement the metadata repair ioctl flag

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 27 Mar 2018 21:58:12 -0700

On Wed, Mar 28, 2018 at 10:55:14AM +1100, Dave Chinner wrote:
> On Mon, Mar 26, 2018 at 04:56:46PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Plumb in the pieces necessary to make the "scrub" subfunction of
> > the scrub ioctl actually work.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> Can you list the pieces here - the ioctl, the errtag for debug, etc?

"This means that we make XFS_SCRUB_IFLAG_REPAIR actually do something
when userspace calls the scrub ioctl and add a debugging error tag so
that xfstests can force the kernel to repair things even when it's not
necessary."

> 
> ....
> 
> > +config XFS_ONLINE_REPAIR
> > +	bool "XFS online metadata repair support"
> > +	default n
> > +	depends on XFS_FS && XFS_ONLINE_SCRUB
> > +	help
> > +	  If you say Y here you will be able to repair metadata on a
> > +	  mounted XFS filesystem.  This feature is intended to reduce
> > +	  filesystem downtime even further by fixing minor problems
> 
> s/even further//

Ok.

> > +	  before they cause the filesystem to go down.  However, it
> > +	  requires that the filesystem be formatted with secondary
> > +	  metadata, such as reverse mappings and inode parent pointers.
> > +
> > +	  This feature is considered EXPERIMENTAL.  Use with caution!
> > +
> > +	  See the xfs_scrub man page in section 8 for additional information.
> > +
> > +	  If unsure, say N.
> 
> .....
> 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index faf1a4e..8bf3ded 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -542,13 +542,20 @@ struct xfs_scrub_metadata {
> >  /* o: Metadata object looked funny but isn't corrupt. */
> >  #define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
> >  
> > +/*
> > + * o: IFLAG_REPAIR was set but metadata object did not need fixing or
> > + *    optimization and has therefore not been altered.
> > + */
> > +#define XFS_SCRUB_OFLAG_UNTOUCHED	(1 << 7)
> 
> bikeshed: CLEAN rather than UNTOUCHED?

The thing I don't like about using 'CLEAN' is that we only set this flag
if userspace told us to touch (i.e. repair) the structure and the kernel
decides to leave it alone.

> > +#ifndef __XFS_SCRUB_REPAIR_H__
> > +#define __XFS_SCRUB_REPAIR_H__
> > +
> > +#if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
> 
> Don't we use the form 
> 
> #ifdef XFS_ONLINE_REPAIR
> 
> elsewhere? We should be consistent on how we detect config options,
> if IS_ENABLED() is the prefered way of doing it now, should we
> change everything to do this?

Yeah, I could do that.  Originally I intended someday to rip out the
Kconfig option stuff entirely, but now I'm starting to wonder if we
should leave it forever for people who consider it too hot to handle (or
kernel tinyfication types)...

...anyway, I'll migrate it to the conventional

#ifdef CONFIG_XFS_ONLINE_REPAIR

phraseology.

> > +/* Online repair only works for v5 filesystems. */
> > +static inline bool xfs_repair_can_fix(struct xfs_mount *mp)
> > +{
> > +	return xfs_sb_version_hascrc(&mp->m_sb);
> > +}
> > +
> > +/* Did userspace want us to repair /and/ we found something to fix? */
> > +static inline bool xfs_repair_should_fix(struct xfs_scrub_metadata *sm)
> > +{
> > +	return (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
> > +	       (sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > +				XFS_SCRUB_OFLAG_XCORRUPT |
> > +				XFS_SCRUB_OFLAG_PREEN));
> > +}
> > +
> > +/* Did userspace tell us to fix something and it didn't get fixed? */
> > +static inline bool xfs_repair_unfixed(struct xfs_scrub_metadata *sm)
> > +{
> > +	return (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
> > +	       (sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > +				XFS_SCRUB_OFLAG_XCORRUPT));
> > +}
> 
> Ok, so I don't understand how these determine whether something
> wants repair and then didn't get repaired? Clearly there's some
> context related thing I'm not seeing in the way these are called,
> but looking at the code I don't understand the difference between
> "want to do" and "did not do".

The distinction is almost entirely in /when/ they're called.  Maybe it's
easier (as Christoph occasionally nags me) to avoid hiding this all in
single-use helpers and put them directly in xfs_scrub_metadata() along
with more comments.

> > +int xfs_repair_probe(struct xfs_scrub_context *sc, uint32_t scrub_oflags);
> > +
> > +#else
> > +
> > +# define xfs_repair_can_fix(mp)		(false)
> > +# define xfs_repair_should_fix(sm)	(false)
> > +# define xfs_repair_probe		(NULL)
> > +# define xfs_repair_unfixed(sm)		(false)
> 
> static inline is preffered for these, right? So we still get type
> checking even when they aren't enabled?

Ok.

> > @@ -120,6 +125,24 @@
> >   * XCORRUPT flag; btree query function errors are noted by setting the
> >   * XFAIL flag and deleting the cursor to prevent further attempts to
> >   * cross-reference with a defective btree.
> > + *
> > + * If a piece of metadata proves corrupt or suboptimal, the userspace
> > + * program can ask the kernel to apply some tender loving care (TLC) to
> > + * the metadata object by setting the REPAIR flag and re-calling the
> > + * scrub ioctl.  "Corruption" is defined by metadata violating the
> > + * on-disk specification; operations cannot continue if the violation is
> > + * left untreated.  It is possible for XFS to continue if an object is
> > + * "suboptimal", however performance may be degraded.  Repairs are
> > + * usually performed by rebuilding the metadata entirely out of
> > + * redundant metadata.  Optimizing, on the other hand, can sometimes be
> > + * done without rebuilding entire structures.
> > + *
> > + * Generally speaking, the repair code has the following code structure:
> > + * Lock -> scrub -> repair -> commit -> re-lock -> re-scrub -> unlock.
> > + * The first check helps us figure out if we need to rebuild or simply
> > + * optimize the structure so that the rebuild knows what to do.  The
> > + * second check evaluates the completeness of the repair; that is what
> > + * is reported to userspace.
> >   */
> >  
> >  /*
> > @@ -155,7 +178,10 @@ xfs_scrub_teardown(
> >  {
> >  	xfs_scrub_ag_free(sc, &sc->sa);
> >  	if (sc->tp) {
> > -		xfs_trans_cancel(sc->tp);
> > +		if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
> > +			error = xfs_trans_commit(sc->tp);
> > +		else
> > +			xfs_trans_cancel(sc->tp);
> >  		sc->tp = NULL;
> >  	}
> >  	if (sc->ip) {
> > @@ -180,6 +206,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
> >  		.type	= ST_NONE,
> >  		.setup	= xfs_scrub_setup_fs,
> >  		.scrub	= xfs_scrub_probe,
> > +		.repair = xfs_repair_probe,
> >  	},
> >  	[XFS_SCRUB_TYPE_SB] = {		/* superblock */
> >  		.type	= ST_PERAG,
> > @@ -379,15 +406,107 @@ xfs_scrub_validate_inputs(
> >  	if (!xfs_sb_version_hasextflgbit(&mp->m_sb))
> >  		goto out;
> >  
> > -	/* We don't know how to repair anything yet. */
> > -	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
> > -		goto out;
> > +	/*
> > +	 * We only want to repair read-write v5+ filesystems.  Defer the check
> > +	 * for ops->repair until after our scrub confirms that we need to
> > +	 * perform repairs so that we avoid failing due to not supporting
> > +	 * repairing an object that doesn't need repairs.
> > +	 */
> > +	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
> > +		error = -EOPNOTSUPP;
> > +		if (!xfs_repair_can_fix(mp))
> > +			goto out;
> > +
> > +		error = -EROFS;
> > +		if (mp->m_flags & XFS_MOUNT_RDONLY)
> > +			goto out;
> > +	}
> >  
> >  	error = 0;
> >  out:
> >  	return error;
> >  }
> >  
> > +/*
> > + * Attempt to repair some metadata, if the metadata is corrupt and userspace
> > + * told us to fix it.  This function returns -EAGAIN to mean "re-run scrub",
> > + * and will set *fixed if it thinks it repaired anything.
> > + */
> > +STATIC int
> > +xfs_repair_attempt(
> > +	struct xfs_inode		*ip,
> > +	struct xfs_scrub_context	*sc,
> > +	bool				*fixed)
> > +{
> > +	uint32_t			scrub_oflags;
> > +	int				error = 0;
> > +
> > +	trace_xfs_repair_attempt(ip, sc->sm, error);
> > +
> > +	/* Repair needed but not supported, just exit. */
> > +	if (!sc->ops->repair) {
> > +		error = -EOPNOTSUPP;
> > +		trace_xfs_repair_done(ip, sc->sm, error);
> > +		return error;
> > +	}
> > +
> > +	xfs_scrub_ag_btcur_free(&sc->sa);
> > +
> > +	/*
> > +	 * Repair whatever's broken.  We have to clear the out flags during the
> > +	 * repair call because some of our iterator functions abort if any of
> > +	 * the corruption flags are set.
> > +	 */
> > +	scrub_oflags = sc->sm->sm_flags & XFS_SCRUB_FLAGS_OUT;
> > +	sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> > +	error = sc->ops->repair(sc, scrub_oflags);
> 
> Urk, that's a bit messy. Shouldn't we drive that inwards to just the
> iterator methods that have this problem? Then we can slowly work
> over the iterators that are problematic and fix them?

Hmm, I'll have a closer look tomorrow at which ones actually trigger
this...

> > +	trace_xfs_repair_done(ip, sc->sm, error);
> > +	sc->sm->sm_flags |= scrub_oflags;
> > +
> > +	switch (error) {
> > +	case 0:
> > +		/*
> > +		 * Repair succeeded.  Commit the fixes and perform a second
> > +		 * scrub so that we can tell userspace if we fixed the problem.
> > +		 */
> > +		sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> > +		*fixed = true;
> > +		return -EAGAIN;
> > +	case -EDEADLOCK:
> > +	case -EAGAIN:
> > +		/* Tell the caller to try again having grabbed all the locks. */
> > +		if (!sc->try_harder) {
> > +			sc->try_harder = true;
> > +			return -EAGAIN;
> > +		}
> > +		/*
> > +		 * We tried harder but still couldn't grab all the resources
> > +		 * we needed to fix it.  The corruption has not been fixed,
> > +		 * so report back to userspace.
> > +		 */
> > +		return -EFSCORRUPTED;
> > +	default:
> > +		return error;
> > +	}
> > +}
> > +
> > +/*
> > + * Complain about unfixable problems in the filesystem.  We don't log
> > + * corruptions when IFLAG_REPAIR wasn't set on the assumption that the driver
> > + * program is xfs_scrub, which will call back with IFLAG_REPAIR set if the
> > + * administrator isn't running xfs_scrub in no-repairs mode.
> > + *
> > + * Use this helper function because _ratelimited silently declares a static
> > + * structure to track rate limiting information.
> > + */
> > +static void
> > +xfs_repair_failure(
> > +	struct xfs_mount		*mp)
> > +{
> > +	xfs_alert_ratelimited(mp,
> > +"Corruption not fixed during online repair.  Unmount and run xfs_repair.");
> > +}
> 
> Using the general rate limiting message functions means this message
> can be dropped when there is lots of other ratelimited alert level
> messages being emitted. IMO, this is not a message we want dropped,
> so if it needs rate limiting, it should use it's own private
> ratelimit structure.

Uh... isn't that what xfs_alert_ratelimited already does?

#define xfs_alert_ratelimited(dev, fmt, ...) \
	xfs_printk_ratelimited(xfs_alert, dev, fmt, ##__VA_ARGS__)

#define xfs_printk_ratelimited(func, dev, fmt, ...)		\
do { \
	static DEFINE_RATELIMIT_STATE(_rs, \
				      DEFAULT_RATELIMIT_INTERVAL, \
				      DEFAULT_RATELIMIT_BURST); \
	if (__ratelimit(&_rs)) \
		func(dev, fmt, ##__VA_ARGS__);			\
} while (0)

That declares a static ratelimit structure inside xfs_repair_failure,
right?

> >  /* Dispatch metadata scrubbing. */
> >  int
> >  xfs_scrub_metadata(
> > @@ -397,6 +516,7 @@ xfs_scrub_metadata(
> >  	struct xfs_scrub_context	sc;
> >  	struct xfs_mount		*mp = ip->i_mount;
> >  	bool				try_harder = false;
> > +	bool				already_fixed = false;
> >  	int				error = 0;
> >  
> >  	BUILD_BUG_ON(sizeof(meta_scrub_ops) !=
> > @@ -446,9 +566,44 @@ xfs_scrub_metadata(
> >  	} else if (error)
> >  		goto out_teardown;
> >  
> > -	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > -			       XFS_SCRUB_OFLAG_XCORRUPT))
> > -		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> > +	/* Let debug users force us into the repair routines. */
> > +	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed &&
> > +			XFS_TEST_ERROR(false, mp,
> > +					XFS_ERRTAG_FORCE_SCRUB_REPAIR)) {
> > +		sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > +	}
> > +
> > +	/*
> > +	 * If userspace asked for a repair but it wasn't necessary, report
> > +	 * that back to userspace.
> > +	 */
> > +	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed &&
> > +	    !(sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > +				 XFS_SCRUB_OFLAG_XCORRUPT |
> > +				 XFS_SCRUB_OFLAG_PREEN)))
> > +		sc.sm->sm_flags |= XFS_SCRUB_OFLAG_UNTOUCHED;
> 
> Duplicate checks, could be done like this:
> 
> 	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> 		/* Let debug users force us into the repair routines. */
> 		if (XFS_TEST_ERROR(false, mp,
> 			.....
> 
> 		/*
> 		 * If userspace asked for a repair but it wasn't
> 		 * necessary,
> 		.....
> 	}

Yeah.  I think I'll rip out the helpers so it'll be easier to tell from
the position of the OFLAG check relative to the ops->repair call what
the difference is between "want to fix" and "tried to fix and couldn't"

> > +	/*
> > +	 * If it's broken, userspace wants us to fix it, and we haven't already
> > +	 * tried to fix it, then attempt a repair.
> > +	 */
> > +	if (xfs_repair_should_fix(sc.sm) && !already_fixed) {
> 
> You could reduce indent by doing:
> 
> 	if (!xfs_repair_should_fix(sc.sm) || already_fixed)
> 		goto out_check_unfixed;

<nod>  Thanks for taking on the review of this. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html