Re: [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 22, 2015 at 06:07:25PM +1000, Dave Chinner wrote:
> On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote:
> > Since the onset of v5 superblocks, the LSN of the last modification has
> > been included in a variety of on-disk data structures. This LSN is used
> > to provide log recovery ordering guarantees (e.g., to ensure an older
> > log recovery item is not replayed over a newer target data structure).
> > 
> > While this works correctly from the point a filesystem is formatted and
> > mounted, userspace tools have some problematic behaviors that defeat
> > this mechanism. For example, xfs_repair historically zeroes out the log
> > unconditionally (regardless of whether corruption is detected). If this
> > occurs, the LSN of the filesystem is reset and the log is now in a
> > problematic state with respect to on-disk metadata structures that might
> > have a larger LSN. Until either the log catches up to the highest
> > previously used metadata LSN or each affected data structure is modified
> > and written out without incident (which resets the metadata LSN), log
> > recovery is susceptible to filesystem corruption.
> > 
> > This problem is ultimately addressed and repaired in the associated
> > userspace tools. The kernel is still responsible to detect the problem
> > and notify the user that something is wrong. Check the superblock LSN at
> > mount time and notify the user and fail the mount if it is invalid.
> > From that point on, simply warn the user if an invalid metadata LSN is
> > detected from the various metadata I/O verifiers. A new xfs_mount flag
> > is added to guarantee that a warning fires at least once for each
> > affected filesystem mount. From that point forward, further instances of
> > the problem are rate limited to a once per 24 hour period.
> > 
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c          |  9 +++++--
> >  fs/xfs/libxfs/xfs_attr_leaf.c      |  2 ++
> >  fs/xfs/libxfs/xfs_btree.c          | 14 +++++++++--
> >  fs/xfs/libxfs/xfs_da_btree.c       |  3 +++
> >  fs/xfs/libxfs/xfs_dir2_block.c     |  1 +
> >  fs/xfs/libxfs/xfs_dir2_data.c      |  1 +
> >  fs/xfs/libxfs/xfs_dir2_leaf.c      |  1 +
> >  fs/xfs/libxfs/xfs_dir2_node.c      |  1 +
> >  fs/xfs/libxfs/xfs_ialloc.c         |  8 +++++--
> >  fs/xfs/libxfs/xfs_sb.c             |  8 +++++++
> >  fs/xfs/libxfs/xfs_symlink_remote.c |  3 +++
> >  fs/xfs/xfs_log.c                   |  1 -
> >  fs/xfs/xfs_log_priv.h              | 24 +++++++++++++++++++
> >  fs/xfs/xfs_log_recover.c           | 15 +++++++++++-
> >  fs/xfs/xfs_mount.c                 | 49 ++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.h                 |  3 +++
> >  16 files changed, 135 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index ffad7f2..cb26016 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -482,6 +482,8 @@ xfs_agfl_verify(
> >  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
> >  			return false;
> >  	}
> > +
> > +	xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn));
> >  	return true;
> 
> 	xfs_log_check_lsn(mp, <lsn>);
> 

Ok.

> > +/*
> > + * The LSN is valid so long as it is behind the current LSN. If it isn't, this
> > + * means that the next log record that includes this metadata could have a
> > + * smaller LSN. In turn, this means that the modification in the log would not
> > + * replay.
> > + */
> > +static inline bool
> > +xlog_valid_lsn(
> > +	struct xlog	*log,
> > +	xfs_lsn_t	lsn)
> > +{
> > +	int		cycle = CYCLE_LSN(lsn);
> > +	int		block = BLOCK_LSN(lsn);
> > +	bool		valid = true;
> > +
> > +	spin_lock(&log->l_icloglock);
> > +	if ((cycle > log->l_curr_cycle) ||
> > +	    (cycle == log->l_curr_cycle && block > log->l_curr_block))
> > +		valid = false;
> > +	spin_unlock(&log->l_icloglock);
> > +
> > +	return valid;
> > +}
> 
> We do not want to take the l_icloglock in metadata IO
> context. It's a global fs lock, and it's a hot lock, too, so we
> do not want to be taking this during every metadata IO completion.
> 
> I think, at minimum, we need to sample the current values and do an
> unlocked check, then if the result is suspect we need to do an
> accurate locked check. This means that we will almost never take the
> l_icloglock here. e.g:
> 
> {
> 	int cycle = ACCESS_ONCE(log->l_curr_cycle);
> 	int block = ACCESS_ONCE(log->l_curr_block);
> 
> 	if (CYCLE_LSN(lsn) > cycle ||
> 	    (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) {
> 		/* suspect, take icloglock to check again */
> 		bool valid = true;
> 
> 		....
> 		return valid;
> 	}
> 	return true;
> }
> 

Good point. In this case, I may just lift the lock out of the helper and
allow this to be the responsibility of the caller, but I'll see how it
looks.

> 
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -41,6 +41,7 @@
> >  #include "xfs_trace.h"
> >  #include "xfs_icache.h"
> >  #include "xfs_sysfs.h"
> > +#include "xfs_log_priv.h"
> >  
> >  
> >  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> > @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only(
> >  	}
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Verify that an LSN stamped into a piece of metadata is valid. This is
> > + * intended for use in read verifiers on v5 superblocks.
> > + */
> > +void
> > +xfs_detect_invalid_lsn(
> > +	struct xfs_mount	*mp,
> > +	xfs_lsn_t		lsn)
> > +{
> 
> This shoul dbe called xfs_log_check_lsn() and be located in
> xfs_log.c.
> 

Ok.

> > +	struct xlog		*log = mp->m_log;
> > +	int			cycle = CYCLE_LSN(lsn);
> > +	int			block = BLOCK_LSN(lsn);
> > +	const char 		*fmt;
> > +
> > +	/*
> > +	 * 'norecovery' mode skips mount-time log processing and unconditionally
> > +	 * resets the LSN.
> > +	 */
> > +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> > +		return;
> > +
> > +	if (xlog_valid_lsn(mp->m_log, lsn))
> > +		return;
> > +
> > +	/*
> > +	 * Warn at least once for each filesystem susceptible to this problem
> > +	 * and once per day thereafter.
> > +	 *
> > +	 * XXX: Specify the minimum xfs_repair version required to resolve?
> > +	 * Older versions will silently reintroduce the problem.
> > +	 *
> > +	 * We should eventually convert this condition into a hard error (via
> > +	 * verifier failure). Defer that behavior for a few release cycles or so
> > +	 * until the userspace fixes have had the opportunity to propogate
> > +	 * throughout the various distributions.
> > +	 */
> > +	fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
> > +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve.";
> > +	if (!mp->m_warned_bad_lsn) {
> > +		mp->m_warned_bad_lsn = true;
> > +		xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle,
> > +			 log->l_curr_block);
> > +	} else {
> > +		xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle,
> > +			       log->l_curr_block);
> > +	}
> 
> That's really ugly. Rate limited prints should always occur at least
> once for each ratelimit key, and now i've seen this I can say the
> first patch needs rework.
> 

This is definitely more ugly than the shutdown approach. As mentioned,
I've been waffling back and forth on how to handle this condition. I had
already sent the shutdown version, so I figured I would put together
what I thought would be necessary to handle this with a warning for
comparison.

> that is, each xfs_warn_daily() caller needs it's own ratelimit key,
> rather than sharing the global ratelimit key that all other XFS
> messages share.
> 

What do you mean by a key? What is the purpose?

On further thought... I think I see what you mean. Each mount should
report in the ratelimited timeframe independently. Or in other words,
the rate limit of one mount should not affect message delivery on
another.

> e.g.:
> 	xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate,
> "Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
> "Please unmount and run xfs_repair (>= v4.3) to resolve.".
> 			cycle, block, log->l_curr_cycle, log->l_curr_block);
> 
> But really, thinking on this further (the "corruption warning" bit)
> I'm starting to think we should just shut the filesystem down when
> we hit this and force the user to fix it immediately. if we don't,
> then who knows whether we are making things worse or not...
> 

I was originally thinking the shutdown might not be ideal because we
induce the very situation we're warning about (forcing a log recovery).
Since this verifies on metadata reads, however, we should in theory
avoid the problem before it could ever be introduced. My follow on
question to that was whether a shutdown is appropriate for something
that is somewhat minor and self-corrective..?

Given the ugliness of the multiple warnings, new mount flag, additional
complication of the ratelimit mechanism, etc., I might just go back to
the shutdown and call it a day. ;) I think the subtle point raised above
about calling this a corruption is a sufficient argument. While this
might be minor and self-correcting, it is still technically a corruption
of sorts and should be identified and repaired appropriately. Thanks for
the feedback.

Brian

> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -127,6 +127,7 @@ typedef struct xfs_mount {
> >  	int64_t			m_low_space[XFS_LOWSP_MAX];
> >  						/* low free space thresholds */
> >  	struct xfs_kobj		m_kobj;
> > +	bool			m_warned_bad_lsn;/* bad metadata lsn detected */
> 
> That's where I'd put the ratelimit key (or in the struct xlog).
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux