On Tue, Sep 22, 2015 at 06:07:25PM +1000, Dave Chinner wrote: > On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote: > > Since the onset of v5 superblocks, the LSN of the last modification has > > been included in a variety of on-disk data structures. This LSN is used > > to provide log recovery ordering guarantees (e.g., to ensure an older > > log recovery item is not replayed over a newer target data structure). > > > > While this works correctly from the point a filesystem is formatted and > > mounted, userspace tools have some problematic behaviors that defeat > > this mechanism. For example, xfs_repair historically zeroes out the log > > unconditionally (regardless of whether corruption is detected). If this > > occurs, the LSN of the filesystem is reset and the log is now in a > > problematic state with respect to on-disk metadata structures that might > > have a larger LSN. Until either the log catches up to the highest > > previously used metadata LSN or each affected data structure is modified > > and written out without incident (which resets the metadata LSN), log > > recovery is susceptible to filesystem corruption. > > > > This problem is ultimately addressed and repaired in the associated > > userspace tools. The kernel is still responsible to detect the problem > > and notify the user that something is wrong. Check the superblock LSN at > > mount time and notify the user and fail the mount if it is invalid. > > From that point on, simply warn the user if an invalid metadata LSN is > > detected from the various metadata I/O verifiers. A new xfs_mount flag > > is added to guarantee that a warning fires at least once for each > > affected filesystem mount. From that point forward, further instances of > > the problem are rate limited to a once per 24 hour period. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > fs/xfs/libxfs/xfs_alloc.c | 9 +++++-- > > fs/xfs/libxfs/xfs_attr_leaf.c | 2 ++ > > fs/xfs/libxfs/xfs_btree.c | 14 +++++++++-- > > fs/xfs/libxfs/xfs_da_btree.c | 3 +++ > > fs/xfs/libxfs/xfs_dir2_block.c | 1 + > > fs/xfs/libxfs/xfs_dir2_data.c | 1 + > > fs/xfs/libxfs/xfs_dir2_leaf.c | 1 + > > fs/xfs/libxfs/xfs_dir2_node.c | 1 + > > fs/xfs/libxfs/xfs_ialloc.c | 8 +++++-- > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++ > > fs/xfs/libxfs/xfs_symlink_remote.c | 3 +++ > > fs/xfs/xfs_log.c | 1 - > > fs/xfs/xfs_log_priv.h | 24 +++++++++++++++++++ > > fs/xfs/xfs_log_recover.c | 15 +++++++++++- > > fs/xfs/xfs_mount.c | 49 ++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_mount.h | 3 +++ > > 16 files changed, 135 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index ffad7f2..cb26016 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -482,6 +482,8 @@ xfs_agfl_verify( > > be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks) > > return false; > > } > > + > > + xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn)); > > return true; > > xfs_log_check_lsn(mp, <lsn>); > Ok. > > +/* > > + * The LSN is valid so long as it is behind the current LSN. If it isn't, this > > + * means that the next log record that includes this metadata could have a > > + * smaller LSN. In turn, this means that the modification in the log would not > > + * replay. > > + */ > > +static inline bool > > +xlog_valid_lsn( > > + struct xlog *log, > > + xfs_lsn_t lsn) > > +{ > > + int cycle = CYCLE_LSN(lsn); > > + int block = BLOCK_LSN(lsn); > > + bool valid = true; > > + > > + spin_lock(&log->l_icloglock); > > + if ((cycle > log->l_curr_cycle) || > > + (cycle == log->l_curr_cycle && block > log->l_curr_block)) > > + valid = false; > > + spin_unlock(&log->l_icloglock); > > + > > + return valid; > > +} > > We do not want to take the l_icloglock in metadata IO > context. It's a global fs lock, and it's a hot lock, too, so we > do not want to be taking this during every metadata IO completion. > > I think, at minimum, we need to sample the current values and do an > unlocked check, then if the result is suspect we need to do an > accurate locked check. This means that we will almost never take the > l_icloglock here. e.g: > > { > int cycle = ACCESS_ONCE(log->l_curr_cycle); > int block = ACCESS_ONCE(log->l_curr_block); > > if (CYCLE_LSN(lsn) > cycle || > (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) { > /* suspect, take icloglock to check again */ > bool valid = true; > > .... > return valid; > } > return true; > } > Good point. In this case, I may just lift the lock out of the helper and allow this to be the responsibility of the caller, but I'll see how it looks. > > > --- a/fs/xfs/xfs_mount.c > > +++ b/fs/xfs/xfs_mount.c > > @@ -41,6 +41,7 @@ > > #include "xfs_trace.h" > > #include "xfs_icache.h" > > #include "xfs_sysfs.h" > > +#include "xfs_log_priv.h" > > > > > > static DEFINE_MUTEX(xfs_uuid_table_mutex); > > @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only( > > } > > return 0; > > } > > + > > +/* > > + * Verify that an LSN stamped into a piece of metadata is valid. This is > > + * intended for use in read verifiers on v5 superblocks. > > + */ > > +void > > +xfs_detect_invalid_lsn( > > + struct xfs_mount *mp, > > + xfs_lsn_t lsn) > > +{ > > This shoul dbe called xfs_log_check_lsn() and be located in > xfs_log.c. > Ok. > > + struct xlog *log = mp->m_log; > > + int cycle = CYCLE_LSN(lsn); > > + int block = BLOCK_LSN(lsn); > > + const char *fmt; > > + > > + /* > > + * 'norecovery' mode skips mount-time log processing and unconditionally > > + * resets the LSN. > > + */ > > + if (mp->m_flags & XFS_MOUNT_NORECOVERY) > > + return; > > + > > + if (xlog_valid_lsn(mp->m_log, lsn)) > > + return; > > + > > + /* > > + * Warn at least once for each filesystem susceptible to this problem > > + * and once per day thereafter. > > + * > > + * XXX: Specify the minimum xfs_repair version required to resolve? > > + * Older versions will silently reintroduce the problem. > > + * > > + * We should eventually convert this condition into a hard error (via > > + * verifier failure). Defer that behavior for a few release cycles or so > > + * until the userspace fixes have had the opportunity to propogate > > + * throughout the various distributions. > > + */ > > + fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " > > +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve."; > > + if (!mp->m_warned_bad_lsn) { > > + mp->m_warned_bad_lsn = true; > > + xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle, > > + log->l_curr_block); > > + } else { > > + xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle, > > + log->l_curr_block); > > + } > > That's really ugly. Rate limited prints should always occur at least > once for each ratelimit key, and now i've seen this I can say the > first patch needs rework. > This is definitely more ugly than the shutdown approach. As mentioned, I've been waffling back and forth on how to handle this condition. I had already sent the shutdown version, so I figured I would put together what I thought would be necessary to handle this with a warning for comparison. > that is, each xfs_warn_daily() caller needs it's own ratelimit key, > rather than sharing the global ratelimit key that all other XFS > messages share. > What do you mean by a key? What is the purpose? On further thought... I think I see what you mean. Each mount should report in the ratelimited timeframe independently. Or in other words, the rate limit of one mount should not affect message delivery on another. > e.g.: > xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate, > "Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " > "Please unmount and run xfs_repair (>= v4.3) to resolve.". > cycle, block, log->l_curr_cycle, log->l_curr_block); > > But really, thinking on this further (the "corruption warning" bit) > I'm starting to think we should just shut the filesystem down when > we hit this and force the user to fix it immediately. if we don't, > then who knows whether we are making things worse or not... > I was originally thinking the shutdown might not be ideal because we induce the very situation we're warning about (forcing a log recovery). Since this verifies on metadata reads, however, we should in theory avoid the problem before it could ever be introduced. My follow on question to that was whether a shutdown is appropriate for something that is somewhat minor and self-corrective..? Given the ugliness of the multiple warnings, new mount flag, additional complication of the ratelimit mechanism, etc., I might just go back to the shutdown and call it a day. ;) I think the subtle point raised above about calling this a corruption is a sufficient argument. While this might be minor and self-correcting, it is still technically a corruption of sorts and should be identified and repaired appropriately. Thanks for the feedback. Brian > > --- a/fs/xfs/xfs_mount.h > > +++ b/fs/xfs/xfs_mount.h > > @@ -127,6 +127,7 @@ typedef struct xfs_mount { > > int64_t m_low_space[XFS_LOWSP_MAX]; > > /* low free space thresholds */ > > struct xfs_kobj m_kobj; > > + bool m_warned_bad_lsn;/* bad metadata lsn detected */ > > That's where I'd put the ratelimit key (or in the struct xlog). > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs