Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 08, 2020 at 06:19:06AM -0500, Brian Foster wrote:
> On Mon, Dec 07, 2020 at 04:40:28PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Log incompat feature flags in the superblock exist for one purpose: to
> > protect the contents of a dirty log from replay on a kernel that isn't
> > prepared to handle those dirty contents.  This means that they can be
> > cleared if (a) we know the log is clean and (b) we know that there
> > aren't any other threads in the system that might be setting or relying
> > upon a log incompat flag.
> > 
> > Therefore, clear the log incompat flags when we've finished recovering
> > the log, when we're unmounting cleanly, remounting read-only, or
> > freezing; and provide a function so that subsequent patches can start
> > using this.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > ---
> > Note: I wrote this so that we could turn on log incompat flags for
> > atomic extent swapping and Allison could probably use it for the delayed
> > logged xattr patchset.  Not gonna try to land this in 5.11, FWIW...
> > ---
> >  fs/xfs/libxfs/xfs_format.h |   15 ++++++
> >  fs/xfs/xfs_log.c           |    8 +++
> >  fs/xfs/xfs_mount.c         |  119 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.h         |    2 +
> >  fs/xfs/xfs_super.c         |    9 +++
> >  5 files changed, 153 insertions(+)
> > 
> ...
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 669a9cf43582..68a5c90ec65b 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> ...
> > @@ -1361,6 +1369,117 @@ xfs_force_summary_recalc(
> >  	xfs_fs_mark_checked(mp, XFS_SICK_FS_COUNTERS);
> >  }
> >  
> > +/*
> > + * Enable a log incompat feature flag in the primary superblock.  The caller
> > + * cannot have any other transactions in progress.
> > + */
> > +int
> > +xfs_add_incompat_log_feature(
> > +	struct xfs_mount	*mp,
> > +	uint32_t		feature)
> > +{
> > +	struct xfs_dsb		*dsb;
> > +	int			error;
> > +
> > +	ASSERT(hweight32(feature) == 1);
> > +	ASSERT(!(feature & XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
> > +
> > +	/*
> > +	 * Force the log to disk and kick the background AIL thread to reduce
> > +	 * the chances that the bwrite will stall waiting for the AIL to unpin
> > +	 * the primary superblock buffer.  This isn't a data integrity
> > +	 * operation, so we don't need a synchronous push.
> > +	 */
> > +	error = xfs_log_force(mp, XFS_LOG_SYNC);
> > +	if (error)
> > +		return error;
> > +	xfs_ail_push_all(mp->m_ail);
> > +
> > +	/*
> > +	 * Lock the primary superblock buffer to serialize all callers that
> > +	 * are trying to set feature bits.
> > +	 */
> > +	xfs_buf_lock(mp->m_sb_bp);
> > +	xfs_buf_hold(mp->m_sb_bp);
> > +
> > +	if (XFS_FORCED_SHUTDOWN(mp)) {
> > +		error = -EIO;
> > +		goto rele;
> > +	}
> > +
> > +	if (xfs_sb_has_incompat_log_feature(&mp->m_sb, feature))
> > +		goto rele;
> > +
> > +	/*
> > +	 * Write the primary superblock to disk immediately, because we need
> > +	 * the log_incompat bit to be set in the primary super now to protect
> > +	 * the log items that we're going to commit later.
> > +	 */
> > +	dsb = mp->m_sb_bp->b_addr;
> > +	xfs_sb_to_disk(dsb, &mp->m_sb);
> > +	dsb->sb_features_log_incompat |= cpu_to_be32(feature);
> > +	error = xfs_bwrite(mp->m_sb_bp);
> > +	if (error)
> > +		goto shutdown;
> > +
> 
> Do we need to do all of this serialization and whatnot to set the
> incompat bit? Can we just modify and log the in-core sb when or before
> an incompatible item is logged, similar to how xfs_sbversion_add_attr2()
> is implemented for example?

Hm.  The goal here is to ensure that the primary super gets updated
before anything starts logging the protected log items.

The first iteration of this patch simply updated the incore superblock
and called xfs_sync_sb(mp, true) to flush it out to the primary super.
This proved to be the source of log livelocks because that function
bholds the primary super and pushes the log and AIL, which sometimes
stalled out because the AIL can't get the lock on the primary super
buffer.

The second version did the bare bwrite, but suffered from the
xfs_buf_lock occasionally stalling for 30s while waiting for a log flush
or something.  This third version forces the log and pushes the AIL
prior to adding the feature bit to speed things up.  There's a small
window between the xfs_buf_relse and the end of xfs_sync_sb where
another thread could start logging items before we add the super update
to the log stream, but the bwrite is supposed to cover that case. :)

The difficulty here is that any other threads that think they might need
to set the log_incompat bit have to be held off until this thread
confirms that the primary super has been updated on disk, because the
three outcomes we want here are (a) feature already set, just go; (b)
feature not set, but by the time you get the locks it's already taken
care of; or (c) feature not set, and you get to do all the updates.

We can't do that with the primary super buffer lock alone (because we
can't bhold it and push the AIL at the same time) but I guess I could
just add a "feature update" mutex into the mix.

> > +	/*
> > +	 * Add the feature bits to the incore superblock before we unlock the
> > +	 * buffer.
> > +	 */
> > +	xfs_sb_add_incompat_log_features(&mp->m_sb, feature);
> > +	xfs_buf_relse(mp->m_sb_bp);
> > +
> > +	/* Log the superblock to disk. */
> > +	return xfs_sync_sb(mp, false);
> > +shutdown:
> > +	xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> > +rele:
> > +	xfs_buf_relse(mp->m_sb_bp);
> > +	return error;
> > +}
> > +
> > +/*
> > + * Clear all the log incompat flags from the superblock.
> > + *
> > + * The caller must have freeze protection, cannot have any other transactions
> > + * in progress, must ensure that the log does not contain any log items
> > + * protected by any log incompat bit, and must ensure that there are no other
> > + * threads that could be reading or writing the log incompat feature flag in
> > + * the primary super.  In other words, we can only turn features off as one of
> > + * the final steps of quiescing or unmounting the log.
> > + */
> > +int
> > +xfs_clear_incompat_log_features(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_hascrc(&mp->m_sb) ||
> > +	    !xfs_sb_has_incompat_log_feature(&mp->m_sb,
> > +				XFS_SB_FEAT_INCOMPAT_LOG_ALL) ||
> > +	    XFS_FORCED_SHUTDOWN(mp))
> > +		return 0;
> > +
> > +	/*
> > +	 * Update the incore superblock.  We synchronize on the primary super
> > +	 * buffer lock to be consistent with the add function, though at least
> > +	 * in theory this shouldn't be necessary.
> > +	 */
> > +	xfs_buf_lock(mp->m_sb_bp);
> > +	xfs_buf_hold(mp->m_sb_bp);
> > +
> > +	if (!xfs_sb_has_incompat_log_feature(&mp->m_sb,
> > +				XFS_SB_FEAT_INCOMPAT_LOG_ALL))
> > +		goto rele;
> > +
> > +	xfs_sb_remove_incompat_log_features(&mp->m_sb);
> > +	xfs_buf_relse(mp->m_sb_bp);
> > +
> > +	/* Log the superblock to disk. */
> > +	return xfs_sync_sb(mp, false);
> 
> Ok, so here it looks like we log the superblock change and presumably
> rely on the unmount sequence to write it back...
> 
> > +rele:
> > +	xfs_buf_relse(mp->m_sb_bp);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Update the in-core delayed block counter.
> >   *
> ...
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 343435a677f9..71e304c15e6b 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -919,6 +919,15 @@ xfs_quiesce_attr(
> >  	/* force the log to unpin objects from the now complete transactions */
> >  	xfs_log_force(mp, XFS_LOG_SYNC);
> >  
> > +	/*
> > +	 * Clear log incompat features since we're freezing or remounting ro.
> > +	 * Report failures, though it's not fatal to have a higher log feature
> > +	 * protection level than the log contents actually require.
> > +	 */
> > +	error = xfs_clear_incompat_log_features(mp);
> > +	if (error)
> > +		xfs_warn(mp, "Unable to clear superblock log incompat flags. "
> > +				"Frozen image may not be consistent.");
> >  
> 
> What happens if the modified superblock is written back and then we
> crash during a quiesce but before the log is fully clean? ISTM we could
> end up in recovery of a log with potentially incompatible log items
> where the feature bit has already been cleared. Not sure that's a
> problem with intents, but seems risky in general.

I think you're right that this isn't generally correct.  It works for
intent items because those are all started by front end callers and
those are guaranteed not to be running transactions by the time we get
to xfs_quiesce_attr... but that wouldn't necessarily hold true for other
types of log items that we could create.

> Perhaps this needs to be pushed further down such that similar to
> unmount, we ensure a full/sync AIL push completes (and moves the in-core
> tail) before we log the feature bit change. I do wonder if it's worth
> complicating the log quiesce path to clear feature bits at all, but I
> suppose it could be a little inconsistent to clean the log on freeze yet
> leave an incompat bit around. Perhaps we should push the clear bit
> sequence down into the log quiesce path between completing the AIL push
> and writing the unmount record. We may have to commit a sync transaction
> and then push the AIL again, but that would cover the unmount and freeze
> cases and I think we could probably do away with the post-recovery bit
> clearing case entirely. A current/recovered mount should clear the
> associated bits on the next log quiesce anyways. Hm?

Hm.  You know how xfs_log_quiesce takes and releases the superblock
buffer lock after pushing the AIL but before writing the unmount record?
What if we did the log_incompat feature clearing + bwrite right after
that?

--D

> 
> Brian
> 
> >  	/* Push the superblock and write an unmount record */
> >  	error = xfs_log_sbcount(mp);
> > 
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux