Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 9 Dec 2020 15:19:50 +1100

On Tue, Dec 08, 2020 at 07:26:24PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 08, 2020 at 02:19:13PM -0500, Brian Foster wrote:
> > On Tue, Dec 08, 2020 at 10:10:27AM -0800, Darrick J. Wong wrote:
> > > On Tue, Dec 08, 2020 at 06:19:06AM -0500, Brian Foster wrote:
> > > > On Mon, Dec 07, 2020 at 04:40:28PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > 
> > > > > Log incompat feature flags in the superblock exist for one purpose: to
> > > > > protect the contents of a dirty log from replay on a kernel that isn't
> > > > > prepared to handle those dirty contents.  This means that they can be
> > > > > cleared if (a) we know the log is clean and (b) we know that there
> > > > > aren't any other threads in the system that might be setting or relying
> > > > > upon a log incompat flag.
> > > > > 
> > > > > Therefore, clear the log incompat flags when we've finished recovering
> > > > > the log, when we're unmounting cleanly, remounting read-only, or
> > > > > freezing; and provide a function so that subsequent patches can start
> > > > > using this.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > > Note: I wrote this so that we could turn on log incompat flags for
> > > > > atomic extent swapping and Allison could probably use it for the delayed
> > > > > logged xattr patchset.  Not gonna try to land this in 5.11, FWIW...
> > > > > ---
....
> > > > unmount, we ensure a full/sync AIL push completes (and moves the in-core
> > > > tail) before we log the feature bit change. I do wonder if it's worth
> > > > complicating the log quiesce path to clear feature bits at all, but I
> > > > suppose it could be a little inconsistent to clean the log on freeze yet
> > > > leave an incompat bit around. Perhaps we should push the clear bit
> > > > sequence down into the log quiesce path between completing the AIL push
> > > > and writing the unmount record. We may have to commit a sync transaction
> > > > and then push the AIL again, but that would cover the unmount and freeze
> > > > cases and I think we could probably do away with the post-recovery bit
> > > > clearing case entirely. A current/recovered mount should clear the
> > > > associated bits on the next log quiesce anyways. Hm?
> > > 
> > > Hm.  You know how xfs_log_quiesce takes and releases the superblock
> > > buffer lock after pushing the AIL but before writing the unmount record?
> > > What if we did the log_incompat feature clearing + bwrite right after
> > > that?
> > > 
> > 
> > Yeah, that's where I was thinking for the unmount side. I'm not sure we
> > want to just bwrite there vs. log+push though. Otherwise, I think
> > there's still a window of potential inconsistency between when the
> > superblock write completes and the unmount record lands on disk.

The only thing the unmount record really does is ensure that the
head and tail of the log point to the unmount record. that's what
marks the log clean, not the unmount record itself. We use a special
record here because we're not actually modifying anything - it's not
a transaction, just a marker to get the log head + tail to point to
the same LSN in the log.

Log covering does the same thing via logging and flushing the
superblock multiple times - it makes sure that the only item in the
commit at the head of the journal has a tail pointer that also
points to the same checkpoint. The fact that it's got a "dirty"
unmodified superblock in rather than being an unmount record is
largely irrelevant - the rest of the log has been marked as clean by
this checkpoint and the replay of the SB from this checkpoint is
effectively a no-op.

In fact, I think it's a "no-op" we can take advantage of.....

> > I
> > _think_ the whole log force -> AIL push (i.e. move log tail) -> update
> > super -> log force -> AIL push sequence ensures that if an older kernel
> > saw the updated super, the log would technically be dirty but we'd at
> > least be sure that all incompat log items are behind the tail.

It's a bit more complex than that - if the log is not dirty, then
the second log force does nothing, and the tail of the log does not
get updated. Hence you have to log, commit and writeback the
superblock twice to ensure that the log tail has been updated in the
journal to after the point in time where the superblock was *first
logged*.

The log covering state machine handles this all correctly, and it
does it using the superblock as the mechanism that moves the tail of
the log forward. I suspect that we could use the second logging of
the superblock in that state machine to clear/set log incompat
flags, as that is the commit that marks the log "empty". If we crash
before the sb write at this point, log recovery will only see the
superblock to recover, so it doesn't matter that there's a change in
log incompat bits here because after recovery we're going to clear
them, right?

I'd like to avoid re-inventing the wheel here if we can :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx