Re: [PATCH 01/10] xfs: reflect sb features in xfs_mount

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 23 Aug 2018 09:39:00 -0400

On Thu, Aug 23, 2018 at 09:59:13AM +1000, Dave Chinner wrote:
> On Wed, Aug 22, 2018 at 08:17:55AM -0400, Brian Foster wrote:
> > On Wed, Aug 22, 2018 at 09:31:33AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 21, 2018 at 09:21:40AM -0400, Brian Foster wrote:
> > > > On Mon, Aug 20, 2018 at 02:48:42PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > 
> > > > > Currently on-disk feature checks require decoding the superblock
> > > > > fileds and so can be non-trivial. We have almost 400 hundred
> > > > > individual feature checks in the XFS code, so this is a significant
> > > > > amount of code. To reduce runtime check overhead, pre-process all
> > > > > the version flags into a features field in the xfs_mount at mount
> > > > > time so we can convert all the feature checks to a simple flag
> > > > > check.
> > > > > 
> > > > > There is also a need to convert the dynamic feature flags to update
> > > > > the m_features field. This is required for attr, attr2 and quota
> > > > > features. New xfs_mount based wrappers are added for this.
> > > > > 
> > > > > Before:
> > > > > 
> > > > > $ size -t fs/xfs/built-in.a
> > > > >    text	   data	    bss	    dec	    hex	filename
> > > > > ....
> > > > > 1294873	 182766	   1036	1478675	 169013	(TOTALS
> > > > > 
> > > > 
> > > > Was some text truncated from the commit log description here? Did you
> > > > mean to include the after size as well?
> > > 
> > > Yeah, I thought I did update that. Maybe forgot to refresh the
> > > header once I did. The before and after are:
> > > 
> > > 	text	   data	    bss	    dec	    hex	filename
> > > before	1326155	 189006	   1036	1516197	 1722a5	(TOTALS)
> > > after	1322929	 189006	   1036	1512971	 17160b	(TOTALS)
> > > 
> > 
> > That's a much larger delta than what I saw when I checked out of
> > curiousity, but I just ran against this first patch. It looks like this
> > delta is before/after the whole series.
> 
> Yeah, it is - I couldn't find the original numbers in the scrollback
> history of the build machine terminal. However, ~90% of the reduction
> it comes from the first patch (3kB vs 3.2kB for the entire series)
> 
> > It might be good to qualify that
> > in the commit log (i.e., "after the old xfs_sb_version_* wrappers are
> > removed") just because the series context isn't always clear in the
> > broader git log history.
> 
> Yeah, I'll fix it up, put the right number in it.
> 
> > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_format.h |  2 +-
> > > > >  fs/xfs/libxfs/xfs_sb.c     | 61 +++++++++++++++++++++++++++++
> > > > >  fs/xfs/libxfs/xfs_sb.h     |  1 +
> > > > >  fs/xfs/xfs_log_recover.c   |  1 +
> > > > >  fs/xfs/xfs_mount.c         |  1 +
> > > > >  fs/xfs/xfs_mount.h         | 79 ++++++++++++++++++++++++++++++++++++++
> > > > >  6 files changed, 144 insertions(+), 1 deletion(-)
> > > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > > > index a21dc61ec09e..5d0438ec07dd 100644
> > > > > --- a/fs/xfs/xfs_log_recover.c
> > > > > +++ b/fs/xfs/xfs_log_recover.c
> > > > > @@ -5723,6 +5723,7 @@ xlog_do_recover(
> > > > >  	xfs_buf_relse(bp);
> > > > >  
> > > > >  	/* re-initialise in-core superblock and geometry structures */
> > > > > +	mp->m_features |= xfs_sb_version_to_features(sbp);
> > > > 
> > > > How is this a reinit if it ORs in fields?
> > > 
> > > The only feature bit that can be removed by log recovery is the
> > > attr2 feature bit which means the last mount was mounted with
> > > "noattr2". So apart from that feature bit, ORing in the new
> > > superblock feature mask works just fine.
> > > 
> > 
> > To be clear.. that's because attr2 is the only feature that can be
> > currently unset at runtime, right?
> 
> Well, it's not really at runtime - it can only be cleared at mount
> time before log recovery is run.  IMO, the attr2/noattr2 mount
> option stuff really just needs to go away.
> 
> > > As it is, for v5 attr2 can't be turned off, so it's just fine for
> > > v5 filesystems. For v4 the old code would set XFS_MOUNT_ATTR2 if the
> > > sb feature bit was set /prior/ to log recovery being run. Hence if
> > > log recovery removed the feature bit, the very next attr creation
> > > will turn it straight back on.
> > > 
> > > The only way to not have attr2 enabled in this case is to use the
> > > noattr2 mount option, and the new code has exactly the same
> > > behaviour - the attr2 superblock bit and the feature flag will get
> > > cleared after log recovery if the noattr2 mount option is set.
> > > 
> > > Also worth noting is that if there's a sb_bad_features2
> > > interatction, it will re-add the feature bit because it just ORs the
> > > two fields together.
> > > 
> > > IOWs, the behaviour of this patch is roughly the same as the
> > > existing code when it comes to the attr2 flag being removed when the
> > > superblock is recovered as well as when there is a features2
> > > mismatch.
> > > 
> > 
> > Ok, but this all sounds like a happy coincidence that depends on the
> > semantics of attr2. IOW, the behavior of attr2 may ultimately be the
> > same (which I haven't fully grokked), but unless I'm missing something
> > more fundamental the logic in this patch still seems slightly dynamic
> > feature challenged in this regard.
> > 
> > Mount a filesystem with some feature enabled, disable it and crash with
> > the superblock change in the dirty log. Mount again, enable said feature
> > based on sb, log recovery updates sb, feature remains inconsistently
> > enabled due to not being cleared by the post-recovery feature init.
> > Unless I'm missing something, this is handled correctly by the existing
> > mechanism simply because each feature check refers to the superblock.
> 
> The only way to disable it is via the noattr2 mount option. So to
> have the above occur, you've have to first mount with noattr2, crash
> after mount has logged the superblock (which happens after
> recovery), then mount again *without* the noattr2 mount option set.
> 
> IOWs, the XFS_MOUNT_ATTR2 flag in this case is set during
> xfs_finish_flags() (i.e. mount option parsing) because the
> unrecovered superblock has the feature flag set and
> XFS_MOUNT_NOATTR2 is not set. All future decisions about whether to
> create attr2 format forks are based on XFS_MOUNT_ATTR2, not the
> superblock feature bit. If log recovery clears the sb feature bit,
> then the next attribute fork creation will see XFS_MOUNT_ATTR2 set
> and re-add the superblock bit.
> 

Ok, I think I was confused as to how the existing feature bit worked. It
sounds like the bit is more driven by the mount option.

I still don't think that the mechanism should be susceptible to
inconsistency in general, even if it's not a real problem based on the
current usage (i.e., still a landmine). If the solution to that is to
drop the ability to dynamically remove a superblock feature, then it
sounds like that addresses the problem just as well. I'll wait and see
what the code looks like...

> IOWs, in the crash+recover case of attr2 sb feature bit removal, the
> code I wrote does the same thing as the existing code - you need to
> use the noattr2 mount option to guarantee that attr2 is not used.
> 
> IMO, this is legacy code (attr2 is permanently enabled for v5) that
> has nasty warts. While the noattr2 mount option was introduced right
> at the start in 2005, it didn't remove the superblock feature bit
> until more than 3 years later because the sb feature bit processing
> got moved by the sb_bad_features2 debacle and that broke noattr2. SO
> instead of just looking at the noattr2 mount option again, it was
> decided to clear the feature bit from the superblock. This is
> despite the fact there are still attr2 format inode forks on disk.
> 
> i.e. the removal of the attr2 sb feature bit breaks all the
> conventions we have for "sb feature bit indicates a specific on disk
> format feature is present in the fs". The original noattr2 code did
> not have this problem, and as such I think we should be reverting to
> the original behaviour of the noattr2 mount option and stop trying
> to screw with the on-disk sb flag once it is set because of the
> above can of worms it opened.
> 
> I think I'll rework the attr2 code as the initial patch in this
> series now, and get rid of the sb feature bit removal code
> altogether so this whole problem goes away.
> 

Ok.

> > > I think there's more work needed on the attr2 feature side of
> > > things, as noted in the cover description. In addition to the
> > > unpredictable behaviour via log recovery, I don't see a reason for
> > > noattr2 existing these days. It doesn't get rid of existing attr2
> > > format inodes on disk - it just stops new ones from being created.
> > > We've used attr2 for more than 10 years now without it being an
> > > issue, so there's no reason for needing to turn it off anymore. I
> > > think we should deprecate the option and remove it.
> > > 
> > > > I guess I'm curious why we OR
> > > > in fields in either case as opposed to using an assignment.
> > > 
> > > Because mount option features are already set in m_features, so we
> > > can't just overwrite it with just the superblock features.
> > 
> > That doesn't appear to be the case as of this patch. If it's a factor
> > later in the series, we should tweak it then where the intent is clear.
> 
> Patches don't exist in isolation. Please look at the work as a
> whole, not just patches in isolation.
> 

This is a simple matter of ordering changes in a series to facilitate
review.

> > This also doesn't strike me as a technically difficult problem to
> > address. We just need to filter out the set of superblock based features
> > one way or another. If you wanted to simplify it further, we could just
> > have the sb -> features function update ->m_features itself in a safe
> > manner (i.e., the subset of features it is responsible for) and then the
> > caller context doesn't really have to be concerned with such details.
> 
> We shouldn't be clearing superblock feature bits while there are
> still those features on disk. Only attr2 does this, and as per
> above, it's broken. I'm going to remove the code that removes the
> attr2 feature bit in the next round of the patch, and then this
> whole problem goes away.
> 
> > 
> > > > 
> > > > >  	xfs_reinit_percpu_counters(mp);
> > > > >  	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
> > > > >  	if (error) {
> > > > ...
> > > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > > > index 7964513c3128..92d947f17c69 100644
> > > > > --- a/fs/xfs/xfs_mount.h
> > > > > +++ b/fs/xfs/xfs_mount.h
> > > > > @@ -127,6 +127,7 @@ typedef struct xfs_mount {
> > > > >  	struct mutex		m_growlock;	/* growfs mutex */
> > > > >  	int			m_fixedfsid[2];	/* unchanged for life of FS */
> > > > >  	uint64_t		m_flags;	/* global mount flags */
> > > > > +	uint64_t		m_features;	/* active filesystem features */
> > > > >  	bool			m_inotbt_nores; /* no per-AG finobt resv. */
> > > > >  	int			m_ialloc_inos;	/* inodes in inode allocation */
> > > > >  	int			m_ialloc_blks;	/* blocks in inode allocation */
> > > > > @@ -195,6 +196,84 @@ typedef struct xfs_mount {
> > > > >  #endif
> > > > >  } xfs_mount_t;
> > > > >  
> > > > > +/*
> > > > > + * Flags for m_features.
> > > > > + *
> > > > > + * These are all the active features in the filesystem, regardless of how
> > > > > + * they are configured.
> > 
> > Given the point above around mount options, I think the comment above
> > would be more helpful if it said something like:
> > 
> > "These are all the active features in the filesystem. Some are
> > configured based on superblock version, others are based on mount
> > options."
> 
> Bigger picture: what about options we may set through, say, /proc,
> /sys, ioctls, online repair, etc? I think that the the places and
> ways we can set feature flags is only going to grow in future, and
> hence I'd prefer to leave this as a generic comment that encompasses
> everything rather than have it grow stale over time.
> 
> > > > I don't want to get into bikeshedding too much but tbh I've always found
> > > > the xfs_sb_has_* thing kind of weird where the "has" text seems
> > > > superfluous. It's just not been worth changing. It would be nice if we
> > > > could stop propagating it here and define a consistently used prefix.
> > > 
> > > Yet we use "is" all over the place when checking if an object is
> > > <something> (e.g. xfs_btree_ptr_is_null(), xfs_iext_rec_is_empty(),
> > > xfs_rmap_is_mergeable(), etc) because it makes the code more
> > > readable. The old sbversion namespace format is the problem, not the
> > > use of "has" to indicate we are checking if the filesystem has a
> > > feature...
> > > 
> > 
> > Each of the examples you cited above have widely used function name
> > prefixes.
> 
> The prefix indicates the subsystem the operation belongs to.
> 
> However, filesystem features are, well, belong in the global
> context.  They span all subsystems, and are a property of the global
> mount structure. IOWs, They don't have a single widely used
> subsystem that can be used as a namespace prefix, and using
> "feature" because "we must have a namespace prefix" turns all the
> function names into tautologies.
> 

Hmm, I've not introduced the requirement for a prefix here. One was
already used and it is renamed to "xfs_feat_" by your patches. I'm just
asking that you use it consistently. ;)

FWIW, I do think it's reasonable to retain the prefix because this set
of functions is clearly related as they're now generated by the same
macro infrastructure. This may not be a subsystem or whatever, but the
functional relation is obvious.

> Consider this: why is xfs_is_read_only(mp) an acceptible function
> name for checking global filesystem state, but xfs_has_attr2(mp) is
> not acceptible for checking global filesystem state?
> 

Because it's a standalone helper and read_only is an obvious state to an
fs developer. Would xfs_is_awesome() be just as clear? Would
xfs_read_only() really be more confusing?

If you defined xfs_is_readonly(), xfs_set_readonly() and
xfs_clear_readonly(), I might have a slightly different opinion. I say
slightly because that's still a fairly narrow usage for the set of
functions (read-only) as opposed to a more broad/generic mechanism for
the set of all features. It's easier to track the former without a
prefix IMO.

Some additional questions might be.. why did we have a common prefix in
the first place (and drop it in one case here)? Or why do the other two
related functions still share a prefix? I almost find the inconsistent
use of the renamed prefix more strange than not having one at all.

> > There's a reason they are named as they are and not as
> > xfs_is_empty_iext_rec(), for example. That's what I'm asking for here.
> > I don't fundamentally object to the fact that we use "has" or "is." I'm
> > merely pointing out that I think "has" is superfluous with respect to a
> > properly used function prefix and therefore we could replace it with the
> > prefix used by the other related helpers, restoring consistency without
> > sacrificing readability.
> > 
> > IOW, if you wanted to rename these to something like
> > xfs_feat_crc_enablement_it_has(mp), that wouldn't exactly be my
> > preference...  but I won't object if we can address the function prefix
> > thing. ;P
> > 
> > > I omitted the "_feat_" part of the name because it's effectively
> > > redundant when you read it. i.e. if (xfs_has_pquotaino(mp)) reads as
> > > a well composed sentence: "if XFS has project quotas inodes on this
> > > mount". Doing s/has/feature/ does not improve the readbility of the
> > > code, just makes it unnecessarily verbose.
> > > 
> > 
> > Eh, I don't think anybody is going to be confused by
> > xfs_has_pquotaino(mp) vs. xfs_feat_pquotaino(mp). Either way is
> 
> I find it confusing. Is it checking if the feature is enabled? Is it
> manipulating the feature in some way? Is it checking if we actually
> have a project quota inode present on the xfs_mount? Or maybe
> something else?
> 

"Is it checking if we actually _have_ a project quota inode present
...?" (emphasis mine)

Hmm, really? So the argument is that xfs_has_pquotino() somehow
clarifies that? :P

I think that's just preferential analysis. I really don't see how you
could confuse something like 'if (xfs_feat_pquotaino(mp))' with checking
for a quota inode or whatever but not acknowledge that 'if
(xfs_has_pquotaino(mp))' might be just as confusing to somebody else.
They really are equally terse (which I think is fine, btw) and just as
vague for somebody who might be looking at XFS for the first time (who
then takes the 5 seconds to check the function and it's moot).

> i.e. There's no action defined or implied by the function name and so
> I can't tell what that function is doing without looking at it.
> That's what "has" or "is" adds to the function name: a definitive
> action. i.e.  it's not the namespace prefix that is meaningful here
> - it's the action in the name ("has") that makes it obvious what the
> function does.
> 

I don't think either name makes it truly obvious what the function does.
It's the context and trivial implementation that makes it obvious what
the function does, to the point where (IMO) any argument around the
clarity of one of xfs_feat_crc() or xfs_has_crc() or XFS_CRC() or
whatever other (shortened and reasonably related) pattern you can think
of over the others is mostly spurious.

Basically, I think this whole line of argument is moot. If we want to
prioritize the self-descriptiveness of the name, then don't shorten it
to the point where we're trying to rely on whether "has" has some
special clarifying value. The current functions are
xfs_sb_version_[has|add|remove]<feature>(). It's clear what they do and
how they're related. If "has" is that important, then
xfs_feat_[has|add|remove]*() should work just as well.

> > self-descriptive and obvious IMO. As noted, I'm not really looking to
> > bikeshed over the most clear way to say "if this feature is enabled" in
> > a function name. I'd just prefer to use the associated function prefix
> > as we do most everywhere else.
> 
> And that's the issue I have here - we must do exactly what we've
> done in the past, even though those rules are falling down around
> us. Look at the mess this blind obedience to namespace rules got the
> scrub code into; it was ending up an unreadable jumble of noise
> because every structure and function had 25+ character namespace
> prefixes you had to read on every line of code before you get to the
> meaningful information.
> 
> Namespacing structures and functions have their place, but there is
> such a thing as taking it too far. Stuff that is easily
> understandable without verbose namespace prefixes should not have
> verbose/redundant namespace prefixes. 
> 

I think you're amplifying my fairly minor request into something more
involved than it really is. For one, the fact that some other subsystem
had too large a namespace or too descriptive a prefix doesn't make it a
problem here. Further and as noted above, a prefix for these functions
existed before these patches and one still exists after.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx