On Wed, Aug 22, 2018 at 08:17:55AM -0400, Brian Foster wrote:
> On Wed, Aug 22, 2018 at 09:31:33AM +1000, Dave Chinner wrote:
> > On Tue, Aug 21, 2018 at 09:21:40AM -0400, Brian Foster wrote:
> > > On Mon, Aug 20, 2018 at 02:48:42PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > >
> > > > Currently on-disk feature checks require decoding the superblock
> > > > fields and so can be non-trivial. We have almost 400
> > > > individual feature checks in the XFS code, so this is a significant
> > > > amount of code. To reduce runtime check overhead, pre-process all
> > > > the version flags into a features field in the xfs_mount at mount
> > > > time so we can convert all the feature checks to a simple flag
> > > > check.
> > > >
> > > > There is also a need to convert the dynamic feature flags to update
> > > > the m_features field. This is required for attr, attr2 and quota
> > > > features. New xfs_mount based wrappers are added for this.
> > > >
> > > > Before:
> > > >
> > > > $ size -t fs/xfs/built-in.a
> > > >    text    data     bss     dec     hex filename
> > > > ....
> > > > 1294873  182766    1036 1478675  169013 (TOTALS
> > > >
> > >
> > > Was some text truncated from the commit log description here? Did you
> > > mean to include the after size as well?
> >
> > Yeah, I thought I did update that. Maybe forgot to refresh the
> > header once I did. The before and after are:
> >
> >           text    data     bss     dec     hex filename
> > before 1326155  189006    1036 1516197  1722a5 (TOTALS)
> > after  1322929  189006    1036 1512971  17160b (TOTALS)
> >
>
> That's a much larger delta than what I saw when I checked out of
> curiosity, but I just ran against this first patch. It looks like this
> delta is before/after the whole series.

Yeah, it is - I couldn't find the original numbers in the scrollback
history of the build machine terminal.
However, ~90% of the reduction comes from the first patch (3kB vs
3.2kB for the entire series).

> It might be good to qualify that
> in the commit log (i.e., "after the old xfs_sb_version_* wrappers are
> removed") just because the series context isn't always clear in the
> broader git log history.

Yeah, I'll fix it up, put the right number in it.

> > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_format.h |  2 +-
> > > >  fs/xfs/libxfs/xfs_sb.c     | 61 +++++++++++++++++++++++++++++
> > > >  fs/xfs/libxfs/xfs_sb.h     |  1 +
> > > >  fs/xfs/xfs_log_recover.c   |  1 +
> > > >  fs/xfs/xfs_mount.c         |  1 +
> > > >  fs/xfs/xfs_mount.h         | 79 ++++++++++++++++++++++++++++++++++++++
> > > >  6 files changed, 144 insertions(+), 1 deletion(-)
> > > >
> > > ...
> > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > > index a21dc61ec09e..5d0438ec07dd 100644
> > > > --- a/fs/xfs/xfs_log_recover.c
> > > > +++ b/fs/xfs/xfs_log_recover.c
> > > > @@ -5723,6 +5723,7 @@ xlog_do_recover(
> > > >  	xfs_buf_relse(bp);
> > > >
> > > >  	/* re-initialise in-core superblock and geometry structures */
> > > > +	mp->m_features |= xfs_sb_version_to_features(sbp);
> > > >
> > > How is this a reinit if it ORs in fields?
> >
> > The only feature bit that can be removed by log recovery is the
> > attr2 feature bit which means the last mount was mounted with
> > "noattr2". So apart from that feature bit, ORing in the new
> > superblock feature mask works just fine.
> >
>
> To be clear.. that's because attr2 is the only feature that can be
> currently unset at runtime, right?

Well, it's not really at runtime - it can only be cleared at mount
time before log recovery is run. IMO, the attr2/noattr2 mount option
stuff really just needs to go away.

> > As it is, for v5 attr2 can't be turned off, so it's just fine for
> > v5 filesystems. For v4 the old code would set XFS_MOUNT_ATTR2 if the
> > sb feature bit was set /prior/ to log recovery being run.
> > Hence if log recovery removed the feature bit, the very next attr
> > creation will turn it straight back on.
> >
> > The only way to not have attr2 enabled in this case is to use the
> > noattr2 mount option, and the new code has exactly the same
> > behaviour - the attr2 superblock bit and the feature flag will get
> > cleared after log recovery if the noattr2 mount option is set.
> >
> > Also worth noting is that if there's a sb_bad_features2
> > interaction, it will re-add the feature bit because it just ORs the
> > two fields together.
> >
> > IOWs, the behaviour of this patch is roughly the same as the
> > existing code when it comes to the attr2 flag being removed when the
> > superblock is recovered as well as when there is a features2
> > mismatch.
> >
>
> Ok, but this all sounds like a happy coincidence that depends on the
> semantics of attr2. IOW, the behavior of attr2 may ultimately be the
> same (which I haven't fully grokked), but unless I'm missing something
> more fundamental the logic in this patch still seems slightly dynamic
> feature challenged in this regard.
>
> Mount a filesystem with some feature enabled, disable it and crash with
> the superblock change in the dirty log. Mount again, enable said feature
> based on sb, log recovery updates sb, feature remains inconsistently
> enabled due to not being cleared by the post-recovery feature init.
> Unless I'm missing something, this is handled correctly by the existing
> mechanism simply because each feature check refers to the superblock.

The only way to disable it is via the noattr2 mount option. So to
have the above occur, you'd have to first mount with noattr2, crash
after mount has logged the superblock (which happens after
recovery), then mount again *without* the noattr2 mount option set.

IOWs, the XFS_MOUNT_ATTR2 flag in this case is set during
xfs_finish_flags() (i.e.
mount option parsing) because the unrecovered superblock has the
feature flag set and XFS_MOUNT_NOATTR2 is not set. All future
decisions about whether to create attr2 format forks are based on
XFS_MOUNT_ATTR2, not the superblock feature bit. If log recovery
clears the sb feature bit, then the next attribute fork creation
will see XFS_MOUNT_ATTR2 set and re-add the superblock bit.

IOWs, in the crash+recover case of attr2 sb feature bit removal, the
code I wrote does the same thing as the existing code - you need to
use the noattr2 mount option to guarantee that attr2 is not used.

IMO, this is legacy code (attr2 is permanently enabled for v5) that
has nasty warts. While the noattr2 mount option was introduced right
at the start in 2005, it didn't remove the superblock feature bit
until more than 3 years later because the sb feature bit processing
got moved by the sb_bad_features2 debacle and that broke noattr2.
So instead of just looking at the noattr2 mount option again, it was
decided to clear the feature bit from the superblock.

This is despite the fact there are still attr2 format inode forks on
disk. i.e. the removal of the attr2 sb feature bit breaks all the
conventions we have for "sb feature bit indicates a specific on disk
format feature is present in the fs".

The original noattr2 code did not have this problem, and as such I
think we should be reverting to the original behaviour of the
noattr2 mount option and stop trying to screw with the on-disk sb
flag once it is set because of the above can of worms it opened.

I think I'll rework the attr2 code as the initial patch in this
series now, and get rid of the sb feature bit removal code
altogether so this whole problem goes away.

> > I think there's more work needed on the attr2 feature side of
> > things, as noted in the cover description. In addition to the
> > unpredictable behaviour via log recovery, I don't see a reason for
> > noattr2 existing these days.
> > It doesn't get rid of existing attr2
> > format inodes on disk - it just stops new ones from being created.
> > We've used attr2 for more than 10 years now without it being an
> > issue, so there's no reason for needing to turn it off anymore. I
> > think we should deprecate the option and remove it.
> >
> > > I guess I'm curious why we OR
> > > in fields in either case as opposed to using an assignment.
> >
> > Because mount option features are already set in m_features, so we
> > can't just overwrite it with just the superblock features.
>
> That doesn't appear to be the case as of this patch. If it's a factor
> later in the series, we should tweak it then where the intent is clear.

Patches don't exist in isolation. Please look at the work as a
whole, not just patches in isolation.

> This also doesn't strike me as a technically difficult problem to
> address. We just need to filter out the set of superblock based features
> one way or another. If you wanted to simplify it further, we could just
> have the sb -> features function update ->m_features itself in a safe
> manner (i.e., the subset of features it is responsible for) and then the
> caller context doesn't really have to be concerned with such details.

We shouldn't be clearing superblock feature bits while there are
still those features on disk. Only attr2 does this, and as per
above, it's broken. I'm going to remove the code that removes the
attr2 feature bit in the next round of the patch, and then this
whole problem goes away.

> > > >
> > > >  	xfs_reinit_percpu_counters(mp);
> > > >  	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
> > > >  	if (error) {
> > > ...
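
For anyone following the OR-versus-assignment question on the hunk
above, here is a minimal standalone sketch of the two behaviours being
compared. It is not code from the patch: every sketch_/SKETCH_ name is
hypothetical and the bit values are made up. It only assumes what the
thread states, namely that m_features carries both superblock-derived
bits and mount-option bits. The first helper ORs in the decoded
superblock features, as the hunk does; the second replaces only the
superblock-owned subset, which is the alternative suggested in review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; names and values are illustrative only. */
#define SKETCH_FEAT_ATTR2	(1ULL << 0)	/* decoded from the superblock */
#define SKETCH_FEAT_CRC		(1ULL << 1)	/* decoded from the superblock */
#define SKETCH_FEAT_NOATTR2	(1ULL << 32)	/* set from a mount option */

/* Every bit the superblock decode is responsible for (assumption). */
#define SKETCH_FEAT_SB_MASK	(SKETCH_FEAT_ATTR2 | SKETCH_FEAT_CRC)

/* Stand-ins for the on-disk superblock and the mount structure. */
struct sketch_sb {
	bool		attr2;
	bool		crc;
};

struct sketch_mount {
	uint64_t	m_features;
};

/* Decode the superblock state into a feature mask. */
static uint64_t
sketch_sb_to_features(const struct sketch_sb *sbp)
{
	uint64_t	features = 0;

	if (sbp->attr2)
		features |= SKETCH_FEAT_ATTR2;
	if (sbp->crc)
		features |= SKETCH_FEAT_CRC;
	return features;
}

/* OR-only re-init: bits cleared in the recovered sb stay set. */
static void
sketch_reinit_or(struct sketch_mount *mp, const struct sketch_sb *sbp)
{
	mp->m_features |= sketch_sb_to_features(sbp);
}

/* Masked re-init: replace the sb-owned subset, keep mount-option bits. */
static void
sketch_reinit_masked(struct sketch_mount *mp, const struct sketch_sb *sbp)
{
	mp->m_features &= ~SKETCH_FEAT_SB_MASK;
	mp->m_features |= sketch_sb_to_features(sbp);
}

/* A feature check is then a single flag test on the mount. */
static inline bool
sketch_has_attr2(const struct sketch_mount *mp)
{
	return (mp->m_features & SKETCH_FEAT_ATTR2) != 0;
}
```

With a pre-recovery superblock that has attr2 set and a recovered one
that has it clear, the OR-only helper leaves the attr2 bit set while
the masked helper drops it; both leave the mount-option bit alone.
Whether dropping the bit is desirable is exactly what the rest of this
thread argues about for attr2.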
> > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > > index 7964513c3128..92d947f17c69 100644
> > > > --- a/fs/xfs/xfs_mount.h
> > > > +++ b/fs/xfs/xfs_mount.h
> > > > @@ -127,6 +127,7 @@ typedef struct xfs_mount {
> > > >  	struct mutex		m_growlock;	/* growfs mutex */
> > > >  	int			m_fixedfsid[2];	/* unchanged for life of FS */
> > > >  	uint64_t		m_flags;	/* global mount flags */
> > > > +	uint64_t		m_features;	/* active filesystem features */
> > > >  	bool			m_inotbt_nores; /* no per-AG finobt resv. */
> > > >  	int			m_ialloc_inos;	/* inodes in inode allocation */
> > > >  	int			m_ialloc_blks;	/* blocks in inode allocation */
> > > > @@ -195,6 +196,84 @@ typedef struct xfs_mount {
> > > >  #endif
> > > >  } xfs_mount_t;
> > > >
> > > > +/*
> > > > + * Flags for m_features.
> > > > + *
> > > > + * These are all the active features in the filesystem, regardless of how
> > > > + * they are configured.
>
> Given the point above around mount options, I think the comment above
> would be more helpful if it said something like:
>
> "These are all the active features in the filesystem. Some are
> configured based on superblock version, others are based on mount
> options."

Bigger picture: what about options we may set through, say, /proc,
/sys, ioctls, online repair, etc? I think that the places and ways
we can set feature flags are only going to grow in future, and hence
I'd prefer to leave this as a generic comment that encompasses
everything rather than have it grow stale over time.

> > > I don't want to get into bikeshedding too much but tbh I've always found
> > > the xfs_sb_has_* thing kind of weird where the "has" text seems
> > > superfluous. It's just not been worth changing. It would be nice if we
> > > could stop propagating it here and define a consistently used prefix.
> >
> > Yet we use "is" all over the place when checking if an object is
> > <something> (e.g.
> > xfs_btree_ptr_is_null(), xfs_iext_rec_is_empty(),
> > xfs_rmap_is_mergeable(), etc) because it makes the code more
> > readable. The old sbversion namespace format is the problem, not the
> > use of "has" to indicate we are checking if the filesystem has a
> > feature...
> >
>
> Each of the examples you cited above have widely used function name
> prefixes.

The prefix indicates the subsystem the operation belongs to.
However, filesystem features, well, belong in the global context.
They span all subsystems, and are a property of the global mount
structure. IOWs, they don't have a single widely used subsystem that
can be used as a namespace prefix, and using "feature" because "we
must have a namespace prefix" turns all the function names into
tautologies.

Consider this: why is xfs_is_read_only(mp) an acceptable function
name for checking global filesystem state, but xfs_has_attr2(mp) is
not acceptable for checking global filesystem state?

> There's a reason they are named as they are and not as
> xfs_is_empty_iext_rec(), for example. That's what I'm asking for here.
> I don't fundamentally object to the fact that we use "has" or "is." I'm
> merely pointing out that I think "has" is superfluous with respect to a
> properly used function prefix and therefore we could replace it with the
> prefix used by the other related helpers, restoring consistency without
> sacrificing readability.
>
> IOW, if you wanted to rename these to something like
> xfs_feat_crc_enablement_it_has(mp), that wouldn't exactly be my
> preference... but I won't object if we can address the function prefix
> thing. ;P
>
> > I omitted the "_feat_" part of the name because it's effectively
> > redundant when you read it. i.e. if (xfs_has_pquotaino(mp)) reads as
> > a well composed sentence: "if XFS has project quota inodes on this
> > mount". Doing s/has/feature/ does not improve the readability of the
> > code, just makes it unnecessarily verbose.
> >
>
> Eh, I don't think anybody is going to be confused by
> xfs_has_pquotaino(mp) vs. xfs_feat_pquotaino(mp). Either way is

I find it confusing. Is it checking if the feature is enabled? Is it
manipulating the feature in some way? Is it checking if we actually
have a project quota inode present on the xfs_mount? Or maybe
something else?

i.e. There's no action defined or implied by the function name and
so I can't tell what that function is doing without looking at it.
That's what "has" or "is" adds to the function name: a definitive
action.

i.e. it's not the namespace prefix that is meaningful here - it's
the action in the name ("has") that makes it obvious what the
function does.

> self-descriptive and obvious IMO. As noted, I'm not really looking to
> bikeshed over the most clear way to say "if this feature is enabled" in
> a function name. I'd just prefer to use the associated function prefix
> as we do most everywhere else.

And that's the issue I have here - we must do exactly what we've
done in the past, even though those rules are falling down around
us. Look at the mess this blind obedience to namespace rules got the
scrub code into; it was ending up an unreadable jumble of noise
because every structure and function had 25+ character namespace
prefixes you had to read on every line of code before you get to the
meaningful information.

Namespacing structures and functions has its place, but there is
such a thing as taking it too far. Stuff that is easily
understandable without verbose namespace prefixes should not have
verbose/redundant namespace prefixes.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx