Re: [PATCH v2 0/4] xfs: fix v5 AGFL wrapping

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 22 Feb 2018 10:06:28 -0800

On Thu, Feb 22, 2018 at 06:55:03AM -0500, Brian Foster wrote:
> On Wed, Feb 21, 2018 at 09:35:46AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 21, 2018 at 08:39:16AM -0500, Brian Foster wrote:
> > > On Tue, Feb 20, 2018 at 09:08:30AM -0800, Darrick J. Wong wrote:
> > > > On Mon, Feb 19, 2018 at 08:54:57AM -0500, Brian Foster wrote:
> > > > > On Fri, Feb 16, 2018 at 09:57:01AM -0800, Darrick J. Wong wrote:
> > > > > > On Thu, Feb 15, 2018 at 08:16:30AM -0500, Brian Foster wrote:
> > > > > > > On Wed, Feb 14, 2018 at 06:05:48PM -0800, Darrick J. Wong wrote:
> > > > > > > > On Wed, Feb 14, 2018 at 02:56:51PM -0500, Brian Foster wrote:
> > > ...
> > > > > > > 
> > > > > > > Ok, so the intent is to "protect" those old kernels from a filesystem
> > > > > > > that has seen a "fixed" kernel so the _old_ kernel doesn't explode on a
> > > > > > > filesystem that has a wrapped agfl from a fixed kernel. That sounds
> > > > > > > reasonable.. I guess what concerns me is that the bit doesn't seem to be
> > > > > > > used in that manner. A user can always run a fixed kernel, wrap the
> > > > > > > agfl, then go back to the bad one and cry when it explodes.
> > > > > > > 
> > > > > > > IOW, if the goal is this kind of retroactive protection, I'd expect the
> > > > > > > bit to be enabled any time the filesystem is in a state that would cause
> > > > > > > problems on the old kernel. Technically, that could mean doing something
> > > > > > > like setting the bit any time an AGFL is in a wrapped state and clearing
> > > > > > > it when that state clears. Or perhaps setting it unconditionally on
> > > > > > > mount and leaving it permanently would be another way to satisfy the
> > > > > > > requirement, though certainly more heavy handed.
> > > > > > 
> > > > > > The previous version of this series did that, though Dave suggested I
> > > > > > change it to set the bit only if we actually touched an agfl.  I
> > > > > > probably should have pushed back harder, but it was only rfc at the
> > > > > > time.
> > > > > > 
> > > > > 
> > > > > It's not clear which of the above options you're referring to. FWIW, the
> > > > > former seems notably more usable to me than the latter.
> > > > 
> > > > Sorry about that.  The previous iteration would do:
> > > > 
> > > > if (!has_fixedagfl) {
> > > > 	fix_agfls();
> > > > 	set_fixedagfl();
> > > > }
> > > > 
> > > > Which probably doesn't satisfy your usability test. :)
> > > > 
> > > > Hmm... another possible strategy would be to fix the agfls and set the
> > > > fixedagfl bit at mount time, then at unmount/freeze/remount-ro we fix
> > > > the agfls again and clear the fixedagfl bit.  Then the only way an
> > > > unpatched kernel hits this is if we crash with a patched kernel and the
> > > > recovery mount is an unpatched kernel.
> > > > 
> > > 
> > > What's the purpose of "fixing the agfls again" at unmount time? Hasn't
> > > the patched kernel already addressed the size mismatch at mount time?
> > 
> > Sorry, my wording there didn't match what was in my head.  I'll try
> > again:
> > 
> > At mount/remount-rw time:
> > if agfl list is wrapped assuming the shorter agfl length:
> > 	fix wrapping to fit the longer 4.5+ agfl length
> > set fixedagfl bit
> > 
> > At unmount/freeze/remount-ro time:
> > if agfl list touches the last agfl item:
> > 	reset agfl list to the start of the agfl
> > clear fixedagfl bit
> > 
> 
> Hmm, that doesn't seem so bad. I was hesitant to change agfl/allocator
> behavior going forward for such a rare case, but doing one-off fixups at
> mount/unmount time is much less intrusive. The reduced scope of rocompat
> enforcement also eliminates permanent breakage for unsuspecting users.
> 
> Of course it still suffers from the general drawbacks associated with
> using a feature bit. IMO, it's not worth creating a new set of backwards
> incompat kernels (i.e., those with the feature bit vs. those without,
> where both kernels are essentially "fixed" already with regard to the
> agfl) just to provide nicer behavior for users still on the broken
> kernel. E.g., if they're still on the broken kernel, I don't see them
> upgrading to the feature enabled non-broken kernel designed to fix the
> broken kernel. Further, if the rhel case ends up using a similar
> "backwards detection" (if that is even possible, I have no idea)
> approach to address functionality, I suspect we'd have to support the
> feature bit on a kernel that doesn't technically have the fixed agfl.
> 
> The detection patch fixes breakage going forward. AFAICT all the feature
> bit does is create a softer landing for users going back to a broken
> kernel. In the end, this doesn't actually fix the broken environment, so
> the next steps are the same either way: the user needs to upgrade the
> broken kernel. This all sums up to what seems like a superfluous change
> to me. All that said, if others feel like we really need this for some
> reason then I'd prefer this policy over anything we've discussed so far
> wrt the feature bit.

It occurred to me over breakfast that we also have rocompat features
that were introduced after 4.5, which narrows further the number of
filesystems requiring fixups.

rmapbt and reflink were introduced in 4.8 and 4.9, which means that any
fs with either of those features enabled has always the longer agfl size
and cannot be write-mounted on an unpatched kernel.  As time goes by,
the percentage of v5 filesystems without either feature will shrink.
Therefore, we can skip both agfl fixups if either feature is enabled:

At mount/remount-rw time:
if !reflink && !rmap && agfl list is wrapped assuming the shorter agfl length:
	fix wrapping to fit the longer 4.5+ agfl length

At unmount/freeze/remount-ro time:
if !reflink && !rmap && agfl list touches the last agfl item:
	reset agfl list to the start of the agfl

The only way we're vulnerable to agfl wrapping problems is if someone
(a) uses an unpatched kernel to (b) recover a dirty filesystem that (c)
has a wrapped agfl and (d) doesn't have rmap or reflink enabled.

I think that's narrow enough that we can skip the rocompat bit.

--D

> Brian
> 
> > This way, we're only using the rocompat bit to prevent unpatched kernels
> > from *recovering* filesystems that might have an agfl wrap that it
> > doesn't understand.
> > 
> > > > What does everyone think of this strategy?  I think I like it the most
> > > > of all the things we've put forth because the only time anyone will trip
> > > > over this rocompat bit is this one specific scenario.
> > > > 
> > > > (Apologies if we've already talked about this, I've gotten lost in this
> > > > thread.)
> > > > 
> > > ...
> > > > > > > 
> > > > > > > Hmm, I still think my preferred approach is to mount time detect,
> > > > > > > enforce read-only and tell the user to xfs_repair[1]. It's the most
> > > > > > 
> > > > > > I think this is difficult to do for the root fs -- in general I don't
> > > > > > think initrds ship with xfs_repair and the logic to detect the "refuses
> > > > > > to mount rw" condition and react to that with xfs_repair and a second
> > > > > > mount attempt.
> > > > > > 
> > > > > 
> > > > > Good point. I guess what concerns me is leaving around such highly
> > > > > specialized code. I figured an error check is less risk than a function
> > > > > that modifies the fs. Do we have a test case that would exercise this
> > > > > codepath going forward, at least?
> > > > 
> > > > I did write xfstests for the general case, though apparently I've been
> > > > lagging xfstests and haven't sent it.  :/
> > > > 
> > > > Oh, right, we're still discussing semantics & existence of an rocompat
> > > > bit, which affects the golden output.
> > > > 
> > > 
> > > Ok..
> > > 
> > > > > > Also I think this doesn't work if we do an initial ro mount and then try
> > > > > > to remount-rw...
> > > > > > 
> > > > > 
> > > > > That one seems like it would just require a bit more code. E.g., we'd
> > > > > have to cache or re-check state on ro -> rw transitions. That's
> > > > > secondary to the rootfs use case justification, though..
> > > > 
> > > > <nod>
> > > > 
> > > > > > > simple implementation as we don't have to carry fixup code in the kernel
> > > > > > > for such an uncommon situation (which facilitates a broad backport
> > > > > > > strategy). It protects the users on stable kernels who pull in the
> > > > > > > latest bug fixes, including the guy who goes from a bad kernel to a good
> > > > > > > one, and all kernels going forward without having to use a feature bit.
> > > > > > > We don't do anything for the guy who wraps the agfl on newer kernels and
> > > > > > > goes back to the bad one, but at some point he just needs to update the
> > > > > > > broken kernel.
> > > > > > > 
> > > > > > > [1] I guess doing a mount time fix up vs. a read-only mount + warning is
> > > > > > > a separate question from the feature bit. I'd prefer the latter for
> > > > > > > simplicity, but perhaps there are use cases where the benefit outweighs
> > > > > > > the risk.
> > > > > > > 
> > > > > > > Anyways, that's just my .02. There is no real great option here, I
> > > > > > > guess. I just think that at some point maybe it's best to fix all the
> > > > > > > kernels we can to handle the size mismatch, live with the fact that some
> > > > > > > users might have those old, unfixed kernels and bonk them on the head to
> > > > > > > upgrade when they (hopefully decreasingly) pop up. Perhaps others can
> > > > > > > chime in with more thoughts..
> > > > > > 
> > > > > > Yeah, we could do that too (make all the next releases of
> > > > > > rhel/ol/debian/ubuntu/sles/whatever quietly fix the bad-wrap agfls and
> > > > > > skip the feature bit) and just let the unpatched kernels rot.  It keeps
> > > > > > coming up, though...
> > > > > > 
> > > > > 
> > > > > I guess I haven't really noticed it enough to consider it an ongoing
> > > > > issue (which doesn't mean it isn't :P). Any pattern to the use cache
> > > > > behind these continued reports..?
> > > > 
> > > > No particular pattern, other than using/freeing agfl blocks.
> > > > 
> > > 
> > > I'm more curious about the use case than the workload. E.g., what's the
> > > purpose for using two kernels? Was it bad luck on an upgrade or
> > > something that implies the problematic state could reoccur (i.e., active
> > > switching between good/bad kernels)?
> > > 
> > > IOW, if the "it keeps coming up" situation is simply because the
> > > existing userbase hasn't fully upgraded yet, then a simple detect/repair
> > > forward-fix patch is going to be sufficient (perhaps with a warning not
> > > to revert to $previous kernel) to resolve the problem for the remaining
> > > set of users who are susceptible to the problem.
> > > 
> > > If the problem is instead a set of users that are switching back and
> > > forth for whatever reason, then perhaps a feature bit policy is more
> > > justified.
> > 
> > $employer has customers that want or have to run mixed environments.
> > 
> > Not many, though.
> > 
> > > > > From an upstream perspective, RHEL continuing to operate in this mode
> > > > > certainly does create an ongoing situation where a "current" distro
> > > > > has/causes compatibility issues, particularly since some other
> > > > > downstream distros may have kernels that use the upstream/fixed format.
> > > > > 
> > > > > I'll reiterate that in principle I don't think downstream decisions
> > > > > should prohibit doing the right thing upstream, but for somebody on a
> > > > > downstream distro who happens to test an upstream kernel (say as part of
> > > > > a regression test/investigation), having the upstream kernel decide the
> > > > > downstream kernel is no longer allowed to mount the fs seems kind of
> > > > > mean. :P
> > > > 
> > > > Agreed, though agfl-related crashes are also mean. :/
> > > > 
> > > 
> > > Yeah, but you can recover from that. In the use case above, the feature
> > > bit has permanently prevented the user from going back to the distro
> > > kernel. So this is one semi-realistic "switching between kernels" use
> > > case where I think this feature bit actually hurts more than it helps.
> > > If it were a big enough problem, I'd much rather have the downstream
> > > kernel implement the feature bit to indicate that the upstream kernel
> > > shouldn't/can't mount this filesystem than retroactively doing the
> > > opposite. Even then, I still think I'd prefer the downstream kernel just
> > > detect, fail to mount and encourage repair so we don't have to support
> > > the strange transition from a clearly unsupported sequence (but this is
> > > all downstream/distro discussion).
> > > 
> > > FWIW, if you actually want to make progress on this in absense of any
> > > other feedback, I think the best approach is to refactor this series to
> > > separate the detect/repair mechanism from all of this feature bit policy
> > > stuff. The former all seems reasonable/justified to me based on your
> > > explanations, so I'd be happy to review those portions of it without the
> > > feature bit (which could still be tacked on for further review,
> > > reordered appropriately on merge if ultimately necessary, etc.) and
> > > consider the feature bit separately based on justification for added
> > > benefit over the former.
> > 
> > Ok, I'll split out the part that fixes the AGFL and we can review those
> > parts separately from the feature bit stuff.
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > > > --D
> > > > > > 
> > > > > > > 
> > > > > > > Brian
> > > > > > > 
> > > > > > > > > caused this in the first place ..." Then we write a FAQ entry that
> > > > > > > > > elaborates on how/why a user hit that problem and backport to stable
> > > > > > > > > kernels as far and wide as possible. Hm?
> > > > > > > > 
> > > > > > > > I /do/ like the part about writing FAQ to update the kernel.
> > > > > > > > 
> > > > > > > > --D
> > > > > > > > 
> > > > > > > > > Brian
> > > > > > > > > 
> > > > > > > > > > --D
> > > > > > > > > > --
> > > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > > --
> > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > --
> > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > --
> > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html