Re: [PATCH v2 0/4] xfs: fix v5 AGFL wrapping

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 22 Feb 2018 13:28:25 -0500

On Thu, Feb 22, 2018 at 10:11:30AM -0800, Darrick J. Wong wrote:
> On Thu, Feb 22, 2018 at 10:06:28AM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 22, 2018 at 06:55:03AM -0500, Brian Foster wrote:
> > > On Wed, Feb 21, 2018 at 09:35:46AM -0800, Darrick J. Wong wrote:
> > > > On Wed, Feb 21, 2018 at 08:39:16AM -0500, Brian Foster wrote:
> > > > > On Tue, Feb 20, 2018 at 09:08:30AM -0800, Darrick J. Wong wrote:
> > > > > > On Mon, Feb 19, 2018 at 08:54:57AM -0500, Brian Foster wrote:
> > > > > > > On Fri, Feb 16, 2018 at 09:57:01AM -0800, Darrick J. Wong wrote:
> > > > > > > > On Thu, Feb 15, 2018 at 08:16:30AM -0500, Brian Foster wrote:
> > > > > > > > > On Wed, Feb 14, 2018 at 06:05:48PM -0800, Darrick J. Wong wrote:
> > > > > > > > > > On Wed, Feb 14, 2018 at 02:56:51PM -0500, Brian Foster wrote:
> > > > > ...
> > > > > > > > > 
> > > > > > > > > Ok, so the intent is to "protect" those old kernels from a filesystem
> > > > > > > > > that has seen a "fixed" kernel so the _old_ kernel doesn't explode on a
> > > > > > > > > filesystem that has a wrapped agfl from a fixed kernel. That sounds
> > > > > > > > > reasonable.. I guess what concerns me is that the bit doesn't seem to be
> > > > > > > > > used in that manner. A user can always run a fixed kernel, wrap the
> > > > > > > > > agfl, then go back to the bad one and cry when it explodes.
> > > > > > > > > 
> > > > > > > > > IOW, if the goal is this kind of retroactive protection, I'd expect the
> > > > > > > > > bit to be enabled any time the filesystem is in a state that would cause
> > > > > > > > > problems on the old kernel. Technically, that could mean doing something
> > > > > > > > > like setting the bit any time an AGFL is in a wrapped state and clearing
> > > > > > > > > it when that state clears. Or perhaps setting it unconditionally on
> > > > > > > > > mount and leaving it permanently would be another way to satisfy the
> > > > > > > > > requirement, though certainly more heavy handed.
> > > > > > > > 
> > > > > > > > The previous version of this series did that, though Dave suggested I
> > > > > > > > change it to set the bit only if we actually touched an agfl.  I
> > > > > > > > probably should have pushed back harder, but it was only rfc at the
> > > > > > > > time.
> > > > > > > > 
> > > > > > > 
> > > > > > > It's not clear which of the above options you're referring to. FWIW, the
> > > > > > > former seems notably more usable to me than the latter.
> > > > > > 
> > > > > > Sorry about that.  The previous iteration would do:
> > > > > > 
> > > > > > if (!has_fixedagfl) {
> > > > > > 	fix_agfls();
> > > > > > 	set_fixedagfl();
> > > > > > }
> > > > > > 
> > > > > > Which probably doesn't satisfy your usability test. :)
> > > > > > 
> > > > > > Hmm... another possible strategy would be to fix the agfls and set the
> > > > > > fixedagfl bit at mount time, then at unmount/freeze/remount-ro we fix
> > > > > > the agfls again and clear the fixedagfl bit.  Then the only way an
> > > > > > unpatched kernel hits this is if we crash with a patched kernel and the
> > > > > > recovery mount is an unpatched kernel.
> > > > > > 
> > > > > 
> > > > > What's the purpose of "fixing the agfls again" at unmount time? Hasn't
> > > > > the patched kernel already addressed the size mismatch at mount time?
> > > > 
> > > > Sorry, my wording there didn't match what was in my head.  I'll try
> > > > again:
> > > > 
> > > > At mount/remount-rw time:
> > > > if agfl list is wrapped assuming the shorter agfl length:
> > > > 	fix wrapping to fit the longer 4.5+ agfl length
> > > > set fixedagfl bit
> > > > 
> > > > At unmount/freeze/remount-ro time:
> > > > if agfl list touches the last agfl item:
> > > > 	reset agfl list to the start of the agfl
> > > > clear fixedagfl bit
> > > > 
> > > 
> > > Hmm, that doesn't seem so bad. I was hesitant to change agfl/allocator
> > > behavior going forward for such a rare case, but doing one-off fixups at
> > > mount/unmount time is much less intrusive. The reduced scope of rocompat
> > > enforcement also eliminates permanent breakage for unsuspecting users.
> > > 
> > > Of course it still suffers from the general drawbacks associated with
> > > using a feature bit. IMO, it's not worth creating a new set of backwards
> > > incompat kernels (i.e., those with the feature bit vs. those without,
> > > where both kernels are essentially "fixed" already with regard to the
> > > agfl) just to provide nicer behavior for users still on the broken
> > > kernel. E.g., if they're still on the broken kernel, I don't see them
> > > upgrading to the feature enabled non-broken kernel designed to fix the
> > > broken kernel. Further, if the rhel case ends up using a similar
> > > "backwards detection" (if that is even possible, I have no idea)
> > > approach to address functionality, I suspect we'd have to support the
> > > feature bit on a kernel that doesn't technically have the fixed agfl.
> > > 
> > > The detection patch fixes breakage going forward. AFAICT all the feature
> > > bit does is create a softer landing for users going back to a broken
> > > kernel. In the end, this doesn't actually fix the broken environment, so
> > > the next steps are the same either way: the user needs to upgrade the
> > > broken kernel. This all sums up to what seems like a superfluous change
> > > to me. All that said, if others feel like we really need this for some
> > > reason then I'd prefer this policy over anything we've discussed so far
> > > wrt the feature bit.
> > 
> > It occurred to me over breakfast that we also have rocompat features
> > that were introduced after 4.5, which narrows further the number of
> > filesystems requiring fixups.
> > 
> > rmapbt and reflink were introduced in 4.8 and 4.9, which means that any
> > fs with either of those features enabled has always the longer agfl size
> > and cannot be write-mounted on an unpatched kernel.  As time goes by,
> > the percentage of v5 filesystems without either feature will shrink.
> > Therefore, we can skip both agfl fixups if either feature is enabled:
> > 
> > At mount/remount-rw time:
> > if !reflink && !rmap && agfl list is wrapped assuming the shorter agfl length:
> > 	fix wrapping to fit the longer 4.5+ agfl length
> > 
> > At unmount/freeze/remount-ro time:
> > if !reflink && !rmap && agfl list touches the last agfl item:
> > 	reset agfl list to the start of the agfl
> 
> <sigh> pseudocode bug; try again:
> 
> At mount/remount-rw time:
> if v5fs && !reflink && !rmap && agfl list is wrapped assuming the shorter agfl length:
> 	fix wrapping to fit the longer 4.5+ agfl length
> 
> At unmount/freeze/remount-ro time:
> if v5fs && !reflink && !rmap && agfl list touches the last agfl item:
> 	reset agfl list to the start of the agfl
> 
> Those first three predicates should be a helper function that does:
> 
> if v5 && (no v5 feature bits are set that weren't defined in 4.5)
> 

That sounds reasonable enough to me so long as we have a nice comment
that explains how we're using this combination of feature bits. :) I
also like how it at least puts some kind of life expectancy on the
bandaid itself.

Brian

> --D
> 
> > 
> > The only way we're vulnerable to agfl wrapping problems is if someone
> > (a) uses an unpatched kernel to (b) recover a dirty filesystem that (c)
> > has a wrapped agfl and (d) doesn't have rmap or reflink enabled.
> > 
> > I think that's narrow enough that we can skip the rocompat bit.
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > This way, we're only using the rocompat bit to prevent unpatched kernels
> > > > from *recovering* filesystems that might have an agfl wrap that it
> > > > doesn't understand.
> > > > 
> > > > > > What does everyone think of this strategy?  I think I like it the most
> > > > > > of all the things we've put forth because the only time anyone will trip
> > > > > > over this rocompat bit is this one specific scenario.
> > > > > > 
> > > > > > (Apologies if we've already talked about this, I've gotten lost in this
> > > > > > thread.)
> > > > > > 
> > > > > ...
> > > > > > > > > 
> > > > > > > > > Hmm, I still think my preferred approach is to mount time detect,
> > > > > > > > > enforce read-only and tell the user to xfs_repair[1]. It's the most
> > > > > > > > 
> > > > > > > > I think this is difficult to do for the root fs -- in general I don't
> > > > > > > > think initrds ship with xfs_repair and the logic to detect the "refuses
> > > > > > > > to mount rw" condition and react to that with xfs_repair and a second
> > > > > > > > mount attempt.
> > > > > > > > 
> > > > > > > 
> > > > > > > Good point. I guess what concerns me is leaving around such highly
> > > > > > > specialized code. I figured an error check is less risk than a function
> > > > > > > that modifies the fs. Do we have a test case that would exercise this
> > > > > > > codepath going forward, at least?
> > > > > > 
> > > > > > I did write xfstests for the general case, though apparently I've been
> > > > > > lagging xfstests and haven't sent it.  :/
> > > > > > 
> > > > > > Oh, right, we're still discussing semantics & existence of an rocompat
> > > > > > bit, which affects the golden output.
> > > > > > 
> > > > > 
> > > > > Ok..
> > > > > 
> > > > > > > > Also I think this doesn't work if we do an initial ro mount and then try
> > > > > > > > to remount-rw...
> > > > > > > > 
> > > > > > > 
> > > > > > > That one seems like it would just require a bit more code. E.g., we'd
> > > > > > > have to cache or re-check state on ro -> rw transitions. That's
> > > > > > > secondary to the rootfs use case justification, though..
> > > > > > 
> > > > > > <nod>
> > > > > > 
> > > > > > > > > simple implementation as we don't have to carry fixup code in the kernel
> > > > > > > > > for such an uncommon situation (which facilitates a broad backport
> > > > > > > > > strategy). It protects the users on stable kernels who pull in the
> > > > > > > > > latest bug fixes, including the guy who goes from a bad kernel to a good
> > > > > > > > > one, and all kernels going forward without having to use a feature bit.
> > > > > > > > > We don't do anything for the guy who wraps the agfl on newer kernels and
> > > > > > > > > goes back to the bad one, but at some point he just needs to update the
> > > > > > > > > broken kernel.
> > > > > > > > > 
> > > > > > > > > [1] I guess doing a mount time fix up vs. a read-only mount + warning is
> > > > > > > > > a separate question from the feature bit. I'd prefer the latter for
> > > > > > > > > simplicity, but perhaps there are use cases where the benefit outweighs
> > > > > > > > > the risk.
> > > > > > > > > 
> > > > > > > > > Anyways, that's just my .02. There is no real great option here, I
> > > > > > > > > guess. I just think that at some point maybe it's best to fix all the
> > > > > > > > > kernels we can to handle the size mismatch, live with the fact that some
> > > > > > > > > users might have those old, unfixed kernels and bonk them on the head to
> > > > > > > > > upgrade when they (hopefully decreasingly) pop up. Perhaps others can
> > > > > > > > > chime in with more thoughts..
> > > > > > > > 
> > > > > > > > Yeah, we could do that too (make all the next releases of
> > > > > > > > rhel/ol/debian/ubuntu/sles/whatever quietly fix the bad-wrap agfls and
> > > > > > > > skip the feature bit) and just let the unpatched kernels rot.  It keeps
> > > > > > > > coming up, though...
> > > > > > > > 
> > > > > > > 
> > > > > > > I guess I haven't really noticed it enough to consider it an ongoing
> > > > > > > issue (which doesn't mean it isn't :P). Any pattern to the use cache
> > > > > > > behind these continued reports..?
> > > > > > 
> > > > > > No particular pattern, other than using/freeing agfl blocks.
> > > > > > 
> > > > > 
> > > > > I'm more curious about the use case than the workload. E.g., what's the
> > > > > purpose for using two kernels? Was it bad luck on an upgrade or
> > > > > something that implies the problematic state could reoccur (i.e., active
> > > > > switching between good/bad kernels)?
> > > > > 
> > > > > IOW, if the "it keeps coming up" situation is simply because the
> > > > > existing userbase hasn't fully upgraded yet, then a simple detect/repair
> > > > > forward-fix patch is going to be sufficient (perhaps with a warning not
> > > > > to revert to $previous kernel) to resolve the problem for the remaining
> > > > > set of users who are susceptible to the problem.
> > > > > 
> > > > > If the problem is instead a set of users that are switching back and
> > > > > forth for whatever reason, then perhaps a feature bit policy is more
> > > > > justified.
> > > > 
> > > > $employer has customers that want or have to run mixed environments.
> > > > 
> > > > Not many, though.
> > > > 
> > > > > > > From an upstream perspective, RHEL continuing to operate in this mode
> > > > > > > certainly does create an ongoing situation where a "current" distro
> > > > > > > has/causes compatibility issues, particularly since some other
> > > > > > > downstream distros may have kernels that use the upstream/fixed format.
> > > > > > > 
> > > > > > > I'll reiterate that in principle I don't think downstream decisions
> > > > > > > should prohibit doing the right thing upstream, but for somebody on a
> > > > > > > downstream distro who happens to test an upstream kernel (say as part of
> > > > > > > a regression test/investigation), having the upstream kernel decide the
> > > > > > > downstream kernel is no longer allowed to mount the fs seems kind of
> > > > > > > mean. :P
> > > > > > 
> > > > > > Agreed, though agfl-related crashes are also mean. :/
> > > > > > 
> > > > > 
> > > > > Yeah, but you can recover from that. In the use case above, the feature
> > > > > bit has permanently prevented the user from going back to the distro
> > > > > kernel. So this is one semi-realistic "switching between kernels" use
> > > > > case where I think this feature bit actually hurts more than it helps.
> > > > > If it were a big enough problem, I'd much rather have the downstream
> > > > > kernel implement the feature bit to indicate that the upstream kernel
> > > > > shouldn't/can't mount this filesystem than retroactively doing the
> > > > > opposite. Even then, I still think I'd prefer the downstream kernel just
> > > > > detect, fail to mount and encourage repair so we don't have to support
> > > > > the strange transition from a clearly unsupported sequence (but this is
> > > > > all downstream/distro discussion).
> > > > > 
> > > > > FWIW, if you actually want to make progress on this in absense of any
> > > > > other feedback, I think the best approach is to refactor this series to
> > > > > separate the detect/repair mechanism from all of this feature bit policy
> > > > > stuff. The former all seems reasonable/justified to me based on your
> > > > > explanations, so I'd be happy to review those portions of it without the
> > > > > feature bit (which could still be tacked on for further review,
> > > > > reordered appropriately on merge if ultimately necessary, etc.) and
> > > > > consider the feature bit separately based on justification for added
> > > > > benefit over the former.
> > > > 
> > > > Ok, I'll split out the part that fixes the AGFL and we can review those
> > > > parts separately from the feature bit stuff.
> > > > 
> > > > --D
> > > > 
> > > > > 
> > > > > Brian
> > > > > 
> > > > > > --D
> > > > > > 
> > > > > > > Brian
> > > > > > > 
> > > > > > > > --D
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Brian
> > > > > > > > > 
> > > > > > > > > > > caused this in the first place ..." Then we write a FAQ entry that
> > > > > > > > > > > elaborates on how/why a user hit that problem and backport to stable
> > > > > > > > > > > kernels as far and wide as possible. Hm?
> > > > > > > > > > 
> > > > > > > > > > I /do/ like the part about writing FAQ to update the kernel.
> > > > > > > > > > 
> > > > > > > > > > --D
> > > > > > > > > > 
> > > > > > > > > > > Brian
> > > > > > > > > > > 
> > > > > > > > > > > > --D
> > > > > > > > > > > > --
> > > > > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > > > > --
> > > > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > > > --
> > > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > > --
> > > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > > --
> > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > --
> > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html