On Fri, Mar 09, 2018 at 01:37:28PM -0500, Brian Foster wrote:
> On Fri, Mar 09, 2018 at 09:33:18AM -0800, Darrick J. Wong wrote:
> > On Fri, Mar 09, 2018 at 08:16:28AM -0500, Brian Foster wrote:
> > > On Thu, Mar 08, 2018 at 03:03:54PM +0100, Carlos Maiolino wrote:
> > > > Hi,
> > > >
> > > > On Wed, Mar 07, 2018 at 02:24:51PM -0500, Brian Foster wrote:
> > > > > Disliked-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > > ---
> > > > >
> > > > > Sent as RFC for the time being. This tests Ok on a straight xfstests run
> > > > > and also seems to pass Darrick's agfl fixup tester (thanks) both on
> > > > > upstream and on a rhel7 kernel with some minor supporting hacks.
> > > > >
> > > > > I tried to tighten up the logic a bit to reduce the odds of mistaking
> > > > > actual corruption for a padding mismatch as much as possible. E.g.,
> > > > > limit to cases where the agfl is wrapped, make sure we don't mistake a
> > > > > corruption that looks like an agfl with 120 entries on a packed kernel,
> > > > > etc.
> > > > >
> > > > > While I do prefer an on-demand fixup approach to a mount time scan, ISTM
> > > > > that in either case it's impossible to completely eliminate the risk of
> > > > > confusing corruption with a padding mismatch so long as we're doing a
> > > > > manual agfl fixup. The more I think about that the more I really dislike
> > > > > doing this. :(
> > > > >
> > > > > After some IRC discussion with djwong and sandeen, I'm wondering if the
> > > > > existence of 'xfs_repair -d' is a good enough last resort for those
> > > > > users who might be bit by unexpected padding issues on a typical
> > > > > upgrade. If so, we could fall back to a safer mount-time detection model
> > > > > that enforces a read-only mount and let the user run repair. The
> > > > > supposition is that those who aren't prepared to repair via a ramdisk or
> > > > > whatever should be able to 'xfs_repair -d' a rootfs that is mounted
> > > > > read-only provided agfl padding is the only inconsistency.
> > > > >
> > > > > Eric points out that we can still write an unmount record for a
> > > > > read-only mount, but I'm not sure that would be a problem if repair only
> > > > > needs to fix the agfl. xfs_repair shouldn't touch the log unless there's
> > > > > a recovery issue or it needs to be reformatted to update the LSN, both
> > > > > of which seem to qualify as "you have more problems than agfl padding
> > > > > and need to run repair anyways" to me. Thoughts?
> > > > >
> > > >
> > > > Sorry if this may sound stupid, but in the possibility this can help the issue,
> > > > or at least me learning something new.
> > > >
> > > > ISTM this issue is all related to the way xfs_agfl packing. I read the commit
> > > > log where packed attribute was added to xfs_agfl, and I was wondering...
> > > >
> > > > What are the implications of breaking up the lsn field in xfs_agfl, in 2 __be32?
> > > > Merge it together in a 64bit field when reading it from disk, or split it when
> > > > writing to?
> > > > It seems to me this would avoid the size difference we are seeing now in 32/64
> > > > bit systems, and avoid such risk of confusion when trying to discern between a
> > > > corrupted agfl and a padding mismatch.
> > > >
> > >
> > > I'm not following how you'd accomplish the latter..? We already have the
> > > packed attribute in place, so the padding is fixed with that. This
> > > effort has to do with trying to fix up an agfl written by an older
> > > kernel without the padding fix. My understanding is that the xfs_agfl
> > > header looks exactly the same on-disk in either case, the issue is a
> > > broken size calculation that causes the older kernel to not see/use one
> > > last slot in the agfl. If the agfl has wrapped and a newer kernel loads
> > > the same on-disk structure, it has no means to know whether the content
> > > of the final slot is a valid block or a "gap" left by an older kernel
> > > other than to check whether flcount matches the active count from
> > > flfirst -> fllast (and that's where potential confusion over a padding
> > > issue vs other corruption comes into play).
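To illustrate the size difference being described here, a rough userspace
sketch (simplified field names, 512-byte sector assumed; not the actual
kernel definitions): the 64-bit lsn raises the alignment of the unpacked
header on 64-bit builds, so the slot count computed for the remainder of
the sector comes out one lower than on a packed (or 32-bit) kernel.

/*
 * Rough illustration, not the real kernel structures: how the agfl
 * header size, and therefore the number of free list slots that fit in
 * the rest of the sector, changes when the structure is not packed.
 */
#include <stdint.h>
#include <stdio.h>

struct agfl_hdr {			/* no packing attribute */
	uint32_t magicnum;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;			/* 64-bit member raises struct alignment */
	uint32_t crc;
};					/* 40 bytes on typical 64-bit, 36 on i386 */

struct agfl_hdr_packed {
	uint32_t magicnum;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
} __attribute__((packed));		/* 36 bytes everywhere */

int main(void)
{
	unsigned int sectsize = 512;	/* assumed sector size */

	/* slots = (sector size - header size) / size of one __be32 block number */
	printf("unpacked header: %zu slots\n",
	       (sectsize - sizeof(struct agfl_hdr)) / sizeof(uint32_t));
	printf("packed header:   %zu slots\n",
	       (sectsize - sizeof(struct agfl_hdr_packed)) / sizeof(uint32_t));
	return 0;
}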
> >
> > Your understanding is correct.
> >
> > Sez me, who is watching the fsck.xfs -f discussion just in case that can
> > be turned into a viable option quickly.
> >
>
> Thanks.
>
> > ..and wondering what if we /did/ just implement Dave's suggestion from
> > years ago where if the flcount doesn't match we just reset the agfl and
> > call fix_freelist to refill it with fresh blocks... it would suck to
> > leak blocks, though. Obviously, if the fs has rmap enabled then we can
> > just rebuild it on the spot via xfs_repair_agfl() (see patch on mailing
> > list).
> >
>
> I wasn't initially too fond of this idea, but thinking about it a bit
> more, there is definitely some value in terms of determinism. We know
> we'll just leak some blocks vs. riskily swizzling around a corrupted
> agfl.

And, most importantly: it's trivial to backport to other kernels. Then we
simply don't have to worry about where the filesystem has been mounted -
if we detect a suspect situation for the running kernel, we just let go
of the free list and rebuild it.

> Given that, perhaps an on-demand reset with a "Corrupted AGFL, tossed
> $flcount blocks, unmount and run xfs_repair if you ever want to see them
> again" warning might not be so bad. It works the same whether the kernel
> is packed, unpacked or the agfl is just borked, and therefore the test
> matrix is much simpler as well. Hmmmm...

Yup, exactly my thoughts. The more I read about the hoops we're
considering jumping through to work around this problem, the more I like
this solution....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
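For reference, a rough sketch of the kind of detect-and-reset logic being
discussed, using made-up structure and helper names rather than the real
kernel interfaces: if the active slot count implied by flfirst/fllast
(allowing for wrap) disagrees with flcount, give up on the list contents
and empty it so the next allocation refills it with fresh blocks.

/*
 * Sketch only; field and helper names are invented for illustration.
 */
#include <stdbool.h>

struct agf_freelist {
	unsigned int flfirst;	/* index of first active agfl slot */
	unsigned int fllast;	/* index of last active agfl slot */
	unsigned int flcount;	/* number of blocks on the free list */
};

/*
 * Does the active slot count implied by flfirst/fllast (allowing for a
 * wrapped list) disagree with flcount?  If so, the agfl was either
 * written with a different agfl size or is corrupt, and we can't tell
 * which from the on-disk data alone.
 */
static bool agfl_needs_reset(const struct agf_freelist *agf,
			     unsigned int agfl_size)
{
	unsigned int active;

	if (agf->flcount == 0)
		return false;
	if (agf->fllast >= agf->flfirst)
		active = agf->fllast - agf->flfirst + 1;		/* not wrapped */
	else
		active = agfl_size - agf->flfirst + agf->fllast + 1;	/* wrapped */

	return active != agf->flcount;
}

/*
 * Rather than guessing how to fix the list up, empty it: the blocks that
 * were on it are leaked until xfs_repair reclaims them, and the next
 * allocation refills the agfl with fresh blocks.  The flfirst/fllast
 * values below are just one possible empty-list convention.
 */
static void agfl_reset(struct agf_freelist *agf, unsigned int agfl_size)
{
	agf->flfirst = 0;
	agf->fllast = agfl_size - 1;
	agf->flcount = 0;
}

The point of the sketch is the determinism mentioned above: the reset is
unconditional once the mismatch is detected, at the cost of leaking
whatever blocks were on the list.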