Re: [PATCH 5/5] xfs: fix agfl wrapping

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 1 Mar 2018 12:28:33 -0500

On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
...
> > > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > > > new file mode 100644
> > > > > index 0000000..0cad7bb
> > > > > --- /dev/null
> > > > > +++ b/fs/xfs/xfs_fixups.c
> > > > > @@ -0,0 +1,310 @@
...
> > > Now let's say that we run for a while and want to unmount, but our AGFL
> > > wraps like this:
> > > 
> > >          0   1   2   3   4   5   6   7   8   9
> > > Header | T | U | V |   |   |   |   | Q | R | S |
> > >                  ^-- fllast          ^-- flfirst
> > > 
> > > We don't know that the next kernel to mount this filesystem will have
> > > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > > the AGF as corrupted because flcount is 6 but in its view the distance
> > > between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> > > it to choke on that, so the unmount fixer moves all the records between
> > > flfirst and the end (Q, R, and S) towards the start of the block and
> > > resets flfirst/fllast:
> > > 
> > >          0   1   2   3   4   5   6   7   8   9
> > > Header | T | U | V | Q | R | S |   |   |   |   |
> > >          ^-- flfirst         ^-- fllast
> > > 
> > 
> > Could we use this basic algorithm at mount time as well? I know it
> > wouldn't be _exactly_ the same operation at mount time as it is for
> > unmount since slot 9 is a gap in the former case, but afaict the only
> > difference from an algorithmic perspective is the length of the shift.
> > 
> > IOW, if we were to parameterize the length of the shift and have the
> > mount fixup call the unmount algorithm, would it not also address the
> > problem?
> 
> Yes, I believe that would work.  I think it would be more efficient in
> the patch below to memmove the entries instead of put_freelist'ing them
> individually.
> 
> If the agfl is completely full then fix_freelist ought to trim it down
> by at least one element.  The algorithm then becomes:
> 
> if mounting and flfirst < fllast:
> 	return 0
> if flcount == agfl_size:
> 	assert !mounting
> 	fix_freelist()

Not sure I follow where the fix_freelist() came in, but I think we need
to be careful here. What if flfirst == 118 and that slot is garbage?
Don't we need to fix up the agfl before we can allow any traditional
operations to proceed?

> 	assert flcount < agfl_size
> if flfirst < fllast:
> 	return 0
> movelen = agfl_size - flfirst
> if active == flcount - 1:
> 	movelen--
> memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
> flfirst = 0
> fllast = fllast + movelen
> 
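
To make the shift in the quoted pseudocode concrete, here is a minimal
userspace sketch. It is not kernel code: struct agfl_state and
agfl_unwrap() are hypothetical names, the array stands in for the agfl
block, and the padding adjustment (the "movelen--" step) is left out. The
sketch also treats flfirst == fllast as "not wrapped", which is an
assumption on my part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the AGF free list fields. */
struct agfl_state {
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/*
 * Unwrap a wrapped free list: move the entries in [flfirst, agfl_size)
 * to the slots just after fllast so the list is contiguous from slot 0.
 * Returns the number of entries moved (0 if the list is not wrapped).
 */
static uint32_t
agfl_unwrap(uint32_t *bno, uint32_t agfl_size, struct agfl_state *st)
{
	uint32_t	movelen;

	/* Empty or already contiguous: nothing to do. */
	if (st->flcount == 0 || st->flfirst <= st->fllast)
		return 0;

	movelen = agfl_size - st->flfirst;

	/* The moved entries must still fit inside the block. */
	assert(st->fllast + movelen < agfl_size);

	/* memmove counts bytes, so scale by the entry size. */
	memmove(&bno[st->fllast + 1], &bno[st->flfirst],
		movelen * sizeof(*bno));
	st->flfirst = 0;
	st->fllast += movelen;
	return movelen;
}
```

With the ten-slot example from the diagram (flfirst = 7, fllast = 2),
this moves Q, R and S into slots 3-5 and leaves flfirst = 0, fllast = 5.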
...
> > > > 
> > > > So we're not attempting to cover the case where the agfl has 1 more
> > > > block than the agfl size (i.e., the case where an fs goes back to a
> > > > kernel with an unpacked header)?
> > > 
> > > We don't know how the next kernel to touch this filesystem will define
> > > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> > > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> > > size).
> > > 
> > 
> > I don't think I was clear.. I'm envisioning whether we could come up
> > with a patch that would generically fix up the agfl on disk to be sane
> > relative to the current kernel. This patch covers the case of a packed
> > agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
> > could implement something that also handled a packed on-disk agfl to an
> > unpacked agfl kernel (for easier backport to unpacked kernels, for
> > e.g.).
> 
> If we're going to touch an old kernel's source at all I'd rather we
> backport both the packing fix and this fixer-upper.
> 

Not sure I parse... the "old kernel" is essentially the RHEL example
where we apparently have deliberately maintained the unpacked format to
avoid this incompatibility problem. If we had a patch that generically
converted on-disk format (packed or unpacked) to the current kernel
(packed or unpacked), we could merge that patch to upstream, stable as
well as distro kernels that might not include the agfl packing fix and
eliminate compatibility problems between them (even if the packing fix
comes in out of order).

Otherwise, we need a separate unpacked -> packed fixup for packed
kernels (i.e., this patch) and a packed -> unpacked fixup for unpacked
kernels, and we need to make sure each is used in the right place. Trying to
see if we could avoid this kind of dependency matrix was one of the
objectives around the hack I posted previously. I'm not married to that
particular implementation, but I'm much less concerned about
inefficiency (even if it dictates a mount time fixup over a runtime one)
in comparison to simplicity and flexibility. Perhaps we can accomplish
something similarly flexible via direct buffer manipulation..?

FWIW, I've appended another variant of the previous hack that is less
brute force, but which I think is still able to convert back and forth.
The tradeoff is essentially that it no longer uses the same generic
algorithm.

Brian

--- 8< ---

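/*
 * Infer the agfl size that the on-disk first/last/flcount fields were
 * written against by comparing the active slot count (as computed with the
 * current kernel's agfl size) against flcount.
 */
static int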
xfs_agfl_ondisk_size(
	struct xfs_mount	*mp,
	int			first,
	int			last,
	int			flcount)
{
	int			active;
	int			size;
	int			agfl_size = XFS_AGFL_SIZE(mp);

	if (last >= first)
		active = last - first + 1;
	else
		active = agfl_size - first + last + 1;

	if (active == flcount + 1)
		size = agfl_size - 1;
	else if ((active == flcount - 1) ||
		 first == agfl_size || last == agfl_size)
		size = agfl_size + 1;
	else if (active == flcount)
		size = agfl_size;
	else
		return -EFSCORRUPTED;

	return size;
}

int
xfs_agfl_fixup(
	struct xfs_trans	*tp,
	struct xfs_buf		*agbp,
	struct xfs_perag	*pag)
{
	struct xfs_mount	*mp = tp->t_mountp;
	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
	int			agfl_size = XFS_AGFL_SIZE(mp);
	int			ofirst, olast, osize;
	int			nfirst, nlast;
	struct xfs_buf		*agflbp;
	__be32			*agfl_bno;
	xfs_agblock_t		bno = -1;
	int			tidx = -1;
	bool			empty = false;
	int			logflags = 0;
	int			error;

	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
	olast = nlast = be32_to_cpu(agf->agf_fllast);
	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
	if (osize < 0)
		return osize;
	if (pag->pagf_flcount == 0)
		empty = true;

	/* sizes match, nothing to do */
	if (osize == agfl_size)
		return 0;

	/* size mismatch, read the agfl.. */
	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
				    &agflbp);
	if (error)
		return error;
	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);

	/*
	 * If the on-disk agfl is smaller than what the kernel expects, the last
	 * slot of the on-disk agfl is a gap with bogus data. Allocate the first
	 * valid block from the agfl, manually place it in the gap and fix up
	 * the count.
	 */
	if (osize < agfl_size) {
		ASSERT(!empty);
		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
		if (error)
			goto out_relse;

		pag->pagf_flcount++;
		be32_add_cpu(&agf->agf_flcount, 1);
		logflags |= XFS_AGF_FLCOUNT;
		tidx = agfl_size - 1;
		goto done;
	}

	/*
	 * Otherwise, the on-disk agfl is larger than what the current kernel
	 * can manage. If the agfl was empty, we just fix up the first and last
	 * pointers. If not, move the inaccessible block in the last slot to the
	 * next valid, open slot.
	 */
	nfirst = do_mod(nfirst, agfl_size);
	if (empty) {
		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
		goto done;
	}
	if (nlast != agfl_size)
		nlast++;
	nlast = do_mod(nlast, agfl_size);
	tidx = nlast;
	bno = be32_to_cpu(agfl_bno[osize - 1]);

done:
	if (nfirst != ofirst) {
		agf->agf_flfirst = cpu_to_be32(nfirst);
		logflags |= XFS_AGF_FLFIRST;
	}
	if (nlast != olast) {
		agf->agf_fllast = cpu_to_be32(nlast);
		logflags |= XFS_AGF_FLLAST;
	}
	if (bno != -1) {
		int	startoff;

		agfl_bno[tidx] = cpu_to_be32(bno);
		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
		startoff = (char *) &agfl_bno[tidx] - (char *) agflbp->b_addr;
		xfs_trans_log_buf(tp, agflbp, startoff,
				  startoff + sizeof(xfs_agblock_t) - 1);
	}
	if (logflags)
		xfs_alloc_log_agf(tp, agbp, logflags);

out_relse:
	xfs_trans_brelse(tp, agflbp);
	return error;
}
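
For anyone who wants to poke at the size inference outside the kernel,
here is a minimal userspace sketch of the same comparison. The names
agfl_active_slots(), agfl_guess_ondisk_size() and AGFL_CORRUPT are
hypothetical, and the agfl size is passed in directly instead of coming
from XFS_AGFL_SIZE(mp). Treat it as an illustration of the logic above,
not as a tested replacement.

```c
#include <assert.h>

#define AGFL_CORRUPT	(-1)	/* hypothetical stand-in for -EFSCORRUPTED */

/* Slots spanned by [first, last], as seen with this kernel's agfl size. */
static int
agfl_active_slots(int agfl_size, int first, int last)
{
	assert(agfl_size > 0);
	if (last >= first)
		return last - first + 1;
	return agfl_size - first + last + 1;	/* list wraps */
}

/*
 * Guess the agfl size the on-disk fields were written against:
 * one slot too many active  -> written with a smaller agfl,
 * one slot too few, or an index equal to our size -> larger agfl,
 * exact match -> same size. Anything else is reported as corrupt.
 */
static int
agfl_guess_ondisk_size(int agfl_size, int first, int last, int flcount)
{
	int	active = agfl_active_slots(agfl_size, first, last);

	if (active == flcount + 1)
		return agfl_size - 1;
	if (active == flcount - 1 ||
	    first == agfl_size || last == agfl_size)
		return agfl_size + 1;
	if (active == flcount)
		return agfl_size;
	return AGFL_CORRUPT;
}
```

For example, with a kernel agfl size of 10, a wrapped list with first = 7,
last = 2 and flcount = 5 spans six slots, so the sketch reports an on-disk
size of 9.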
