Re: [PATCH 3/6] xfs: detect AGFL size mismatches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 13, 2016 at 06:39:23PM -0500, Eric Sandeen wrote:
> On 9/1/16 9:27 PM, Dave Chinner wrote:
> > +	 * still considered a corruption.
> > +	 */
> > +	if (flfirst > agfl_size)
> > +		return false;
> > +	if (flfirst == agfl_size)
> > +		xfs_warn(mp, "AGF %u: first freelist index needs correction",
> > +			be32_to_cpu(agf->agf_seqno));
> > +
> > +	if (fllast > agfl_size)
> > +		return false;
> > +	if (fllast == agfl_size)
> > +		xfs_warn(mp, "AGF %u: last freelist index needs correction",
> > +			be32_to_cpu(agf->agf_seqno));
> > +
> > +	if (flcount > agfl_size + 1)
> > +		return false;
> > +	if (flcount == agfl_size)
> > +		xfs_warn(mp, "AGF %u: freelist count needs correction(1)",
> > +			be32_to_cpu(agf->agf_seqno));
> 
> <thinking cap>
> 
> Hang on, doesn't this miss the flcount == agfl_size + 1 case in between?
> 
> agfl size is now one shorter than expected, but it's still our working
> maximum size for validity.  So flcount == agfl_size should be *ok* right?
> flcount == agfl_size + 1 needs fixing.  flcount > agfl_size is baroque.
> 
> So I'd have expected:
> 
> +	if (flcount > agfl_size + 1)
> +		return false;
> +	if (flcount == agfl_size + 1)
> +		xfs_warn(mp, "AGF %u: freelist count needs correction(1)",
> +			be32_to_cpu(agf->agf_seqno));
> 
> > +
> > +	/*
> > +	 * range check flcount - if it's out by more than 1 count or is too
> 
> ok maybe "out by" is just what you say down there.  ;)
> 
> > +	 * small, we've got a corruption that isn't a result of the structure
> > +	 * size screwup so that throws a real corruption error.
> > +	 */
> > +	active = fllast - flfirst + 1;
> 
> <thinking cap>
> Ok, flfirst or fllast *could* be occupying the "extra" slot, as warned
> above.  So active could be "1 bigger" than agfl_size
> 
> > +	if (active <= 0)
> > +		active += agfl_size;
> 
> and here we handle the wrap w/ that smaller agfl_size.  Hm...
> 
> Pretend agfl_size is 10 (slots 0->9).  Actual size is possibly 11 (0->10).
> We could have flfirst at 10, fllast at 0.
> fllast - flfirst + 1 then
> is -9.  We add back in agfl_size of 10, and get active == 1.

The free list indexes, as I've said before, are /very badly named/.
The actual situation when flfirst > fllast is this (taken from the
next patch):

        /*
         * Pure wrap, first and last are valid.
         *                        flfirst
         *                           |
         *      +oo------------------oo+
         *        |
         *      fllast
         *
         * We need to determine if the count includes the last slot in the AGFL
         * which we no longer use. If the flcount does not match the expected
         * size based on the first/last indexes, we need to fix it up.
         */

i.e. flfirst is the /tail/ of the list where we allocate blocks from,
fllast is the /head/ of the list where we put newly freed blocks.

hence if we have agfl_size = 10, flfirst = 9, flast = 0, we have 2
blocks on the freelist. i.e. flcount = 2.

good wrap case w/ corrected agfl_size:

	flfirst = 9
	fllast = 0
	flcount = 10
	agfl_size = 10
	active = fllast - flfirst + 1
	       = 0 - 9 + 1
	       = -8 + agflsize
	       = 2 (correct!)

	if (flfirst > agfl_size)		not true
	if (flfirst == agfl_size)		not true

	if (fllast > agfl_size)			not true
	if (fllast == agfl_size)		not true

	if (flcount > agfl_size + 1)		not true
	if (flcount == agfl_size)		not true

	if (active <= 0)			not true
	if (flcount > active + 1)		not true
	if (flcount < active)			not true
	if (flcount != active)			not true

No errors, no warnings, all good!

Bad wrap case:

	flfirst = 10	(invalid!)
	fllast = 0
	flcount = 2
	agfl_size = 10
	active = 0 - 10 + 1
	       = -9 + agfl_size
	       = 1 (correct as flfirst is invalid!)

	if (flfirst > agfl_size)		not true
	if (flfirst == agfl_size)		true, warn

	if (fllast > agfl_size)			not true
	if (fllast == agfl_size)		not true

	if (flcount > agfl_size + 1)		not true
	if (flcount == agfl_size)		not true

	if (active <= 0)			not true
	if (flcount > active + 1)		not true
	if (flcount < active)			not true
	if (flcount != active)			true, warn

So, two correction needed warnings, which is correct because both
flfirst and flcount need fixing due to being off by one.

The bad fllast case:

	flfirst = 5
	fllast = 10	(invalid!)
	flcount = 6	
	agfl_size = 10
	active = fllast - flfirst + 1
	       = 10 - 5 + 1
	       = 6 (invalid as fllast is out of range)

	if (flfirst > agfl_size)		not true
	if (flfirst == agfl_size)		not true

	if (fllast > agfl_size)			not true
	if (fllast == agfl_size)		true, warn

	if (flcount > agfl_size + 1)		not true
	if (flcount == agfl_size)		not true

	if (active <= 0)			not true
	if (flcount > active + 1)		not true
	if (flcount < active)			not true
	if (flcount != active)			not true

So we've got a warning that a fllast correction will take place,
which will also correct the flcount once the fllast index is
updated.

But I think the case you are concerned about is a corner case of
this one - a full AGFL, right?  Something that, in practice, will
never happen as no single transaction will free or require a full
AGFL worth of blocks in a single allocation/free operation.

As such, I didn't actually consider it or test that case, so, it may
be wrong.  Let's run the numbers:

	flfirst = 0
	fllast = 10	(invalid!)
	flcount = 11	(invalid!)
	agfl_size = 10
	active = fllast - flfirst + 1
	       = 10 - 0 + 1
	       = 11 (invalid as fllast is out of range)

	if (flfirst > agfl_size)		not true
	if (flfirst == agfl_size)		not true

	if (fllast > agfl_size)			not true
	if (fllast == agfl_size)		true, warn

	if (flcount > agfl_size + 1)		not true
	if (flcount == agfl_size)		not true

	if (active <= 0)			not true
	if (flcount > active + 1)		not true
	if (flcount < active)			not true
	if (flcount != active)			not true

So it issues the same fllast warning as the not full, not wrapped
case, and that correction also fixes up the flcount. Maybe it's the
full wrapped case, which may be a pure flcount error?

	flfirst = 5
	fllast = 4
	flcount = 11	(invalid!)
	agfl_size = 10
	active = fllast - flfirst + 1
	       = 4 - 5 + 1
	       = 10 (different to flcount!)

	if (flfirst > agfl_size)		not true
	if (flfirst == agfl_size)		not true

	if (fllast > agfl_size)			not true
	if (fllast == agfl_size)		not true

	if (flcount > agfl_size + 1)		not true
	if (flcount == agfl_size)		not true
					(yes, that should warn)

	if (active <= 0)			not true
	if (flcount > active + 1)		not true
	if (flcount < active)			not true
	if (flcount != active)			true, warn

So we do get a warning about the flcount being wrong because it's
only one block different to the calculated active size (and hence it
will be corrected), but no warning from the flcount being out of
range by one block. So, yes, I think you are right, Eric, that the
check should be against agfl_size + 1, even it it means we get
multiple warnings...

Thanks for thinking about this case!

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux