Re: [PATCH 2/8] xfs: check the ir_startino alignment directly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 08, 2019 at 10:28:30AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 08, 2019 at 07:47:23AM -0500, Brian Foster wrote:
> > On Mon, Jan 07, 2019 at 05:43:07PM -0800, Darrick J. Wong wrote:
> > > On Mon, Jan 07, 2019 at 08:45:31AM -0500, Brian Foster wrote:
> > > > On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> > > > > On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > > > > > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > > > 
> > > > > > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > > > > > the inode cluster block alignment into units of inodes instead of the
> > > > > > > other way around (converting ir_startino to blocks).  This prevents us
> > > > > > > from tripping over off-by-one errors in ir_startino which are obscured
> > > > > > > by the inode -> block conversion.
> > > > > > > 
> > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > > > > > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > 
> > > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > > > > index fd431682db0b..5082331d6c03 100644
> > > > > > > --- a/fs/xfs/scrub/ialloc.c
> > > > > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > > > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > > > > > >  	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > > +/* Make sure this inode btree record is aligned properly. */
> > > > > > > +STATIC void
> > > > > > > +xchk_iallocbt_rec_alignment(
> > > > > > > +	struct xchk_btree		*bs,
> > > > > > > +	struct xfs_inobt_rec_incore	*irec)
> > > > > > > +{
> > > > > > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * finobt records have different positioning requirements than inobt
> > > > > > > +	 * records: each finobt record must have a corresponding inobt record.
> > > > > > > +	 * That is checked in the xref function, so for now we only catch the
> > > > > > > +	 * obvious case where the record isn't even chunk-aligned.
> > > > > > > +	 *
> > > > > > > +	 * Note also that if a fs block contains more than a single chunk of
> > > > > > > +	 * inodes, we will have finobt records only for those chunks containing
> > > > > > > +	 * free inodes.
> > > > > > > +	 */
> > > > > > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > > > > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > > > > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > 
> > > > > > Is the above really a finobt only check? Couldn't we run this
> > > > > > sanity check against all records and skip the following for finobt?
> > > > > 
> > > > > Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> > > > > inobt records (and therefore finobt records) that are aligned to
> > > > > m_cluster_align_inodes, and that value can be less than 64.  So I think
> > > > > this has to be something along the lines of:
> > > > > 
> > > > 
> > > > Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
> > > > than traditionally required in order to ensure we don't create
> > > > conflicting/overlapping sparse records. A quick look at mkfs shows that
> > > > the cluster size basically defines the sparse chunk size and the chunk
> > > > alignment == chunk size.
> > > 
> > > <nod>
> > > 
> > > > > imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > > > > if (irec->ir_startino & imask)
> > > > > 	/* set corrupt... */
> > > > > 
> > > > > Hmm, and testing seems to bear this out.  New patch forthcoming.
> > > > > 
> > > > 
> > > > Ok, I take it the min() is required because m_cluster_align_inodes can
> > > > be multiple records in the large FSB case. If so, I wonder if it would
> > > > be more simple to use m_cluster_align,
> > > 
> > > I don't see how that would work -- m_cluster_align is in units of
> > > blocks, not inodes.
> > > 
> > 
> > Convert the startino to the agbno and check that..? It's ultimately just
> > a nit, but my comment is that:
> > 
> > 	... = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > 
> > ... gives me a brief wtf because I'm not sure what chunk size has to do
> > with record alignment. Then I stare at it, go look at how
> > ->m_cluster_align and friends are calculated, read the comment below and
> > work out that we're basically special casing conversion of the startino
> > of a multi-record per large FSB to block granularity. Note that part of
> > my temporary confusion here is probably just that I'm not used to seeing
> > cluster alignment in inode units..
> 
> Aha... now I think I know the source of the confusion (see below):
> 
> > > > but otherwise a one-liner comment couldn't hurt.
> > > 
> > > At the moment the comment says:
> > > 
> > > 	/*
> > > 	 * finobt records have different positioning requirements than inobt
> > > 	 * records: each finobt record must have a corresponding inobt record.
> > > 	 * That is checked in the xref function, so for now we only catch the
> > > 	 * obvious case where the record isn't at all aligned properly.
> > > 	 *
> > > 	 * Note that if a fs block contains more than a single chunk of inodes,
> > > 	 * we will have finobt records only for those chunks containing free
> > > 	 * inodes, and therefore expect chunk alignment of finobt records.
> > > 	 * Otherwise, we expect that the finobt record is aligned to the
> > > 	 * cluster alignment as told by the superblock.
> > > 	 */
> > > 
> > 
> > As opposed to something like:
> > 
> > 	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
> > 	if (agbno & (mp->m_cluster_align - 1))
> > 		...
> > 
> > ... which IMO is self explanatory: make sure the start block of the
> > inode chunk is properly aligned. Am I missing some reason why we can't
> > do that?
> 
> AGINO_TO_AGBNO right-shifts the low bits off ir_startino before the mask
> test, which means we won't catch corruptions such as (ir_startino == 97)
> if we're only examining block numbers derived from inode numbers.
> 
> I'll add an additional comment to the alignment check function to
> explain why we have to perform the alignment checks at inode
> granularity, not block granularity.
> 

Ok, I see. One other quick thought.. do we still lose that information
if we just convert back and forth? I.e., convert agino to agbno/offset,
check alignment etc., convert agbno/offset back to agino and make sure
it matches the startino..?

That aside, yeah.. a couple sentences that explain precisely what you
did above clear this up quite a bit.

Brian

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > > > Otherwise seems fine:
> > > > > > 
> > > > > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > > 
> > > > > I've wondered in recent days if this is even necessary at all -- when
> > > > > we're asked to check the inobt we check the ir_startino alignment of all
> > > > > those records, so really the only thing we need is the existing check
> > > > > that for each finobt record there's also an inobt record with the same
> > > > > ir_startino.  OTOH I guess we shouldn't really assume that the calling
> > > > > process already checked the inobt or that it didn't change between calls.
> > > > > 
> > > > 
> > > > I guess it makes sense to verify the applicable records match, but in
> > > > general I agree that "1. verify the inobt 2. verify the finobt is a
> > > > subset of the inobt" is probably sufficient (and more elegant logic).
> > > > 
> > > > Brian
> > > > 
> > > > > --D
> > > > > 
> > > > > > > +
> > > > > > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > > > > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Scrub an inobt/finobt record. */
> > > > > > >  STATIC int
> > > > > > >  xchk_iallocbt_rec(
> > > > > > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > > > > > >  	uint64_t			holes;
> > > > > > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > > > > > >  	xfs_agino_t			agino;
> > > > > > > -	xfs_agblock_t			agbno;
> > > > > > >  	xfs_extlen_t			len;
> > > > > > >  	int				holecount;
> > > > > > >  	int				i;
> > > > > > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > > > > > >  		goto out;
> > > > > > >  	}
> > > > > > >  
> > > > > > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > > > > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > > > > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > > > > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > > > > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > > > > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > > > +		goto out;
> > > > > > >  
> > > > > > >  	iabt->inodes += irec.ir_count;
> > > > > > >  
> > > > > > > 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux