Re: [PATCH v2 06/22] xfs: add a repair helper to reset superblock counters

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 29 May 2018 15:07:16 -0700

On Tue, May 29, 2018 at 01:28:10PM +1000, Dave Chinner wrote:
> On Thu, May 17, 2018 at 08:56:23PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Add a helper function to reset the superblock inode and block counters.
> > The AG rebuilding functions will need these to adjust the counts if they
> > need to change as a part of recovering from corruption.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > Reviewed-by: Allison Henderson <allison.henderson@xxxxxxxxxx>
> > ---
> > v2: improve documentation
> > ---
> >  fs/xfs/scrub/repair.c |   89 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/repair.h |    7 ++++
> >  fs/xfs/scrub/scrub.c  |    2 +
> >  fs/xfs/scrub/scrub.h  |    1 +
> >  4 files changed, 99 insertions(+)
> > 
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 877488ce4bc8..4b95a15c0bd0 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -1026,3 +1026,92 @@ xfs_repair_find_ag_btree_roots(
> >  
> >  	return error;
> >  }
> > +
> > +/*
> > + * Reset the superblock counters.
> > + *
> > + * If a repair function changes the inode or free block counters, it must set
> > + * reset_counters to push this function to reset the global counters.  Repair
> > + * functions are responsible for resetting all other in-core state.  This
> > + * function runs outside of transaction context after the repair context has
> > + * been torn down, so if there's further filesystem corruption we'll error out
> > + * to userspace and give userspace a chance to call back to fix the further
> > + * errors.
> > + */
> > +int
> > +xfs_repair_reset_counters(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_buf		*agi_bp;
> > +	struct xfs_buf		*agf_bp;
> > +	struct xfs_agi		*agi;
> > +	struct xfs_agf		*agf;
> > +	xfs_agnumber_t		agno;
> > +	xfs_ino_t		icount = 0;
> > +	xfs_ino_t		ifree = 0;
> > +	xfs_filblks_t		fdblocks = 0;
> > +	int64_t			delta_icount;
> > +	int64_t			delta_ifree;
> > +	int64_t			delta_fdblocks;
> > +	int			error;
> > +
> > +	trace_xfs_repair_reset_counters(mp);
> > +
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		/* Count all the inodes... */
> > +		error = xfs_ialloc_read_agi(mp, NULL, agno, &agi_bp);
> > +		if (error)
> > +			return error;
> > +		agi = XFS_BUF_TO_AGI(agi_bp);
> > +		icount += be32_to_cpu(agi->agi_count);
> > +		ifree += be32_to_cpu(agi->agi_freecount);
> > +		xfs_buf_relse(agi_bp);
> > +
> > +		/* Add up the free/freelist/bnobt/cntbt blocks... */
> > +		error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agf_bp);
> > +		if (error)
> > +			return error;
> > +		if (!agf_bp)
> > +			return -ENOMEM;
> > +		agf = XFS_BUF_TO_AGF(agf_bp);
> > +		fdblocks += be32_to_cpu(agf->agf_freeblks);
> > +		fdblocks += be32_to_cpu(agf->agf_flcount);
> > +		fdblocks += be32_to_cpu(agf->agf_btreeblks);
> > +		xfs_buf_relse(agf_bp);
> > +	}
> > +
> > +	/*
> > +	 * Reinitialize the counters.  The on-disk and in-core counters differ
> > +	 * by the number of inodes/blocks reserved by the admin, the per-AG
> > +	 * reservation, and any transactions in progress, so we have to
> > +	 * account for that.  First we take the sb lock and update its
> > +	 * counters...
> > +	 */
> > +	spin_lock(&mp->m_sb_lock);
> > +	delta_icount = (int64_t)mp->m_sb.sb_icount - icount;
> > +	delta_ifree = (int64_t)mp->m_sb.sb_ifree - ifree;
> > +	delta_fdblocks = (int64_t)mp->m_sb.sb_fdblocks - fdblocks;
> > +	mp->m_sb.sb_icount = icount;
> > +	mp->m_sb.sb_ifree = ifree;
> > +	mp->m_sb.sb_fdblocks = fdblocks;
> > +	spin_unlock(&mp->m_sb_lock);
> 
> This seems racy to me, i.e. the per-ag counters can change while
> we are summing them, and once we've summed them the sb counters
> can change while we are waiting for the m_sb_lock. It looks to me
> like the summed per-ag counters are not in any way coherent
> with the superblock or the in-core per-CPU counters, so I'm
> struggling to understand why this is safe?

Hmm, yes, I think this is racy too.  The purpose of this code is to
recompute the global counters from the AG counters after any operation
that modifies anything that would affect the icount/ifreecount/fdblocks
counters...

> We can do this sort of summation at mount time (in
> xfs_initialize_perag_data()) because the filesystem is running
> single threaded while the summation is taking place and so nothing
> is changing during the summation. The filesystem is active in this
> case, so I don't think we can do the same thing here.

...however, you're correct to point out that the fs must be quiesced
before we can actually do this.  In other words, I think the filesystem
has to be completely frozen before we can do this.  Perhaps it's better
to have the per-ag rebuilders fix only the per-ag counters and leave the
global counters alone.  Then add a new scrubber that checks the summary
counters and fixes them if necessary.
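
A minimal sketch of that split, written as a standalone userspace model
rather than kernel code.  Every name below (struct ag_counts, struct
summary_counts, check_summary_counts, the "frozen" flag) is hypothetical
and only stands in for the AGI/AGF header fields and superblock counters
discussed above.  The point is that the summary check refuses to run
unless the caller says nothing else can be changing the counters:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG header counters, already in CPU byte order. */
struct ag_counts {
	uint32_t	icount;		/* models agi_count */
	uint32_t	ifree;		/* models agi_freecount */
	uint32_t	freeblks;	/* models agf_freeblks */
	uint32_t	flcount;	/* models agf_flcount */
	uint32_t	btreeblks;	/* models agf_btreeblks */
};

/* Hypothetical summary counters (sb_icount, sb_ifree, sb_fdblocks). */
struct summary_counts {
	uint64_t	icount;
	uint64_t	ifree;
	uint64_t	fdblocks;
};

/* Add up the per-AG counters the same way the loop in the patch does. */
static struct summary_counts
sum_ag_counts(
	const struct ag_counts	*ags,
	size_t			nr_ags)
{
	struct summary_counts	sum = { 0, 0, 0 };
	size_t			i;

	assert(ags != NULL || nr_ags == 0);

	for (i = 0; i < nr_ags; i++) {
		sum.icount += ags[i].icount;
		sum.ifree += ags[i].ifree;
		sum.fdblocks += ags[i].freeblks;
		sum.fdblocks += ags[i].flcount;
		sum.fdblocks += ags[i].btreeblks;
	}
	return sum;
}

/*
 * Compare the summary counters with the per-AG sums and fix them if they
 * differ.  Returns 1 if a fix was made, 0 if the counters already matched,
 * and -EBUSY if the caller has not quiesced the (modelled) filesystem.
 */
static int
check_summary_counts(
	struct summary_counts	*sb,
	const struct ag_counts	*ags,
	size_t			nr_ags,
	bool			frozen)
{
	struct summary_counts	sum;

	/* Summing while the counters can still change would be racy. */
	if (!frozen)
		return -EBUSY;

	sum = sum_ag_counts(ags, nr_ags);
	if (sb->icount == sum.icount && sb->ifree == sum.ifree &&
	    sb->fdblocks == sum.fdblocks)
		return 0;

	*sb = sum;
	return 1;
}
```

This deliberately leaves out the reserved-block, per-AG reservation and
per-CPU counter adjustments that the real thing would have to account
for; it only shows the ordering (quiesce first, then sum, then compare).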

> Also, it brought a question to mind because I haven't clearly noted
> it happening yet: when do the xfs_perag counters get corrected? And
> if they are already correct, why not just iterate the perag
> counters?

The xfs_perag counters are updated by the AGF/AGI/inobt rebuild code.
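
To illustrate what that would look like, here is a rough userspace
model, not the actual rebuild code.  The structures and helpers
(struct disk_agf, struct incore_ag, rebuild_agf_counts,
sum_incore_fdblocks) are invented for this example.  It assumes a
rebuild writes the big-endian header fields and refreshes the in-core
copy in the same step, so a later summation can walk the in-core copies
without reading any buffers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk AGF counters, stored as big-endian bytes. */
struct disk_agf {
	uint8_t		freeblks[4];
	uint8_t		flcount[4];
	uint8_t		btreeblks[4];
};

/* Hypothetical in-core per-AG copy, kept in CPU byte order. */
struct incore_ag {
	uint32_t	freeblks;
	uint32_t	flcount;
	uint32_t	btreeblks;
};

/* Store a 32-bit value as big-endian bytes. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value from big-endian bytes. */
static uint32_t
get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write rebuilt counts to the header, then refresh the in-core copy. */
static void
rebuild_agf_counts(
	struct disk_agf		*agf,
	struct incore_ag	*pag,
	uint32_t		freeblks,
	uint32_t		flcount,
	uint32_t		btreeblks)
{
	put_be32(agf->freeblks, freeblks);
	put_be32(agf->flcount, flcount);
	put_be32(agf->btreeblks, btreeblks);

	pag->freeblks = get_be32(agf->freeblks);
	pag->flcount = get_be32(agf->flcount);
	pag->btreeblks = get_be32(agf->btreeblks);
}

/* Sum free blocks from the in-core copies only; no header reads. */
static uint64_t
sum_incore_fdblocks(
	const struct incore_ag	*pags,
	size_t			nr_ags)
{
	uint64_t		fdblocks = 0;
	size_t			i;

	assert(pags != NULL || nr_ags == 0);

	for (i = 0; i < nr_ags; i++) {
		fdblocks += pags[i].freeblks;
		fdblocks += pags[i].flcount;
		fdblocks += pags[i].btreeblks;
	}
	return fdblocks;
}
```

Iterating the in-core copies avoids the buffer reads, but it does not
by itself address the race: on a live filesystem those values can still
change while they are being summed.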

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html