Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 24 Jul 2017 11:43:27 +1000

On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> Call the verifier function for all in-memory metadata buffers, looking
> for memory corruption either due to bad memory or coding bugs.

How does this fit into the bigger picture? We can't do an exhaustive
search of the in memory buffer cache, because access is racy w.r.t.
the life cycle of in memory buffers.

Also, if we are doing a full scrub, we're going to hit and then
check the cached in-memory buffers anyway, so I'm missing the
context that explains why this code is necessary.

>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
> new file mode 100644
> index 0000000..63faaa6
> --- /dev/null
> +++ b/fs/xfs/scrub/metabufs.c
> @@ -0,0 +1,177 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "scrub/common.h"
> +
> +/* We only iterate buffers one by one, so we don't need any setup. */
> +int
> +xfs_scrub_setup_metabufs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return 0;
> +}
> +
> +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> +struct xfs_scrub_metabufs_info {
> +	struct xfs_scrub_context	*sc;
> +	unsigned int			retries;
> +};

So do we get 10 retries per buffer, or 10 retries across an entire
scan?

> +/* In-memory buffer corruption. */
> +
> +#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
> +			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
> +			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
> +			&error, label)

Nested macros - yuck!

> +STATIC int
> +xfs_scrub_metabufs_scrub_buf(
> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct xfs_buf			*bp)
> +{
> +	int				olderror;
> +	int				error = 0;
> +
> +	/*
> +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> +	 */
> +	if (bp->b_flags & XBF_STALE)
> +		return 0;
> +
> +	atomic_inc(&bp->b_hold);

This looks wrong. I think it can race with reclaim because we don't
hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:

	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);

to prevent lookups - which are done under the pag->pag_buf_lock -
from finding the buffer while it has a zero hold count and may be
removed from the cache and freed.

Further, if we are going to iterate the cache, I'd much prefer the
iteration code to be in fs/xfs/xfs_buf.c - nothing outside that file
should be touching core buffer cache structures....

FWIW, the LRU walks already handle this problem by the fact the LRU
owns a hold count on the buffer. So it may be better to do this via
a LRU walk rather than a hashtable walk....

[snip]

> +	olderror = bp->b_error;
> +	if (bp->b_fspriv)
> +		bp->b_ops->verify_write(bp);

Should we be recalculating the CRC on buffers we aren't about to 
be writing to disk? Should we be verifying a buffer that has a
non-zero error value on it?

> +	else
> +		bp->b_ops->verify_read(bp);
> +	error = bp->b_error;
> +	bp->b_error = olderror;
> +
> +	/* Mark any corruption errors we might find. */
> +	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);

Ah, what? Why does this need a goto? And why doesn't it report the
error that was found? (bloody macros!).

> +out_unlock:
> +	xfs_buf_unlock(bp);
> +out_dec:
> +	atomic_dec(&bp->b_hold);
> +	return error;
> +}
> +#undef XFS_SCRUB_BUF_OP_ERROR_GOTO

Oh, that's the macro defined above the function. Which I paid little
attention to other than it called another macro. Now I realise that
it (ab)uses local variables without them being passed into the
macro. Yup, another reason we need to get rid of the macros from
this code....

> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct rhashtable_iter		*iter)
> +{
> +	struct xfs_buf			*bp;
> +	int				error = 0;
> +
> +	do {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		bp = rhashtable_walk_next(iter);
> +		if (IS_ERR(bp))
> +			return PTR_ERR(bp);
> +		else if (bp == NULL)
> +			return 0;
> +
> +		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
> +	} while (error != 0);
> +
> +	return error;
> +}
> +
> +/* Try to walk the buffers in this AG in order to scrub them. */
> +int
> +xfs_scrub_metabufs(

Ah, please put an "_ag_" in this so it's clear it's only scrubbing a
single AG. This is hidden deep inside the scrub context, so it took
me a little bit of back tracking to understand that this wasn't a
global scan....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html