Re: [PATCH 2/4] xfs: ensure verifiers are attached to recovered buffers

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 30 Jul 2014 12:30:03 -0400

On Wed, Jul 30, 2014 at 11:48:47AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Crash testing of CRC enabled filesystems has resulted in a number of
> reports of bad CRCs being detected after the filesystem was mounted.
> Errors such as the following were being seen:
> 
> XFS (sdb3): Mounting V5 Filesystem
> XFS (sdb3): Starting recovery (logdev: internal)
> XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
> XFS (sdb3): Unmount and run xfs_repair
> XFS (sdb3): First 64 bytes of corrupted metadata buffer:
> ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
> ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
> ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
> ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
> XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
> 
> The errors were typically being seen in AGF, AGI and their related
> btree block buffers some time after log recovery had run. Often it
> wasn't until later subsequent mounts that the problem was
> discovered. The common symptom was a buffer with the correct
> contents, but a CRC and an LSN that matched an older version of the
> contents.
> 
> Some debug added to _xfs_buf_ioapply() indicated that buffers were
> being written without verifiers attached to them from log recovery,
> and Jan Kara isolated the cause to log recovery readahead an dit's
> interactions with buffers that had a more recent LSN on disk than
> the transaction being recovered. In this case, the buffer did not
> get a verifier attached, and os when the second phase of log
> recovery ran and recovered EFIs and unlinked inodes, the buffers
> were modified and written without the verifier running. Hence they
> had up to date contents, but stale LSNs and CRCs.
> 
> Fix it by attaching verifiers to buffers we skip due to future LSN
> values so they don't escape into the buffer cache without the
> correct verifier attached.
> 
> This patch is based on analysis and a patch from Jan Kara.
> 
> cc: <stable@xxxxxxxxxxxxxxx>
> Reported-by: Jan Kara <jack@xxxxxxx>
> Reported-by: Fanael Linithien <fanael4@xxxxxxxxx>
> Reported-by: Grozdan <neutrino8@xxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_log_recover.c | 47 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index fbc2362..0a015cc 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2126,6 +2126,17 @@ xlog_recover_validate_buf_type(
>  	__uint16_t		magic16;
>  	__uint16_t		magicda;
>  
> +	/*
> +	 * We can only do post recovery validation on items on CRC enabled
> +	 * fielsystems as we need to know when the buffer was written to be able
> +	 * to determine if we should have replayed the item. If we replay old
> +	 * metadata over a newer buffer, then it will enter a temporarily
> +	 * inconsistent state resulting in verification failures. Hence for now
> +	 * just avoid the verification stage for non-crc filesystems
> +	 */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return;
> +

This function has a couple other similar *_hascrc() checks further down
that we should probably kill now. Otherwise, looks Ok...

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
>  	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
>  	magicda = be16_to_cpu(info->magic);
> @@ -2197,10 +2208,6 @@ xlog_recover_validate_buf_type(
>  #endif
>  		break;
>  	case XFS_BLFT_DINO_BUF:
> -		/*
> -		 * we get here with inode allocation buffers, not buffers that
> -		 * track unlinked list changes.
> -		 */
>  		if (magic16 != XFS_DINODE_MAGIC) {
>  			xfs_warn(mp, "Bad INODE block magic!");
>  			ASSERT(0);
> @@ -2388,16 +2395,7 @@ xlog_recover_do_reg_buffer(
>  	/* Shouldn't be any more regions */
>  	ASSERT(i == item->ri_total);
>  
> -	/*
> -	 * We can only do post recovery validation on items on CRC enabled
> -	 * fielsystems as we need to know when the buffer was written to be able
> -	 * to determine if we should have replayed the item. If we replay old
> -	 * metadata over a newer buffer, then it will enter a temporarily
> -	 * inconsistent state resulting in verification failures. Hence for now
> -	 * just avoid the verification stage for non-crc filesystems
> -	 */
> -	if (xfs_sb_version_hascrc(&mp->m_sb))
> -		xlog_recover_validate_buf_type(mp, bp, buf_f);
> +	xlog_recover_validate_buf_type(mp, bp, buf_f);
>  }
>  
>  /*
> @@ -2505,12 +2503,29 @@ xlog_recover_buffer_pass2(
>  	}
>  
>  	/*
> -	 * recover the buffer only if we get an LSN from it and it's less than
> +	 * Recover the buffer only if we get an LSN from it and it's less than
>  	 * the lsn of the transaction we are replaying.
> +	 *
> +	 * Note that we have to be extremely careful of readahead here.
> +	 * Readahead does not attach verfiers to the buffers so if we don't
> +	 * actually do any replay after readahead because of the LSN we found
> +	 * in the buffer if more recent than that current transaction then we
> +	 * need to attach the verifier directly. Failure to do so can lead to
> +	 * future recovery actions (e.g. EFI and unlinked list recovery) can
> +	 * operate on the buffers and they won't get the verifier attached. This
> +	 * can lead to blocks on disk having the correct content but a stale
> +	 * CRC.
> +	 *
> +	 * It is safe to assume these clean buffers are currently up to date.
> +	 * If the buffer is dirtied by a later transaction being replayed, then
> +	 * the verifier will be reset to match whatever recover turns that
> +	 * buffer into.
>  	 */
>  	lsn = xlog_recover_get_buf_lsn(mp, bp);
> -	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0)
> +	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
> +		xlog_recover_validate_buf_type(mp, bp, buf_f);
>  		goto out_release;
> +	}
>  
>  	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
>  		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> -- 
> 2.0.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs