Re: [PATCH] xfs: limit superblock corruption errors to probable corruption

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 30 Jan 2014 15:26:21 -0500

On 01/29/2014 12:11 AM, Eric Sandeen wrote:
> Today, if
> 
> xfs_sb_read_verify
>   xfs_sb_verify
>     xfs_mount_validate_sb
> 
> detects superblock corruption, it'll be extremely noisy, dumping
> 2 stacks, 2 hexdumps, etc.
> 
> This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
> as well as in xfs_sb_read_verify.
> 
> Also, *any* errors in xfs_mount_validate_sb which are not corruption
> per se (things like too-big-blocksize, bad version, bad magic, v1 dirs,
> rw-incompat etc., none of which return EFSCORRUPTED) will
> still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
> sees any error at all.  And it suggests to the user that they
> should run xfs_repair, even if the root cause of the mount failure
> is a simple incompatibility.
> 
> I'll submit that the probably-not-corrupted errors don't warrant
> this much noise, so this patch removes the high-level
> XFS_CORRUPTION_ERROR which was firing for every error return
> except EWRONGFS.
> 
> It also adds one to the path which detects a failed checksum.
> 
> The idea is, if it's really _corruption_ we can call
> XFS_CORRUPTION_ERROR at the point of detection.  More benign
> incompatibilities can do a little printk & fail the mount without
> so much drama.
> 
> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> ---
> 
> I could see an argument where we might still want the hexdump
> for things like bad magic - ok, just what *was* the magic?  But
> I think we do need to reserve the oops-mimicking-backtraces for
> the most severe problems.  Discuss.  ;)
> 

This seems reasonable to me, particularly if pretty much any
error via the xfs_sb_verify() path dumps corruption noise...

> diff --git a/fs/xfs/xfs_sb.c b/fs/xfs/xfs_sb.c
> index 511cce9..b575317 100644
> --- a/fs/xfs/xfs_sb.c
> +++ b/fs/xfs/xfs_sb.c
> @@ -617,6 +617,8 @@ xfs_sb_read_verify(
>  			/* Only fail bad secondaries on a known V5 filesystem */
>  			if (bp->b_bn != XFS_SB_DADDR &&
>  			    xfs_sb_version_hascrc(&mp->m_sb)) {
> +				XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> +						     mp, bp->b_addr);
>  				error = EFSCORRUPTED;
>  				goto out_error;
>  			}
> @@ -625,12 +627,8 @@ xfs_sb_read_verify(
>  	error = xfs_sb_verify(bp, true);
>  
>  out_error:
> -	if (error) {
> -		if (error != EWRONGFS)
> -			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
> -					     mp, bp->b_addr);
> +	if (error)
>  		xfs_buf_ioerror(bp, error);
> -	}
>  }

... but why not leave the corruption output here in out_error, change
the check to (error == EFSCORRUPTED) and remove the now-duplicate
corruption message in xfs_mount_validate_sb() (or replace it with a
warn/notice message)? This would catch the other EFSCORRUPTED returns in
a consistent manner, including another potential duplicate in the write
verifier. I guess we'd lose a little specificity between the CRC failure
and sb validation, but we could add a warn/notice for the former too.
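
Roughly, the control flow suggested above could look like the sketch
below. It is a user-space mock, not kernel code and not a tested patch:
every sketch_* name, the sketch_log struct and the numeric error values
are made up for illustration, the CRC branch is simplified (the real
verifier only fails bad secondaries on a known V5 filesystem), and the
function returns the error instead of calling xfs_buf_ioerror().

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values for this sketch only; the kernel defines the real ones. */
enum {
	SKETCH_EWRONGFS     = 1,
	SKETCH_EFSCORRUPTED = 2,
	SKETCH_EINCOMPAT    = 3,	/* any "incompatible, not corrupt" error */
};

/* Hypothetical counters so the noise level can be observed. */
struct sketch_log {
	int corruption_reports;	/* loud: stack dump + hexdump in the kernel */
	int warnings;		/* quiet: a single warn/notice line */
};

/* Stand-in for XFS_CORRUPTION_ERROR(). */
static void sketch_corruption_error(struct sketch_log *log, const char *func)
{
	log->corruption_reports++;
	fprintf(stderr, "%s: corruption detected\n", func);
}

/* Stand-in for a warn/notice level message. */
static void sketch_warn(struct sketch_log *log, const char *msg)
{
	log->warnings++;
	fprintf(stderr, "warning: %s\n", msg);
}

/*
 * Mock of the read verifier. crc_ok and validate_error are inputs here;
 * in the kernel they would come from the checksum check and from
 * xfs_sb_verify() respectively.
 */
static int sketch_sb_read_verify(struct sketch_log *log, int crc_ok,
				 int validate_error)
{
	int error;

	assert(log != NULL);

	if (!crc_ok) {
		/* Keep some specificity: say it was the CRC that failed. */
		sketch_warn(log, "superblock CRC mismatch");
		error = SKETCH_EFSCORRUPTED;
		goto out_error;
	}
	error = validate_error;

out_error:
	/* The single place where the loud report fires. */
	if (error == SKETCH_EFSCORRUPTED)
		sketch_corruption_error(log, __func__);
	return error;
}
```

With this shape, an incompatibility error passes through out_error
without the loud report, while both the CRC failure and an EFSCORRUPTED
return from validation produce exactly one report each.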

Brian

>  
>  /*
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
> 
