Re: [PATCH 5/6] xfs: add xfs_verifier_error()

Eric Sandeen <sandeen@xxxxxxxxxxx> · Mon, 10 Feb 2014 08:52:30 -0600

On 2/10/14, 5:10 AM, Dave Chinner wrote:
> On Sun, Feb 09, 2014 at 10:16:20PM -0600, Eric Sandeen wrote:
>> On 2/9/14, 9:43 PM, Dave Chinner wrote:
>>> On Sun, Feb 09, 2014 at 08:33:49PM -0600, Eric Sandeen wrote:
>>>> We want to distinguish between corruption, CRC errors,
>>>> etc.  In addition, the full stack trace on verifier errors
>>>> seems less than helpful; it looks more like an oops than
>>>> corruption.  
>>>>
>>>> Create a new function to specifically alert the user to
>>>> verifier errors, which can differentiate between
>>>> EFSCORRUPTED and CRC mismatches.  It doesn't dump stack
>>>> unless the xfs error level is turned up high.
>>>>
>>>> Define a new error code (EFSBADCRC) to clearly identify
>>>> CRC errors.  (Defined to EILSEQ, bad byte sequence)
>>>>
>>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
>>>> ---
>>>>  fs/xfs/xfs_error.c |   22 ++++++++++++++++++++++
>>>>  fs/xfs/xfs_error.h |    3 +++
>>>>  fs/xfs/xfs_linux.h |    1 +
>>>>  3 files changed, 26 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
>>>> index 9995b80..08d76f4 100644
>>>> --- a/fs/xfs/xfs_error.c
>>>> +++ b/fs/xfs/xfs_error.c
>>>> @@ -178,3 +178,25 @@ xfs_corruption_error(
>>>>  	xfs_error_report(tag, level, mp, filename, linenum, ra);
>>>>  	xfs_alert(mp, "Corruption detected. Unmount and run xfs_repair");
>>>>  }
>>>> +
>>>> +/*
>>>> + * Warnings specifically for verifier errors.  Differentiate CRC vs. invalid
>>>> + * values, and omit the stack trace unless the error level is tuned high.
>>>> + */
>>>> +void
>>>> +__xfs_verifier_error(
>>>> +	const char		*func,
>>>> +	struct xfs_buf		*bp)
>>>> +{
>>>> +	struct xfs_mount *mp = bp->b_target->bt_mount;
>>>> +
>>>> +	xfs_alert(mp,
>>>> +"%sCorruption detected in %s, block 0x%llx. Unmount and run xfs_repair",
>>>> +		  bp->b_error == EFSBADCRC ? "CRC " : "", func, bp->b_bn);
>>>
>>> Perhaps if we do this:
>>>
>>> 	xfs_alert(mp,
>>> "Metadata %s detected at %pF, block 0x%llx. Unmount and run xfs_repair",
>>> 		  bp->b_error == EFSBADCRC ? "CRC error"
>>> 					   : "corruption", _RET_IP_, bp->b_bn);
>>>
>>> We'll get a symbol of the form caller_name+0xoffset similar to a
>>> stack dump. That way if we have multiple calls to a
>>> xfs_verifier_error() inside a single function we get something that
>>> tells us which call detected the error...
>>
>> Hm, but the point of the switch based on error nrs was to require only
>> one call in each ->verifier, and ...
> 
> Right, that's the current usage of it because we are simply
> returning true/false from the checking code. Determining the exact
> error in the report is much more useful - let's not lose sight of
> the end goal....
> 
>>> Also, the use of _RET_IP_ gets rid of the need for the wrapper
>>> macro....
>>
>> 0x${SPLAT} is a lot less useful than e.g. "xfs_agi_read_verify"
> 
> Note the format string I used: "%pF". That decodes the _RET_IP_
> into the function name and offset from the start of the function.
> i.e. it returns xfs_agi_read_verify+0x<splat>.

I forgot that it did this, TBH.  Ok, I'll rethink things a bit.

(although with multiple failure points in a verifier, +0x4a vs +0x5b
will still require some digging; a line number might be nice, but
then we'd need a wrapper again)
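
For illustration only, here is roughly what the wrapper variant could
look like. This is a userspace sketch, not the patch: struct demo_buf,
DEMO_EFSBADCRC, demo_format_verifier_error() and demo_verifier_error()
are all hypothetical names, and snprintf() stands in for xfs_alert().
The macro captures __func__ and __LINE__ at the call site, so each
failure point in a verifier reports its own line rather than an offset.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in; the patch description maps EFSBADCRC to EILSEQ. */
#define DEMO_EFSBADCRC	EILSEQ

/* Hypothetical stand-in for the two struct xfs_buf fields the message uses. */
struct demo_buf {
	int			error;	/* plays the role of bp->b_error */
	unsigned long long	bn;	/* plays the role of bp->b_bn */
};

/*
 * Format the report into a caller-supplied buffer.  A kernel version
 * would hand the same arguments to xfs_alert() instead.
 */
static int
demo_format_verifier_error(
	char			*out,
	size_t			len,
	const char		*func,
	int			line,
	const struct demo_buf	*bp)
{
	return snprintf(out, len,
"Metadata %s detected at %s:%d, block 0x%llx. Unmount and run xfs_repair",
			bp->error == DEMO_EFSBADCRC ? "CRC error"
						    : "corruption",
			func, line, bp->bn);
}

/* The wrapper: records the function and line of the call site. */
#define demo_verifier_error(out, len, bp) \
	demo_format_verifier_error((out), (len), __func__, __LINE__, (bp))
```

The cost is exactly the one mentioned above: the macro has to exist so
that __LINE__ expands at the caller, whereas the _RET_IP_ + %pF approach
needs no wrapper but only gives function+offset.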

Thanks,
-Eric
