Re: xfs corruption

Eric Sandeen <sandeen@xxxxxxxxxxx> · Thu, 3 Sep 2015 08:22:23 -0500

On 9/3/15 6:09 AM, Danny Shavit wrote:
> Hi Dave,
> 
> We have a couple more xfs corruptions that we would like to share:

On the same box as the one that seemed to be experiencing some
bit-flips in your earlier email?

As a general note: You are not providing enough information for
us to effectively help you.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Kernel version?  xfsprogs version?  At a bare minimum...

Your dmesg snippets are edited.  You've provided what you feel is
important, omitting the parts that may actually be important or
informational.

You haven't described the sequence of events that led to these issues.

You haven't made clear what these attachments are; which repair log goes
with which kernel event?

Etc...
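As a rough illustration of gathering the bare minimum in one go, here is a small Python sketch. It assumes `xfs_repair` and `dmesg` are on PATH (dmesg may need root if dmesg_restrict is set). The helper names are hypothetical. The FAQ above lists more items (xfs_info output, storage layout, etc.) that this does not collect.

```python
import platform
import subprocess


def run(cmd):
    """Run a command and return its output, or a marker if it could not be run."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              errors="replace", check=False)
    except OSError as exc:  # e.g. binary not installed or not executable
        return f"<could not run {cmd[0]}: {exc}>"
    # Fall back to stderr so permission errors (e.g. from dmesg) are recorded too.
    return (proc.stdout or proc.stderr).strip()


def collect_report():
    """Hypothetical helper: the minimum details worth attaching to a bug report."""
    return {
        "kernel": platform.release(),           # same as `uname -r`
        "xfsprogs": run(["xfs_repair", "-V"]),  # xfsprogs version string
        "dmesg": run(["dmesg"]),                # full, unedited kernel log
    }
```

Printing each entry of `collect_report()` and pasting the result, unedited, gives triage something to work with.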

> 1. This is an interesting one, since xfs reported corruption but when
> running xfs_repair, no error was found. Attached is the kernel log
> section regarding the corruption (6458). Does xfs_repair explicitly
> read data from the disk? If so, this might be memory
> corruption. Are you familiar with such cases?

Yes, xfs_repair opens the block device O_DIRECT.
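O_DIRECT means reads bypass the page cache, so the data comes from the device rather than from anything the kernel has cached in memory. Below is a minimal Python sketch of such a read. It is not xfs_repair's code. It is Linux-only, and the 4096-byte alignment is an assumption (the device's logical sector size may be 512).

```python
import mmap
import os


def aligned_span(offset, length, align=4096):
    """Widen [offset, offset + length) to `align` boundaries, as O_DIRECT requires."""
    start = offset - offset % align
    end = -(-(offset + length) // align) * align  # round the end up
    return start, end - start


def read_direct(path, offset, length, align=4096):
    """Read bytes from `path` with O_DIRECT, bypassing the page cache (Linux only)."""
    if length <= 0:
        return b""
    start, size = aligned_span(offset, length, align)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page aligned, which satisfies O_DIRECT buffer alignment.
        buf = mmap.mmap(-1, size)
        try:
            nread = os.preadv(fd, [buf], start)
            data = buf[:nread]
        finally:
            buf.close()
    finally:
        os.close(fd)
    skip = offset - start
    return data[skip:skip + length]
```

For example, `read_direct("/dev/sdX", 0, 512)` (hypothetical device name) would return the first 512 bytes as the disk holds them, not as any cached copy does.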

Your 6485-kernel.log shows a failure in xfs_allocbt_verify(), right
after the allocation btree block is read from disk, i.e. this is an
in-kernel metadata consistency check that is failing.
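Here is a heavily simplified Python model of the kind of structural check a btree block verifier performs. It is not a port of xfs_allocbt_verify(), which also checks the tree level against the AGF and the sibling pointers. On v5 (CRC) filesystems, the read path also checks the UUID, owner, block number and checksum. Only the magic numbers and header sizes come from the XFS on-disk format. The function name is hypothetical.

```python
import struct

# Allocation btree magics: by-block ("ABTB"/"AB3B") and by-count ("ABTC"/"AB3C");
# the "3" variants are the v5 (CRC-enabled) formats.
ALLOCBT_MAGICS = {b"ABTB", b"ABTC", b"AB3B", b"AB3C"}
V5_MAGICS = {b"AB3B", b"AB3C"}


def allocbt_header_ok(block):
    """Hypothetical, simplified sanity check of a short-form btree block header."""
    if len(block) < 16:
        return False
    magic = block[:4]
    if magic not in ALLOCBT_MAGICS:
        return False
    # Big-endian fields: magic (4), level (2), numrecs (2), leftsib (4), rightsib (4).
    level, numrecs = struct.unpack_from(">HH", block, 4)
    header_len = 56 if magic in V5_MAGICS else 16
    # Leaf entries are 8-byte extent records; node entries are an 8-byte key + 4-byte pointer.
    entry_len = 8 if level == 0 else 12
    max_recs = (len(block) - header_len) // entry_len
    return numrecs <= max_recs
```

A block that fails this kind of test is rejected as soon as it is read, before the kernel acts on its contents.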

It also shows:

kworker/0:1H Tainted: GF       W 

So it's tainted:

  2: 'F' if any module was force loaded by "insmod -f", ' ' if all
     modules were loaded normally.

 10: 'W' if a warning has previously been issued by the kernel.
     (Though some warnings may set more specific taint flags.)
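The same information can be decoded from the numeric mask in /proc/sys/kernel/tainted. The list above numbers flags by their position in the printed string, starting at 1, so 'F' is bit 1 and 'W' is bit 9 of the mask. Below is a small Python sketch covering only bits 0-15 as described in the kernel's tainted-kernels documentation; newer kernels define more. The helper names are made up.

```python
# (letter if set, letter if clear) for taint bits 0-15; bit 0 prints 'G' when clear.
TAINT_BITS = [
    ("P", "G"),  # 0: proprietary module loaded
    ("F", " "),  # 1: module force-loaded ("insmod -f")
    ("S", " "),  # 2: SMP kernel on a processor not certified for SMP
    ("R", " "),  # 3: module force-unloaded
    ("M", " "),  # 4: machine check exception
    ("B", " "),  # 5: bad page referenced or unexpected page flags
    ("U", " "),  # 6: taint requested by userspace
    ("D", " "),  # 7: kernel died recently (OOPS or BUG)
    ("A", " "),  # 8: ACPI table overridden
    ("W", " "),  # 9: kernel issued a warning
    ("C", " "),  # 10: staging driver loaded
    ("I", " "),  # 11: platform firmware bug workaround applied
    ("O", " "),  # 12: out-of-tree module loaded
    ("E", " "),  # 13: unsigned module loaded
    ("L", " "),  # 14: soft lockup occurred
    ("K", " "),  # 15: kernel has been live patched
]


def taint_string(mask):
    """Render a taint mask like the oops header does, minus trailing padding."""
    flags = "".join(on if (mask >> bit) & 1 else off
                    for bit, (on, off) in enumerate(TAINT_BITS))
    return flags.rstrip()


def read_taint_mask(path="/proc/sys/kernel/tainted"):
    with open(path) as f:
        return int(f.read())
```

A mask of 514 (bits 1 and 9) renders as "GF       W", the string in the log line above.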

You force-loaded a module?  And previous warnings were emitted (though we
can't see them in your edited dmesg).
All bets are off.  If you had included the full dmesg, we might know
more about what's going on, at least.

> 2. xfs corruption occurred suddenly, with no apparent external event.
> Attached are the xfs_repair and kernel logs. The xfs metadump can be found
> at: https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz

Your 6442-82-xfs_repair.log is from an xfs_repair -L, so of course it
is finding corruption (zeroing the log throws away any metadata changes
that had not yet been replayed), and the output is more or less
meaningless from a triage POV.  Repair said:

> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.

Why did you run it with -L? Did mount fail? If so how?
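The order xfs_repair's own message suggests is to mount first (mounting replays the log), then check, and treat -L as a last resort. Here is a Python sketch of that ordering as a helper that only builds command lines and runs nothing. The device, mount point and function name are placeholders, not a prescribed procedure.

```python
def repair_plan(device, mountpoint, mount_succeeded):
    """Hypothetical helper: ordered commands to try, without running any of them."""
    # Mounting replays the log, so try that first.
    plan = [["mount", device, mountpoint]]
    if mount_succeeded:
        # Log replayed: unmount cleanly, check in no-modify mode, then repair.
        plan += [["umount", mountpoint],
                 ["xfs_repair", "-n", device],
                 ["xfs_repair", device]]
    else:
        # Capture why the mount failed and run a no-modify check first;
        # -L zeroes the log and discards unreplayed metadata changes, so it comes last.
        plan += [["dmesg"],
                 ["xfs_repair", "-n", device],
                 ["xfs_repair", "-L", device]]
    return plan
```

Either way, the mount error and the no-modify output are what belong in the report, not just the -L run.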

dm-82-kernel.log also shows a failing verifier, this time xfs_bmbt_verify,
when reading metadata from disk.

You've truncated other parts, though:

Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*
......

so there's not a ton to go on, just hints that there is more information
that's not provided.
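The single hex dump line that was kept does allow one check. The address is page aligned, so assume the dump starts at the beginning of the buffer. Under that assumption, its first four bytes are the btree block's magic number. Below is a small Python sketch that parses such a line and compares those bytes against the bmap btree magics ("BMAP" and, on v5 filesystems, "BMA3"). The helper names are made up.

```python
import re

BMBT_MAGICS = {b"BMAP", b"BMA3"}

# "<address>: xx xx xx ..." as printed by the kernel's hex dump; the ASCII column is ignored.
HEX_LINE = re.compile(r"\b[0-9a-f]{8,16}:((?: [0-9a-f]{2})+)")


def hexdump_bytes(line):
    """Extract the dumped bytes from one kernel hex dump line."""
    m = HEX_LINE.search(line)
    if m is None:
        raise ValueError("no hex dump found in line")
    return bytes.fromhex("".join(m.group(1).split()))


def has_bmbt_magic(line):
    return hexdump_bytes(line)[:4] in BMBT_MAGICS
```

On the line quoted above this returns False (ea bb 12 3a is neither magic). If the assumption holds, the block that failed xfs_bmbt_verify does not even start like bmap btree metadata. With the rest of the dump trimmed, that is about as far as it goes.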

-Eric

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs