Re: xfs corruption

Danny Shavit <danny@xxxxxxxxxxxxxxxxx> · Thu, 3 Sep 2015 17:26:25 +0300

Hi Eric,
Thanks for the prompt response.
Sorry for the missing parts; I wrongly assumed that everybody knows our environment :-)

More information:
uname -a:  Linux vsa-00000142 3.8.13-030813-generic #201305111843 SMP Sat May 11 22:44:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
xfs_repair version 3.1.7
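
For future reports, details like the two above could be gathered unedited with a short script. The following is only a sketch: the command list is an illustrative assumption rather than the full checklist from the XFS FAQ, and the helper names (`run`, `collect_report`) are hypothetical.

```python
import subprocess

# Commands whose raw output is worth attaching to a report.
# This list is an assumption for illustration, not the FAQ's full checklist.
REPORT_COMMANDS = {
    "kernel": ["uname", "-a"],
    "xfsprogs": ["xfs_repair", "-V"],
    "dmesg": ["dmesg"],
}


def run(cmd):
    """Return a command's combined stdout/stderr, or a note if it could not be run."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return "<could not run {}: {}>".format(" ".join(cmd), exc)
    return result.stdout


def collect_report(commands=REPORT_COMMANDS):
    """Run every command and keep its output untouched, keyed by a short name."""
    return {name: run(cmd) for name, cmd in commands.items()}
```

Printing each entry of `collect_report()` under its own heading gives one attachment with the complete, untrimmed output.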

We are using a modified xfs. Mainly, we added some reporting features and changed the discard operation to align with the chunk sizes used in our systems.
The modified code resides at https://github.com/zadarastorage/zadara-xfs-pushback.

We were in a hurry at the time, so we ran xfs_repair with -L. That was not so smart...
Anyway, the xfs metadump was taken before running xfs_repair.
We will restore the original xfs metadata, mount it, run xfs_repair after the mount, and get back with the results.
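
A sketch of that sequence, assuming the metadump has already been decompressed and that a scratch mount point exists. The file names and the mount point are hypothetical, and the function only builds the command list; nothing is executed.

```python
import shlex


def replay_then_check_plan(metadump, image, mountpoint):
    """Build the restore / mount / unmount / check sequence without running it."""
    return [
        # Rebuild a filesystem image from the metadata dump.
        ["xfs_mdrestore", metadump, image],
        # Mounting replays the log, which is the step that -L skips.
        ["mount", "-o", "loop", image, mountpoint],
        ["umount", mountpoint],
        # -n: no-modify mode, -f: the target is a regular file.
        ["xfs_repair", "-n", "-f", image],
    ]


# Hypothetical paths, for illustration only.
plan = replay_then_check_plan("82.metadump", "82.img", "/mnt/scratch")
for cmd in plan:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the repair step in no-modify mode leaves the restored image unchanged, so the same image can be checked again later.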

Regards,
Danny

On Thu, Sep 3, 2015 at 4:22 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 9/3/15 6:09 AM, Danny Shavit wrote:

> Hi Dave,
>
> We have a couple more xfs corruptions that we would like to share:

On the same box as the one that seemed to be experiencing some
bit-flips in your earlier email?

As a general note: You are not providing enough information for
us to effectively help you.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Kernel version?  xfsprogs version?  At a bare minimum...

Your dmesg snippets are edited.  You've provided what you feel is
important, omitting the parts that may actually be important or
informational.

You haven't described the sequence of events that led to these issues.

You haven't made clear what these attachments are; which repair log goes
with which kernel event?

Etc...

> 1. This is an interesting one, since xfs reported corruption but when
> running xfs_repair, no error was found. Attached is the kernel log
> section regarding the corruption (6458). Does xfs_repair explicitly
> read data from the disk? In such case it might be a memory
> corruption. Are you familiar with such cases?

Yes, xfs_repair opens the block device O_DIRECT.

Your 6485-kernel.log shows a failure in xfs_allocbt_verify(), right
after the allocation btree is read from disk.  I.e. this is an in-kernel
metadata consistency check that is failing.

It also shows:

kworker/0:1H Tainted: GF       W

So it's tainted:

  2: 'F' if any module was force loaded by "insmod -f", ' ' if all
     modules were loaded normally.

 10: 'W' if a warning has previously been issued by the kernel.
     (Though some warnings may set more specific taint flags.)

You force-loaded a module?  And previous warnings were emitted (though we
can't see them in your edited dmesg).

All bets are off.  If you had included the full dmesg, we might know
more about what's going on, at least.
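
The taint string can be read off mechanically. The sketch below maps only a handful of letters, with meanings taken from the kernel's tainted-kernel documentation; the table is deliberately incomplete, anything else is reported as unknown, and the names `TAINT_LETTERS` and `decode_taint` are hypothetical.

```python
# Partial letter table; meanings follow the kernel's tainted-kernel documentation.
TAINT_LETTERS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded (not a taint by itself)",
    "F": "module force-loaded",
    "D": "kernel has oopsed or hit a BUG before",
    "W": "kernel warning previously issued",
    "O": "out-of-tree module loaded",
}


def decode_taint(flags):
    """Describe each non-blank character of a 'Tainted:' flag string."""
    return [
        TAINT_LETTERS.get(ch, "unknown flag {!r}".format(ch))
        for ch in flags
        if not ch.isspace()
    ]


# The flag string from the log line quoted above.
for meaning in decode_taint("GF       W"):
    print(meaning)
```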

> 2. xfs corruption occurred suddenly with no apparent external event.
> Attached are the xfs_repair and kernel logs. The xfs dump can be found
> in: https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz

Your 6442-82-xfs_repair.log is from an xfs_repair -L, so of course it
is finding corruption, and the output is more or less meaningless
from a triage POV.  Repair said:

> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.

Why did you run it with -L? Did mount fail? If so how?

dm-82-kernel.log also shows a failing verifier, this time xfs_bmbt_verify,
when reading metadata from disk.

You've truncated other parts, though:

Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*
......

so there's not a ton to go on, just hints that there is more information
that's not provided.
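
Even one dump line allows a quick sanity check. The sketch below pulls the bytes out of such a line and compares the first four against a few on-disk XFS btree magic numbers. It assumes the dump starts at the beginning of the buffer that failed verification, and the helper names are hypothetical. A verifier checks far more than the magic, so a mismatch only says the block does not begin like a btree block.

```python
import re

# On-disk magic numbers from the XFS headers (a partial list).
KNOWN_MAGICS = {
    b"BMAP": "bmap btree block",
    b"BMA3": "bmap btree block (CRC-enabled format)",
    b"ABTB": "free-space btree block, indexed by block number",
    b"ABTC": "free-space btree block, indexed by extent size",
}

# A 16-digit kernel address, a colon, then up to 16 space-separated hex bytes.
DUMP_RE = re.compile(r"\b[0-9a-f]{16}: ((?:[0-9a-f]{2} ){1,16})")


def leading_bytes(dump_line):
    """Extract the raw bytes shown on one kernel hex-dump line."""
    match = DUMP_RE.search(dump_line)
    if match is None:
        raise ValueError("no hex dump found in line")
    return bytes.fromhex(match.group(1))


def classify(dump_line):
    """Name the block type if the first four bytes are a known magic."""
    magic = leading_bytes(dump_line)[:4]
    return KNOWN_MAGICS.get(magic, "no known btree magic: {}".format(magic.hex()))


line = ("Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] "
        "ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af  ...:_D...*")
print(classify(line))
```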

-Eric

