Re: Weird XFS Corruption Error

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 25 Jan 2014 08:52:57 +1100

On Fri, Jan 24, 2014 at 08:56:32AM +0100, Sascha Askani wrote:
> Hi Dave, 
> 
> thanks for your reply and I’m sorry for the delayed answer…
> 
> On 23.01.2014 at 00:31, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> > On Wed, Jan 22, 2014 at 05:09:10PM +0100, Sascha Askani wrote:
> > 
> > So, an inode extent map btree block failed verification for some
> > reason. Hmmm - there should have been 4 lines of hexdump output
> > there as well. Can you post that as well? Or have you modified
> > /proc/sys/fs/xfs/error_level to have a value of 0 so it is not
> > emitted?
> > 
> 
> /proc/sys/fs/xfs/error_level is set to 3. Sorry for not including this in my original post. The hexdump is pretty "boring" (or interesting, depending on your point of view):
> 
> [964197.435322] ffff881f8e59b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964197.862037] ffff881f8e59b010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964198.288694] ffff881f8e59b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964198.712093] ffff881f8e59b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Yeah, that confirms what I suspected - the buffer has been
overwritten with zeros. That tends to imply *something* has zeroed
the start of the block device, and that's the cause of all the
problems.
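To confirm that kind of damage on the device (or, more safely, on an image of it), you can check whether the start is all zeros and whether the primary superblock magic "XFSB" (0x58465342) is still at byte offset 0. This is a minimal Python sketch, assuming read access to the device or image. The function name and the 64k default length are arbitrary illustrations, not part of any XFS tool:

```python
import sys

XFS_SB_MAGIC = b"XFSB"  # primary superblock magic (0x58465342), stored at byte offset 0


def inspect_start(path, length=64 * 1024):
    """Read the first `length` bytes of `path` (read-only) and summarise them."""
    with open(path, "rb") as f:
        head = f.read(length)
    return {
        "bytes_read": len(head),
        # any() over bytes iterates integer values, so it is False only if every byte is 0
        "all_zero": len(head) > 0 and not any(head),
        "has_xfs_magic": head[:4] == XFS_SB_MAGIC,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_start.py /path/to/device-or-image
    print(inspect_start(sys.argv[1]))
```

A zeroed region with no magic matches the hexdump above. It only tells you *that* the start was overwritten, not what overwrote it.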

> > Oh, wow. Ok, if the primary superblock is gone, along with metadata
> > in the first few blocks of the filesystem, then something has
> > overwritten the start of the block device the filesystem is on.
> > 
> >> 2. mounted the filesystem, which gave me a "Structure needs cleaning" error after a couple of seconds
> >> 3. tried mounting again for good measure, same error "Structure needs cleaning"
> > 
> > Right - the kernel can't read a valid superblock, either.
> 
> I just saw these messages in the log, which were emitted when I tried to mount the FS:
> 
> [964606.038733] XFS (dm-8): metadata I/O error: block 0x200 ("xlog_recover_do..(read#2)") error 117 numblks 16
> [964606.515048] XFS (dm-8): log mount/recovery failed: error 117
> [964606.515386] XFS (dm-8): log mount failed

Yup, that's trying to read an inode cluster. It's also right near
the start of the filesystem (0x200 * 512 bytes = 256k into the
filesystem), so log recovery is trying to replay an inode change and
finding that the inodes underlying that logged change are corrupt.
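The arithmetic can be checked mechanically. The "block" value in that message is a disk address in 512-byte basic blocks. This sketch assumes numblks is counted in the same 512-byte units, which makes the failed read an 8k buffer. Error 117 is EUCLEAN on Linux, which strerror renders as "Structure needs cleaning". The helper name is hypothetical:

```python
BBSIZE = 512  # bytes per XFS basic block; the "block" value above is counted in these units


def daddr_to_byte_range(daddr, numblks):
    """Return the [start, end) byte range of a buffer at `daddr` spanning `numblks` basic blocks."""
    start = daddr * BBSIZE
    return start, start + numblks * BBSIZE


start, end = daddr_to_byte_range(0x200, 16)
# 0x200 * 512 = 262144 bytes = 256k; 16 * 512 = 8192 bytes more
print(f"{start} - {end} bytes ({start // 1024}k - {end // 1024}k)")
```

For the message above this prints `262144 - 270336 bytes (256k - 264k)`, which is well inside the zeroed region at the start of the device.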

This really looks like something outside the filesystem caused the
problem. It's probably too late to find out what caused it either,
but I'd be checking with your HW vendor(s) about known problems with
their hardware/firmware....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs