Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Mon, 10 Mar 2008 10:36:26 -0500

On Fri, 2008-03-07 at 17:40 -0500, Marc Bejarano wrote:
> In another instance of out-of-sync-ness, the bad disk looked as 
> follows.  The bad disk was in a completely different md raid1 
> "device", and if it needs to be said explicitly, was a totally 
> different physical drive.
> 
> b + 0x00000: "header of 16K mysql/innodb page 309713974 followed by
> good data"
> 
> b + 0x03600: **BAD DATA**: "header of 16K mysql/innodb page 
> 309713975", should be at b+0x04000, followed by first 10752 == 21*512 
> bytes of current correct value of page per disk with good copy
> 
> b + 0x06000: correct current last part of page 309713975 in proper
> place.
> 
> This is hard to explain.  It looks like page 309713975 got written 
> out to the proper spot, but then the first 10752 bytes got written 
> out again to the wrong spot?!?

I'm afraid you're not going to like this, but this pattern of corruption
almost certainly points to a disk problem with head positioning.

The reason is that the block layer and everything below it write out in
units of what they see as the logical block size (usually 4k, but in any
case whatever the block size of the filesystem you have mysql on).

Seeing an odd number of 512 byte sectors out of position like that (21
in your case), where the count isn't a power of two (which is a linux
logical block size requirement), can't really have come from the kernel,
since we always deal in power of two multiples of the underlying 512 byte
sectors all the way from block, through md, to the low level SCSI driver.
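
To make the arithmetic concrete, here is a rough sketch in Python that
works through the offsets from your report.  The 4k filesystem block size
is an assumption (the "usually 4k" case), and the function and variable
names are made up purely for illustration:

```python
SECTOR = 512      # bytes per disk sector
FS_BLOCK = 4096   # ASSUMPTION: 4k filesystem block size; adjust to match yours


def is_power_of_two(n):
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def describe(bad_start, expected, stray_len):
    """Summarise a misplaced run of data, with offsets relative to b."""
    stray_sectors = stray_len // SECTOR
    return {
        # how far before its proper place the stray copy landed
        "shift_sectors": (expected - bad_start) // SECTOR,
        "stray_sectors": stray_sectors,
        # first byte after the stray copy
        "stray_end": bad_start + stray_len,
        "start_block_aligned": bad_start % FS_BLOCK == 0,
        "length_whole_blocks": stray_len % FS_BLOCK == 0,
        "length_power_of_two": is_power_of_two(stray_sectors),
    }


# Values taken from the report: stray header at b+0x03600, page expected
# at b+0x04000, 10752 bytes of misplaced data.
result = describe(bad_start=0x03600, expected=0x04000, stray_len=10752)
for key, value in result.items():
    print(f"{key}: {value}")
```

With the reported numbers, the stray copy sits 5 sectors before its proper
place, is 21 sectors long, and ends at exactly b+0x06000, which is where
the correct tail of the page resumes.  Neither its start nor its length
lines up with a 4k block.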

It's still theoretically possible that something went wrong in the
actual HBA, but I'd place most of my money on a disk fault.  The drives
you have, the Seagate 7200.10, were the first to use perpendicular
recording, so it could be they have head positioning errors with the new
technology.  There's also a lot of talk on the internet about
performance issues with the various revisions of their firmware:

http://www.fluffles.net/articles/seagate-AAK-firmware

Just as a matter of interest, what version of firmware do you have?  You
can get this with

hdparm -I /dev/sd<whatever>

I'm afraid the only way to confirm this theory definitively will be with
the destructive disktest from autotest (it was actually constructed to
check for drive head positioning errors), as Grant explained:

> If you can destroy (and later restore) the data on one or more
> of the disks, you might consider running disktest from:
>    http://test.kernel.org/autotest/
> 
> I've parked an SVN snapshot on:
>    http://iou.parisc-linux.org/~grundler/autotest-20080307.tgz
> 
> See autotest/tests/disktest/ . IIRC this test will tag each 512 byte
> "sector" it writes to a file and will read back those tags later to
> verify the sectors made it to media.
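
To illustrate the sector-tagging idea Grant describes, here is a minimal
sketch in Python.  It is not the autotest disktest code: the tag layout,
the function names and the simulated misplaced write are all assumptions
of mine, and it runs against a scratch file rather than a raw disk, so it
only shows how a tag check exposes data landing in the wrong place:

```python
import os
import struct
import tempfile

SECTOR = 512
MAGIC = b"TAGS"  # hypothetical marker, not disktest's real format


def make_sector(index):
    """Build one 512 byte sector that carries its own sector number."""
    header = MAGIC + struct.pack("<Q", index)
    return header + bytes(SECTOR - len(header))


def write_tagged(path, count):
    """Write `count` tagged sectors and push them out of the page cache."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(make_sector(i))
        f.flush()
        os.fsync(f.fileno())


def find_misplaced(path, count):
    """Return (position, tag_found) for every sector whose tag is wrong."""
    bad = []
    with open(path, "rb") as f:
        for i in range(count):
            data = f.read(SECTOR)
            if len(data) < SECTOR or data[:4] != MAGIC:
                found = None
            else:
                found = struct.unpack("<Q", data[4:12])[0]
            if found != i:
                bad.append((i, found))
    return bad


def simulate_misplaced_write(path, src, dst, nsectors):
    """Copy nsectors starting at sector src over sector dst (the fault)."""
    with open(path, "r+b") as f:
        f.seek(src * SECTOR)
        data = f.read(nsectors * SECTOR)
        f.seek(dst * SECTOR)
        f.write(data)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "tagged.img")
    write_tagged(path, 64)
    clean = find_misplaced(path, 64)
    # Mimic the report: 21 sectors meant for sector 32 (0x4000) land at
    # sector 27 (0x3600) instead.
    simulate_misplaced_write(path, src=32, dst=27, nsectors=21)
    damaged = find_misplaced(path, 64)

print("mismatches before fault:", len(clean))
print("mismatches after fault:", len(damaged))
print("first:", damaged[0], "last:", damaged[-1])
```

A check on a real drive would also have to make sure the read-back comes
from the media rather than from cache, which this sketch does not attempt.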

James
