Debugging a strange array corruption


G'day all,

I have a RAID-6 array of 10 x 1TB drives here. It's been great for ages, but recently I've seen nasty random corruption across the entire array that I cannot pin down.

The machine also has a number of RAID-1 and a RAID-5 which are all behaving perfectly.

The machine has 16GB of RAM, so all my read tests are done with dd bs=1G count=20 (20GB per pass, more than the page cache can hold) to make sure I'm actually hitting the disk somewhere.

The array is partitioned into three approximately equal partitions.

If I do something like -

for i in `seq 3` ; do dd if=/dev/md0p1 bs=1G count=20 | md5sum ; done

- I get three completely different checksums.

The filesystems are unmounted and the array is idle.
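
The repeated-read test above can be wrapped in a small helper that prints one checksum per pass and then reports whether the passes agreed. This is only a sketch: the function name repeat_checksum and its argument order are made up for illustration, and it assumes GNU dd and md5sum are available.

```shell
#!/bin/sh
# Hypothetical helper: read the first COUNT blocks of BS bytes from DEV
# PASSES times, print one md5sum per pass, then report whether they agree.
repeat_checksum() {
    dev=$1; passes=$2; bs=$3; count=$4
    sums=""
    i=0
    while [ "$i" -lt "$passes" ]; do
        # dd's transfer statistics go to stderr; discard them so that
        # only the checksum is captured.
        sum=$(dd if="$dev" bs="$bs" count="$count" 2>/dev/null | md5sum | cut -d' ' -f1)
        i=$((i + 1))
        echo "pass $i: $sum"
        sums="$sums$sum
"
    done
    # Count distinct checksums; exactly one means every pass matched.
    distinct=$(printf '%s' "$sums" | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "consistent"
    else
        echo "INCONSISTENT ($distinct distinct checksums)"
    fi
}
```

With the values from this report it would be called as: repeat_checksum /dev/md0p1 3 1G 20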

I've run the same test individually on all 10 disks in the array and they all appear to give consistent data. Reading anything from the array gives me mostly correct data with intermittent garbage.
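
Since the data is mostly correct with intermittent garbage, one way to narrow things down might be to checksum the device in fixed-size chunks on two separate passes and compare the lists, which shows which offsets differ between reads. The sketch below is an assumption-laden illustration, not something taken from this report: chunk_sums is a hypothetical name, and on a real device the second pass may be served from the page cache unless the region read exceeds RAM or the cache is bypassed (for example with GNU dd's iflag=direct).

```shell
#!/bin/sh
# Hypothetical helper: print "<chunk index> <md5>" for each of the first
# CHUNKS chunks of BS bytes on DEV.
chunk_sums() {
    dev=$1; bs=$2; chunks=$3
    n=0
    while [ "$n" -lt "$chunks" ]; do
        # skip= is counted in blocks of BS bytes, so chunk n starts at n*BS.
        sum=$(dd if="$dev" bs="$bs" skip="$n" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
        echo "$n $sum"
        n=$((n + 1))
    done
}
```

Saving two passes, for example chunk_sums /dev/md0p1 64M 320 > pass1.txt and the same again into pass2.txt, and then running diff pass1.txt pass2.txt would list the chunk indices whose contents changed between reads. The chunk size and count here are arbitrary illustrative values covering 20GB.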

I've tried both a 2.6.36.[12] kernel, and I'm currently running 2.6.37-rc5-git3 with the same odd results.

All the disks pass long SMART tests. They all checksum correctly from end to end with repeated sequential runs.

No libata errors in the logs.

The drives are all on separate channels: 8 are on a pair of Marvell 88SX7042 controllers and 2 are on a SIL3132. The corruption started after I upgraded the mainboard (and the kernel at the same time - nothing like throwing more variables into the mix). Its effects were subtle enough that I missed them until my backup rotation had replaced all of my good backups with broken data. Lesson learned.

I'm stumped and I don't even know where to begin. I've never seen something like this happen without a bad disk, controller or cable, and those are easy to diagnose.

Regards,
--
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.