Silent Corruption on RAID5

Hi,

I'm experiencing silent data corruption on my RAID 5 set of four 400GB SATA disks.

I first hit the problem a couple of weeks ago and assumed it was related to reiserfs, since I hadn't used that filesystem before but have another perfectly functional RAID 5 array running ext3. After a lot of testing, however, I find the problem happens with ext3 on this array as well, and after even more testing I find it only occurs on the array, not on the individual hard disks.

My test consists of making a ~10GB file of zeros and then checking it for non-zero bytes. I've also tried creating the file of zeros on a functional array and copying it across, with the same results:

dd bs=1024 count=10000k if=/dev/zero of=./10GB.tst
od -t x1 ./10GB.tst

These commands give me a single row of zeros on my other RAID 5 set in the same box, and also on each individual hard disk in the array when I put ext3 directly on them to check whether one was faulty. But when the disks are assembled into the RAID 5 array, od spouts lots of non-zero bytes at me:

<snip>
21524747740 00 00 00 00 00 00 00 00 00 00 00 00 00 50 5c 36
21524747760 00 10 00 00 00 00 a7 23 00 10 00 80 00 00 00 00
21524750000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
21525147740 00 00 00 00 00 00 00 00 00 00 00 00 00 50 5c 36
21525147760 00 10 00 00 00 00 a7 23 00 10 00 80 00 00 00 00
21525150000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<snip>

There is a fair bit of that, and the last time I ran the test I got an I/O error and the filesystem was remounted read-only.
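
Incidentally, a quicker way to scan the whole file for non-zero bytes (a sketch, assuming GNU cmp is available; same test file as above) is:

cmp -l ./10GB.tst /dev/zero
# lists the offset and value of every non-zero byte;
# a clean file produces nothing but the expected "EOF on ./10GB.tst" message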

I figure the problem is with either the Silicon Image 3114 hardware, its driver, or the MD RAID subsystem itself, but as I mentioned earlier my other RAID 5 set of three 120GB drives on two IDE controllers works fine.
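
A test I haven't run yet (a rough sketch only, and destructive: it wipes the filesystem on md2, and I'm assuming /mnt/test/s0 is where it's mounted) would be to repeat the zero-fill against the raw MD device, bypassing ext3 entirely, to separate the filesystem from the RAID/driver layers:

umount /mnt/test/s0                            # make sure nothing is using the array
dd bs=1M count=10240 if=/dev/zero of=/dev/md2  # write ~10GB of zeros to the raw device
dd bs=1M count=10240 if=/dev/md2 | od -t x1    # a clean array prints one row of zeros, then '*'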

I'm running Debian sarge with a 2.6.15-1 kernel on an Athlon XP 2200 with 1GB of RAM and an Asus A7N8X-Deluxe motherboard. Storage is spread across two Maxtor IDE controllers, one Silicon Image 3114 PCI adapter, and the on-board Silicon Image 3112 controller:

- 2x 10GB IDE disks and a DVD-ROM drive on the on-board IDE controller
- 3x 120GB Seagate hard disks on the PCI IDE adapters
- 2x 80GB Seagate disks on the on-board Silicon Image 3112 controller
- 4x 400GB disks on the Silicon Image 3114 PCI adapter


biggs:/mnt/test/s0# uname -a
Linux biggs 2.6.15.1.060121 #1 Sat Jan 21 17:01:30 GMT 2006 i686 GNU/Linux
biggs:/mnt/test/s0# cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................] recovery = 8.3% (32500608/390708736) finish=253.3min speed=23564K/sec

md1 : active raid5 hdg1[0] hde1[2] hdi1[1]
      234436352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 hdb2[0] hda2[1]
      9502336 blocks [2/2] [UU]

unused devices: <none>

(Note: md2 is the array with the problems, and I've also run the tests when it has been fully synced, with the same results.)
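
One further check I may try (an assumption on my part: I believe more recent kernels expose a read-only parity scrub through sysfs, though I'm not sure 2.6.15 has it yet):

echo check > /sys/block/md2/md/sync_action   # start a read-only consistency scrub
watch cat /proc/mdstat                       # progress appears much like a resync
cat /sys/block/md2/md/mismatch_cnt           # non-zero here means parity didn't match the data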


So, does anyone have any suggestions or tests I could perform to narrow down where my problem is?

Regards,

Michael Barnwell.
