Turns out this is indeed DMA corruption that only happens under high load.
I'm guessing that raid1 must just do more DMA, so it shows up more often
there. I can turn the corruption on and off by starting other high-ish
bandwidth processes on the machine (backing up to a remote server, etc).

Thanks for all the suggestions,

Moses

On Sat, 25 Feb 2006, Moses Leslie wrote:

> Hi,
>
> I have a machine that currently has 4 drives in it (currently running
> 2.6.15.4). The first two drives are on the onboard SATA controller (VIA)
> in a RAID-1. I haven't had any issues with these.
>
> The other two drives were added recently, along with an SiL PCI SATA card
> to put them on. lspci reports this card as:
>
> 0000:00:0a.0 Unknown mass storage controller: Silicon Image, Inc.
> (formerly CMD Technology Inc) SiI 3112 [SATALink/SATARaid] Serial ATA
> Controller (rev 02)
>
> I initially used mdadm to create a new RAID1 of the two new drives, and
> added them into the LVM group that the other ones were in to expand the
> drive, but pretty quickly noticed (via rsync -c) that all new files were
> corrupted.
>
> I've since pulled the 2nd set of drives out of the LVM to test. It's only
> when using a RAID-1 that I get occasional corruption. I split the drives
> (each 300GB) into 4 75GB partitions each, and created 3 md devices. One
> 75GB raid1, one 150GB raid0, and 1 225GB raid5.
>
> I used a script that newfs'd each one, dd'd multiple copies of files (one
> run with a 1GB, one with 3GB, one with 6GB), md5'd those files, then
> umounted.
>
> At least once in each test run, there was a file with the wrong checksum
> when on the RAID-1 part of the test.
>
> After completing all the tests, I redid the md devices such that none
> of them used any of the same partitions that they had used in the first
> test (IE the RAID1 was sda1 and sdb1 in the first one, and was sda4 and
> sdb4 in the second one).
>
> I also did the same test using each of the regular partitions as well
> (sda1-4 and sdb1-4).
>
> I was never able to duplicate any corruption any other time than with the
> RAID1.
>
> There's never any error messages in dmesg or syslog.
>
> Is there anything I can do to help track down where the problem is?
>
> Thanks!
>
> Moses
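
For anyone wanting to run a similar check, the copy-and-checksum step of the test described above could be sketched roughly as follows in Python. This is not the original script, which was not posted: the function names, file names, and copy count are hypothetical, and the mkfs, mount, and umount steps are left out because they need root and real devices. One caveat to keep in mind is that a file read back right after it was written may be served from the page cache rather than the disk, so a real run would probably want files larger than RAM, or a remount between writing and checksumming.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_copies(source, target_dir, copies=4):
    """Copy source into target_dir several times and return the paths of
    any copies whose MD5 differs from the source's MD5."""
    expected = md5_of(source)
    bad = []
    for i in range(copies):
        dest = os.path.join(target_dir, "copy_%d.bin" % i)
        shutil.copyfile(source, dest)
        if md5_of(dest) != expected:
            bad.append(dest)
    return bad


if __name__ == "__main__":
    # Demo against a temporary directory. In a real test, target_dir
    # would be the mount point of the md device under test (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        with open(src, "wb") as fh:
            fh.write(os.urandom(4 * 1024 * 1024))  # 4 MiB of random data
        print(find_corrupt_copies(src, tmp))
```

Running this in a loop while another high-bandwidth job is active on the machine, and again while the machine is idle, would be one way to confirm that the corruption tracks load rather than the RAID level.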