On Fri, Mar 7, 2008 at 2:39 PM, Marc Bejarano <beej@xxxxxxxxxxxx> wrote:
> At 17:52 3/6/2008, Steve Cousins wrote:
> >Have you run any memory tests on the machine?
>
> no, but my suspicions lay elsewhere. could bad memory explain the
> right bits ending up in the wrong place on only one half of a mirror?

IMHO, not likely unless the page is getting copied before being DMA'd
by the HBA. DMA is a form of copying but would have a different
"signature" than bad memory (cacheline vs sub-cacheline corruption).

If you know that the two RAID mirrors are different, can you compare
them block-by-block? The first step to solving any data corruption
issue is characterizing the corruption: length of wrong data,
alignment of wrong data, and, if possible, whether the wrong data is
"stale" or where it originates. jejb already suggested this.

If you can destroy (and later restore) the data on one or more of the
disks, you might consider running disktest from:
    http://test.kernel.org/autotest/

I've parked an SVN snapshot on:
    http://iou.parisc-linux.org/~grundler/autotest-20080307.tgz

See autotest/tests/disktest/ . IIRC this test will tag each 512 byte
"sector" it writes to a file and will read back those tags later to
verify the sectors made it to media.

hth,
grant
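
A minimal sketch of the block-by-block comparison suggested above, in Python. It is not taken from disktest or any existing tool; the names `mismatch_runs` and `alignment` are made up for illustration. It assumes both mirror halves can be read as plain files or block devices, that the data sits at the same offsets on each, and that 512-byte sectors are the right granularity. It reports each contiguous run of differing sectors as a byte offset and a length, which covers the "length" and "alignment" part of characterizing the corruption.

```python
import sys

SECTOR = 512  # assumed granularity, matching the 512-byte sectors mentioned above


def mismatch_runs(path_a, path_b, sector=SECTOR):
    """Yield (byte_offset, length_in_bytes) for each contiguous run of
    sector-sized chunks that differ between the two files."""
    run_start = None
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(sector)
            b = fb.read(sector)
            if not a and not b:
                break  # both inputs exhausted
            if a != b:
                if run_start is None:
                    run_start = offset  # a new run of bad chunks begins here
            elif run_start is not None:
                yield run_start, offset - run_start
                run_start = None
            offset += max(len(a), len(b))
    if run_start is not None:
        # the run extends to the end of the input
        yield run_start, offset - run_start


def alignment(offset):
    """Largest power of two that divides offset (returns 0 for offset 0)."""
    return offset & -offset


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python compare.py <mirror-half-A> <mirror-half-B>
    for start, length in mismatch_runs(sys.argv[1], sys.argv[2]):
        print(f"offset={start} length={length} alignment={alignment(start)}")
```

Passing a smaller `sector` value gives finer-grained runs, which may help tell sub-sector damage apart from whole-sector damage. Deciding whether the wrong data is "stale" still needs a look at the contents themselves.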