On 05/21/2010 12:40 PM, MRK wrote:
> On 05/21/2010 04:16 AM, Doug Ledford wrote:
>> On 05/20/2010 06:38 PM, Neil Brown wrote:
>>
>>> On Thu, 20 May 2010 17:29:37 -0500
>>> Trey Scarborough<treys@xxxxxxxxxxxxxx> wrote:
>>>
>>>> Neil Brown wrote:
>>>>
>>>>> On Thu, 20 May 2010 12:02:23 -0500
>>>>> Trey Scarborough<treys@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> I have a raid 5 array with 9 disks and I have a mismatch_cnt that
>>>>>> keeps growing. This is causing file corruption on the underlaying
>>>>>> file systems as well. I can copy a group of 100 100mb files and
>>>>>> then do a md5sum on them and 1-3 will be corrupt. If this is a
>>>>>> drive that is bad is there anyway to run a report on the count
>>>>>> per drive that these mismatches occur. I have run smarttools test
>>>>>> and do not see one drive that stands out to be causing errors.
>>>>>> Could something else be causing these errors?
>>>>>>
>> While a bad drive is certainly a possibility here, this is precisely the
>> type of failure scenario that would make me suspect bad RAM,
>> motherboard, or CPU. So I wouldn't rule those out as possibilities
>> either.
>>
>
> Could the cabling to the drive be causing this? (maybe failing or maybe
> it's partly disconnected)
> I don't remember at what point Linux is at implementing the checksums
> between the controller and the drive.

I don't know. I'm not up on the SATA signaling details so I don't know
if it uses CRC on the signal, but I suspect it does and a bad cable
would cause failed requests. But I wouldn't bet my house on it, so I
would ask some SATA gurus.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband