Re: raid 5 mismatch_cnt errors

Doug Ledford <dledford@xxxxxxxxxx> · Wed, 26 May 2010 11:49:52 -0400

On 05/26/2010 11:07 AM, Bill Davidsen wrote:
> Doug Ledford wrote:
>> On 05/20/2010 06:38 PM, Neil Brown wrote:
>>  
>>> On Thu, 20 May 2010 17:29:37 -0500
>>> Trey Scarborough <treys@xxxxxxxxxxxxxx> wrote:
>>>
>>>    
>>>> Neil Brown wrote:
>>>>      
>>>>> On Thu, 20 May 2010 12:02:23 -0500
>>>>> Trey Scarborough <treys@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>          
>>>>>> I have a RAID 5 array with 9 disks and a mismatch_cnt that keeps
>>>>>> growing. This is causing file corruption on the underlying
>>>>>> file systems as well. I can copy a group of 100 100 MB files,
>>>>>> run md5sum on them, and 1-3 will be corrupt. If a bad drive is
>>>>>> the cause, is there any way to get a report of how many of these
>>>>>> mismatches occur per drive? I have run smartmontools tests and
>>>>>> do not see any one drive that stands out as causing errors. Could
>>>>>> something else be causing these errors?
>>>>>>               
>>
>> While a bad drive is certainly a possibility here, this is precisely the
>> type of failure scenario that would make me suspect bad RAM,
>> motherboard, or CPU.  So I wouldn't rule those out as possibilities
>> either.
>>   
> 
> I have the same thought. I would remove half the RAM from the system and
> test again, then swap in the "other" half and repeat. Of course, running
> memtest first is a good idea, but I have seen failures which only happen
> on disk access.

Indeed, I've seen lots of failures that only happen with disk access and
not with memory testers.  That is why the web page in my sig has a shell
script that uses disk access to test memory.
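
For anyone who wants the general idea, here is a minimal sketch in Python.
It is not the shell script mentioned above, and the function names and
file names are made up for illustration.  It assumes the approach is:
checksum random data while it is still in memory, write it out to the
array, read it back, and compare.

```python
import hashlib
import os


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def write_test_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data to path.

    The digest is computed from the in-memory blocks as they are written,
    so it reflects what we intended to store, not what the disk returns.
    """
    h = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            h.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and to the device
    return h.hexdigest()


def run_pass(directory, n_files=8, size_bytes=4 << 20):
    """Write n_files test files into directory, re-read them, and return
    the list of paths whose digest changed.  Test files are removed."""
    expected = {}
    for i in range(n_files):
        # Hypothetical file naming, chosen only for this sketch.
        path = os.path.join(directory, f"disktest_{i:04d}.bin")
        expected[path] = write_test_file(path, size_bytes)
    mismatches = [p for p, digest in expected.items()
                  if sha256_of(p) != digest]
    for p in expected:
        os.remove(p)
    return mismatches
```

Calling run_pass() in a loop on a directory on the affected array (for
example run_pass("/mnt/array"), where the mount point is hypothetical)
and getting a non-empty list back means something between RAM and the
platters changed the data.  Note that a read right after a write may be
served from the page cache; to force real disk reads, either write more
data than the machine has RAM or drop the caches as root between the
write and the read.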

> If the system is overclocked, obviously the first step is to cut the
> speed back...
> 

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband
