Re: 8-15 TB storage: any recommendations?

Les Mikesell <lesmikesell@xxxxxxxxx> · Wed, 13 Jan 2010 07:41:38 -0600

Christopher Chan wrote:
>>> Funny you should mention software RAID1... I've seen two instances of that
>>> getting silently out-of-sync and royally screwing things up beyond all 
>>> repair.
>>>
>>> Maybe this thread has gone on long enough now?
>>>
>> Not yet :)
>>
>> Please tell more about your hardware and software. What distro? What
>> kernel? What disk controller? What disks?
>>
>> I'm interested in this because I have never seen Linux software MD RAID1
>> failures like this, but some people keep telling they happen frequently..
> 
> It could be like Les said - bad RAM. I certainly have not encountered 
> this sort of error on a md raid1 array.
> 
>> I'm just wondering why I'm not seeing these failures, or if I've just
>> been lucky so far..
>>
> 
> Yeah, lucky you've not got bad RAM that passed POSTing and at the same 
> time did not bring your system down on you right from the start or 
> rendered it unstable.

On the machine where I had the problem, I had to run memtest86 for more than a
day before it finally caught the bad RAM.  After replacing the RAM and fsck'ing
the volume, I still had mysterious problems about once a month, until I realized
that reads alternate between the two disks, so fsck only saw whichever copy md
happened to read for each block and didn't catch everything.  I forget the
commands to compare and fix the mirroring, but they worked, and I think the
CentOS 5.4 update now does that periodically as a cron job.  The other worry is
that when one drive dies, the surviving drive may have unreadable spots in
normally unused areas of the mirror, and those will keep a rebuild from working.
The cron job should detect those too, provided you look at its results.
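
For anyone wanting to script the compare step: as far as I know, the md driver
exposes it through sysfs files named sync_action and mismatch_cnt under
/sys/block/<array>/md/.  What follows is only a rough sketch under that
assumption, not the commands I originally used and not what the CentOS cron job
does.  The array name "md0" and the helper function names are made up for
illustration, and the root argument exists only so the helpers can be pointed
at a scratch directory instead of the real /sys/block.

```python
from pathlib import Path

# Assumed location of the md sysfs tree; override `root` when experimenting.
SYSFS_BLOCK = Path("/sys/block")


def md_dir(array, root=SYSFS_BLOCK):
    """Return the directory holding the md control files for one array."""
    return Path(root) / array / "md"


def sync_state(array, root=SYSFS_BLOCK):
    """Read the current sync action, e.g. 'idle' or 'check'."""
    return (md_dir(array, root) / "sync_action").read_text().strip()


def start_check(array, root=SYSFS_BLOCK):
    """Ask md to read every copy and count differences (needs root on a real box)."""
    (md_dir(array, root) / "sync_action").write_text("check\n")


def start_repair(array, root=SYSFS_BLOCK):
    """Ask md to rewrite mismatched blocks; md cannot know which copy was right."""
    (md_dir(array, root) / "sync_action").write_text("repair\n")


def mismatch_count(array, root=SYSFS_BLOCK):
    """Return the mismatch counter left behind by the last check or repair."""
    return int((md_dir(array, root) / "mismatch_cnt").read_text().strip())
```

The intended use would be to call start_check("md0") as root, poll
sync_state("md0") until it goes back to "idle", and then look at
mismatch_count("md0").  A nonzero count is the cue to investigate before
deciding whether to run start_repair.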

-- 
   Les Mikesell
    lesmikesell@xxxxxxxxx

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos