Re: how to debug hardware lockups?

On Nov 18, 2008, at 6:05 PM, Les Mikesell <lesmikesell@xxxxxxxxx> wrote:

> nate wrote:
>> Les Mikesell wrote:
>>> Yes, apparently RAM errors can be subtle and only appear when certain
>>> adjacent bit patterns are stored - or when the moon is in a certain
>>> phase or something.
>> Don't forget cosmic rays
>> http://adsabs.harvard.edu/abs/1978ITNS...25.1166P
>
> Yeah, but those don't stop when you replace the faulty RAM... Mine did, but the errors already committed to disk kept reappearing at random as the reads alternated between the RAID1 members afterwards.

Ah, memory mapped files, another very good reason to use ECC with large memory machines.

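Also, if you identify bad memory and use software RAID1, it's better to break the mirror, fsck and fix one half, then rebuild the mirror from that half, as RAID1 has no data integrity check: it keeps no checksums, so it cannot tell which copy of a mismatched block is the good one.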
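
To illustrate why the errors come and go, here is a rough Python sketch of a two-member mirror. It is only a toy model: the ToyMirror class and its methods are made up for this example, and the strict round-robin read is an assumption (real md read balancing is more involved). It just shows that a block that differs between members shows up on some reads and not others, and that comparing members finds the mismatch but not which copy is right.

```python
from itertools import cycle


class ToyMirror:
    """Hypothetical two-member mirror: writes go to both, reads alternate."""

    def __init__(self, num_blocks):
        self.members = [[b""] * num_blocks, [b""] * num_blocks]
        # Assumption: strict round-robin between members on every read.
        self._next_member = cycle([0, 1])

    def write(self, block, data):
        # A normal write lands on both members.
        for member in self.members:
            member[block] = data

    def read(self, block):
        # No checksum is consulted; whichever member is next gets trusted.
        return self.members[next(self._next_member)][block]

    def mismatched_blocks(self):
        # Comparing members shows WHERE they differ, not which one is correct.
        a, b = self.members
        return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


mirror = ToyMirror(num_blocks=4)
for i in range(4):
    mirror.write(i, b"good-%d" % i)

# Simulate bad RAM having corrupted block 2 on its way to one member only.
mirror.members[1][2] = b"BAD!!!"

reads = [mirror.read(2) for _ in range(4)]
mismatches = mirror.mismatched_blocks()

print(reads)       # good and bad data alternate
print(mismatches)  # only the block index is known
```

In the sketch the same block reads back good, bad, good, bad, which is why breaking the mirror and checking one half on its own gives a stable picture to fsck against.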

-Ross
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
