Re: ECC RAM Error


The interesting thing is that I only see these ECC errors when I am writing data to this box; no errors show up when I am reading data from it. If the memory or the controller were bad,
those errors should show up when I am reading as well.

Am I missing something here?
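
One way to check the write-only observation would be to sample the kernel's error counters before and after a write-only run and again around a read-only run. The sketch below assumes the Linux EDAC driver for this chipset is loaded and exposes per-controller `ce_count` (corrected) and `ue_count` (uncorrected) files under `/sys/devices/system/edac/mc`; the function names are made up for illustration, and whether this particular box exposes those files is not known from the thread.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    Assumes the EDAC sysfs layout (mc0, mc1, ...); a missing or unreadable
    counter is reported as None rather than raising.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[mc.name] = entry
    return counts


def count_deltas(before, after):
    """Per-controller increase in each counter between two samples."""
    deltas = {}
    for mc, entry in after.items():
        deltas[mc] = {}
        for name, value in entry.items():
            old = before.get(mc, {}).get(name)
            if value is None or old is None:
                deltas[mc][name] = None
            else:
                deltas[mc][name] = value - old
    return deltas
```

Taking one sample with `read_edac_counts()` before the transfer and one after, then passing both to `count_deltas()`, would show whether the counters really stay flat during reads.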


Peter Arremann wrote:
On Thursday 11 October 2007, Centos wrote:
The ECC errors only happen when I am transferring data from other
storage to this one.
They only happen when data is being written to it.

ECC errors can happen anywhere. It can be that the data is corrupted while it is transmitted to the storage device. Or the data can degrade while stored. And of course, on the transmission from the storage you have another chance to screw it up.

The problem is that, in almost all cases, you won't see those errors until you read the data. The memory controller will then perform the ECC check and see that the data that was returned is bad. What happens then depends on what type of memory and memory controller you have. Simple (old) x86 setups will correct single-bit errors and report double-bit errors as uncorrectable. If you happen to have 3 bits that changed in the same data word, ECC will actually screw you up worse - it can see that as a single-bit error and correct the wrong way. That way you get corrupt data and a soft error. Newer, more complex x86 configs and most proprietary Unix boxes protect against that by using fancier ECC algorithms, memory RAID and things like that.

Anyway - ECC errors to me mean that I need to trigger a failover and get off the box ASAP. There is no ECC algorithm and hardware setup out there that does the right thing every single time. If you don't have a failover, see if you can take the system down now, remove the offending DIMM/bank and run with the remaining RAM until you get replacements.
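
To illustrate the single-, double- and triple-bit behaviour described above, here is a toy sketch of a single-error-correct, double-error-detect (SECDED) code. It uses an 8-bit extended Hamming code over 4 data bits purely for readability; real ECC memory typically protects 64 data bits with 8 check bits, and the exact code used by any given controller is not something this example claims to reproduce. The function names are made up for the illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 are a Hamming(7,4)
    word with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid word
    odd_flips = sum(code) % 2        # overall parity check
    if syndrome == 0 and not odd_flips:
        status = "ok"
    elif odd_flips:
        # Odd number of flips: the decoder ASSUMES exactly one and fixes it.
        code[syndrome] ^= 1          # syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable.
        status = "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


word = encode(0b1011)
one_bit = decode(flip(word, [5]))         # reported corrected, data intact
two_bits = decode(flip(word, [2, 6]))     # reported uncorrectable
three_bits = decode(flip(word, [1, 2, 3]))  # reported corrected, data wrong
```

With three flipped bits the decoder reports a corrected error yet hands back a different nibble than the one stored, which is the silent miscorrection case.
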
Peter.
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

