on 10/15/2007 5:16 AM Centos spake the following:
Thanks, everyone, for the help and responses.
I just noticed that these errors might be soft errors, because they only
happen when I overload the
storage by copying large files simultaneously on the same port and
SCSI controller. So I was thinking
the cause could be the speed of the ECC parity calculation, or a RAM shortage.
The hardware is supposed to take care of ECC errors, and the machine should
panic or hang on seeing these errors, but it just keeps going.
What do you think?
I have had systems so overloaded that I couldn't log in on an ssh session, but
when the load cleared, there weren't any ECC errors. I still think you have a
hardware problem, and just because it takes a high load now doesn't mean that
it is OK. A faulty timing capacitor on the motherboard can cause all sorts of
corruption in memory, and it will probably deteriorate over time. You need to
methodically test the memory by running memory tests, then moving the RAM
modules and testing again. Or replace the hardware if it is mission critical.
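
To make the "test, move the RAM, test again" routine a bit more methodical, here
is a rough sketch of how one might record the kernel's ECC counters before and
after each load run. It assumes the Linux EDAC driver for your chipset is loaded
and exposes ce_count (corrected) and ue_count (uncorrected) files under
/sys/devices/system/edac/mc/mcN/. The function names read_edac_counts and
new_errors are made up for this illustration, and this is not a substitute for a
real memory tester.

```python
import os

# Assumption: standard Linux EDAC sysfs location; absent if no EDAC driver is loaded.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mcN directory under root."""
    counts = {}
    if not os.path.isdir(root):
        return counts  # no EDAC support visible, nothing to report
    for name in sorted(os.listdir(root)):
        ctrl_dir = os.path.join(root, name)
        if not name.startswith("mc") or not os.path.isdir(ctrl_dir):
            continue
        entry = {}
        for key in ("ce", "ue"):
            try:
                with open(os.path.join(ctrl_dir, key + "_count")) as fh:
                    entry[key] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[key] = 0  # counter missing or unreadable; treat as zero
        counts[name] = entry
    return counts


def new_errors(before, after):
    """Return only the controllers whose counters grew between two snapshots."""
    grown = {}
    for name, now in after.items():
        old = before.get(name, {})
        delta = {k: now[k] - old.get(k, 0) for k in ("ce", "ue")}
        if delta["ce"] > 0 or delta["ue"] > 0:
            grown[name] = delta
    return grown


if __name__ == "__main__":
    # Take one snapshot now; take another after the heavy copy and compare.
    print(read_edac_counts())
```

Take a snapshot, run the copy load, take another, and pass both to new_errors.
If the counts keep climbing after you move a module to a different slot, note
whether the errors follow the module or stay with the slot.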