On Mon, 9 Jul 2007, aix tiger wrote:
Hi Friends, I am facing a strange problem on one of my RHEL servers: it crashes and restarts frequently. This is an HP ProLiant DL740 and part of a RHEL cluster (V4U4). Another HP ProLiant DL740 is part of the same cluster, with the same version of the RHEL OS and cluster software, but it has no such problem. I see no errors in /var/log/messages, and the HP iLO log shows nothing except the message "A critical server error occured before this POST". I have asked the HP hardware engineer to check for all possible hardware errors, but he says the diagnostics show no issues. How can I troubleshoot this problem? It does not happen at any specific time, but it occurs at least once a week. Please advise how to solve this issue.
I had a similar problem with an Oracle RAC cluster: one node rebooted and the other didn't. I have not solved it yet, but only because the condition stopped occurring. I set up netconsole (part of netdump), which eventually told me that hangcheck-timer was rebooting my system. I am also running hangwatch (http://people.redhat.com/csnook/hangwatch/), which runs sysrq commands to capture the system state when the system load spikes.
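
For illustration only, here is a minimal sketch of the receiving side of netconsole: a small UDP listener on a second machine that timestamps whatever the crashing node sends and appends it to a file, so the last kernel messages survive the reboot. This is not the netdump-server that ships with RHEL. The function names, the log file name and the bind address are all hypothetical, and port 6666 is assumed to be the target port configured on the sending node, so adjust it to match your setup.

```python
import socket
import time


def format_record(payload: bytes, sender: str, now: float) -> str:
    """Prefix one received datagram with a local receive time and the sender address."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # Kernel messages are normally ASCII; replace anything undecodable
    # instead of failing, since a crashing node may send garbage.
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{stamp} {sender} {text}"


def receive(bind_addr="0.0.0.0", port=6666, log_path="netconsole.log",
            max_messages=None):
    """Append incoming UDP datagrams to log_path; return how many were logged.

    port 6666 is an assumption; it must match the target port set on the sender.
    max_messages=None means run until interrupted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    count = 0
    try:
        with open(log_path, "a") as log:
            while max_messages is None or count < max_messages:
                payload, (host, _src_port) = sock.recvfrom(65535)
                log.write(format_record(payload, host, time.time()) + "\n")
                # Flush after every message so nothing is lost if this
                # machine is interrupted as well.
                log.flush()
                count += 1
    finally:
        sock.close()
    return count


if __name__ == "__main__":
    receive()
```

Run it on a machine that stays up and point the failing node's netconsole at that machine's address. Whatever the kernel prints just before the reset should then appear in netconsole.log, even though it never reaches /var/log/messages on the node itself.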
HTH,
Barry