Hi,
I plan to set up NTP so that both servers keep their clocks synchronized.
How can I track down the cause of the failover? I have already graphed the
sar data, and there is no usage peak around the time db1svr was fenced by
db2svr. Which file (and which specific message) should I look at to find
the root cause of this failover?
Thank you.
Regards,
Panji

On Fri, Nov 9, 2012 at 10:40 AM, Yu <songyu555@xxxxxxxxx> wrote:
> Regardless of what root cause you find, the cluster requires the NTP
> service to ensure that all nodes have synchronized time. So you have to
> fix this 5-minute difference now.
>
> Regards
> Yu
>
> On 09/11/2012, at 11:47, Muhammad Panji <sumodirjo@xxxxxxxxx> wrote:
>
>> Dear All,
>> I have an Oracle cluster on RHEL 6.2 with 2 servers. Several days ago
>> the service failed over from node1 to node2. In /var/log/messages
>> on node2 I only see these messages:
>>
>> ...
>> Oct 23 12:54:19 db2svr corosync[4142]: [TOTEM ] A processor failed,
>> forming new configuration.
>> Oct 23 12:54:21 db2svr corosync[4142]: [QUORUM] Members[1]: 2
>> Oct 23 12:54:21 db2svr corosync[4142]: [TOTEM ] A processor joined
>> or left the membership and a new membership was formed.
>> Oct 23 12:54:21 db2svr kernel: dlm: closing connection to node 1
>> Oct 23 12:54:21 db2svr rgmanager[5327]: State change: clu1 DOWN
>> Oct 23 12:54:21 db2svr fenced[4193]: fencing node clu1
>> ...
>>
>> Googling the message "[TOTEM ] A processor failed, forming new
>> configuration.", I learned that it means node2 could no longer see
>> node1 and therefore fenced it. On node1 I get these messages:
>>
>> Oct 23 12:50:45 db1svr rgmanager[75890]: [script] Executing
>> /etc/init.d/httpd status
>> Oct 23 12:56:01 db1svr kernel: imklog 4.6.2, log source = /proc/kmsg started.
>> Oct 23 12:56:01 db1svr rsyslogd: [origin software="rsyslogd"
>> swVersion="4.6.2" x-pid="3792" x-info="http://www.rsyslog.com"]
>> (re)start
>> Oct 23 12:56:01 db1svr kernel: Initializing cgroup subsys cpuset
>> Oct 23 12:56:01 db1svr kernel: Initializing cgroup subsys cpu
>> Oct 23 12:56:01 db1svr kernel: Linux version 2.6.32-220.el6.x86_64
>> (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.4.5 20110214
>> (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
>>
>> At 12:50 rgmanager was still checking the service, and then the node
>> rebooted. What makes it worse is that the date and time on the two
>> servers differ, so I can't compare the logs directly. The current time
>> difference between the servers is around 5 minutes.
>>
>> Where should I look for the cause of this failover? I plan to graph
>> the sar data today to see whether there was a bottleneck on CPU etc.
>> that kept node1 from sending its status to node2. But if there is no
>> bottleneck on CPU, RAM, etc., where should I look for the root cause
>> of the failover?
>> Thank you.
>> Regards,
>>
>> --
>> Muhammad Panji
>> http://www.panji.web.id
>> http://www.kurungsiku.com
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
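
Regarding the point in the original message that the two servers' clocks differ, so the logs can't be compared directly: one possible approach is to shift one node's timestamps by a guessed offset and merge both logs into a single timeline. The following is only a minimal Python sketch. The names `ASSUMED_YEAR`, `ASSUMED_OFFSETS`, `parse_line` and `merged_timeline` are made up for illustration, the year is taken from the dates in this thread, and the 5-minute correction is an assumption: the thread gives only the current difference, not which node is ahead or what the gap was on Oct 23. The sample lines are copied from the logs quoted above, with the wrapped lines rejoined.

```python
from datetime import datetime, timedelta

# ASSUMPTION: classic syslog timestamps carry no year; 2012 comes from the thread's dates.
ASSUMED_YEAR = 2012

# ASSUMPTION: the thread only reports a current gap of "around 5 minutes".
# Which node is ahead, and the gap at the time of the fence, are unknown,
# so this value is a placeholder to experiment with, not a measurement.
ASSUMED_OFFSETS = {"db1svr": timedelta(minutes=5), "db2svr": timedelta(0)}

# Log lines quoted in this thread (line wraps removed).
SAMPLE = [
    "Oct 23 12:50:45 db1svr rgmanager[75890]: [script] Executing /etc/init.d/httpd status",
    "Oct 23 12:56:01 db1svr kernel: imklog 4.6.2, log source = /proc/kmsg started.",
    "Oct 23 12:54:19 db2svr corosync[4142]: [TOTEM ] A processor failed, forming new configuration.",
    "Oct 23 12:54:21 db2svr fenced[4193]: fencing node clu1",
]


def parse_line(line):
    """Split a classic syslog line into (timestamp, host, message)."""
    stamp, rest = line[:15], line[16:]       # "Oct 23 12:54:19" is 15 characters
    host, _, message = rest.partition(" ")
    ts = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return ts, host, message


def merged_timeline(lines, offsets=ASSUMED_OFFSETS):
    """Apply each host's correction and return the events sorted by corrected time."""
    events = []
    for line in lines:
        ts, host, message = parse_line(line)
        events.append((ts + offsets.get(host, timedelta(0)), host, message))
    return sorted(events)


if __name__ == "__main__":
    for ts, host, message in merged_timeline(SAMPLE):
        print(ts.strftime("%H:%M:%S"), host, message)
```

Passing a different `offsets` mapping (for example a negative correction for db1svr, or zero) makes it easy to see which ordering of "last status check", "processor failed", "fencing" and "reboot" is plausible; the sketch cannot determine the real skew by itself.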