Regardless of what root cause you find, the cluster requires the NTP service to keep the clocks on all nodes synchronized, so you have to fix this 5-minute difference now.

Regards
Yu

On 09/11/2012, at 11:47, Muhammad Panji <sumodirjo@xxxxxxxxx> wrote:

> Dear All,
> I have an Oracle cluster on RHEL 6.2 with 2 servers. Several days ago
> the service failed over from node1 to node2. In /var/log/messages
> on node2 I only see these messages:
>
> ...
> Oct 23 12:54:19 db2svr corosync[4142]: [TOTEM ] A processor failed, forming new configuration.
> Oct 23 12:54:21 db2svr corosync[4142]: [QUORUM] Members[1]: 2
> Oct 23 12:54:21 db2svr corosync[4142]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Oct 23 12:54:21 db2svr kernel: dlm: closing connection to node 1
> Oct 23 12:54:21 db2svr rgmanager[5327]: State change: clu1 DOWN
> Oct 23 12:54:21 db2svr fenced[4193]: fencing node clu1
> ...
>
> Googling the message "[TOTEM ] A processor failed, forming new
> configuration." I learned that it means node2 couldn't see node1 and
> then fenced node1. On node1 I get these messages:
>
> Oct 23 12:50:45 db1svr rgmanager[75890]: [script] Executing /etc/init.d/httpd status
> Oct 23 12:56:01 db1svr kernel: imklog 4.6.2, log source = /proc/kmsg started.
> Oct 23 12:56:01 db1svr rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="3792" x-info="http://www.rsyslog.com"] (re)start
> Oct 23 12:56:01 db1svr kernel: Initializing cgroup subsys cpuset
> Oct 23 12:56:01 db1svr kernel: Initializing cgroup subsys cpu
> Oct 23 12:56:01 db1svr kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
>
> At 12:50 rgmanager was still checking the service, and then the node rebooted.
> What makes it worse is that the date/time on the two servers is
> different, so I can't compare the logs directly.
> The current time difference between the two servers is around 5 minutes.
>
> Where should I look for the cause of this failover? I plan to graph
> the sar data today to see whether there was a bottleneck on the CPU etc.
> that prevented node1 from sending its status to node2, but if there was
> no bottleneck on CPU, RAM, etc., where should I look for the root cause
> of the failover?
> Thank you.
> Regards,
>
> --
> Muhammad Panji
> http://www.panji.web.id
> http://www.kurungsiku.com
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
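Until NTP is fixed, the two nodes' logs can still be lined up by hand once the offset between their clocks is known. Below is a minimal Python sketch, assuming classic syslog timestamps such as "Oct 23 12:54:19", the year 2012 (syslog lines carry none), and a hypothetical node1_offset_seconds that you measure yourself, for example by running `date` on both nodes at the same moment. The function names are illustrative only, not part of any cluster tool. Keep in mind that unsynchronized clocks drift, so an offset measured today is only an approximation of the offset at the time of the failover.

```python
from datetime import datetime, timedelta

# Assumption: classic syslog lines carry no year, so one is supplied here.
YEAR = 2012


def parse_syslog_time(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line."""
    stamp = line[:15]  # e.g. "Oct 23 12:54:19"
    # The year is prepended so strptime never has to guess it.
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def merge_logs(node1_lines, node2_lines, node1_offset_seconds):
    """Merge two nodes' syslog lines into one timeline on node2's clock.

    node1_offset_seconds (hypothetical input): how many seconds node1's
    clock runs ahead of node2's; use a negative value if it runs behind.
    Returns (corrected_time, node, line) tuples sorted by corrected_time.
    """
    shift = timedelta(seconds=node1_offset_seconds)
    events = []
    for line in node1_lines:
        # Subtract node1's lead to express its timestamps on node2's clock.
        events.append((parse_syslog_time(line) - shift, "node1", line))
    for line in node2_lines:
        # node2 is the reference clock, so its times are used unchanged.
        events.append((parse_syslog_time(line), "node2", line))
    events.sort(key=lambda event: event[0])
    return events


if __name__ == "__main__":
    node1 = [
        "Oct 23 12:50:45 db1svr rgmanager[75890]: [script] Executing /etc/init.d/httpd status",
        "Oct 23 12:56:01 db1svr kernel: imklog 4.6.2, log source = /proc/kmsg started.",
    ]
    node2 = [
        "Oct 23 12:54:19 db2svr corosync[4142]: [TOTEM ] A processor failed, forming new configuration.",
        "Oct 23 12:54:21 db2svr fenced[4193]: fencing node clu1",
    ]
    # 300 is a hypothetical offset; replace it with the measured value.
    for when, node, line in merge_logs(node1, node2, 300):
        # line[16:] drops the original stamp and the space after it.
        print(when.strftime("%b %d %H:%M:%S"), node, line[16:])
```

For example, `merge_logs(node1_lines, node2_lines, 300)` places node1's events on node2's clock assuming node1 runs 5 minutes ahead; pass a negative value if it runs behind. Using one node as the reference clock keeps the arithmetic to a single subtraction per line.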