RE: Strange Behavior

"Robert Gil" <Robert.Gil@xxxxxxxxxxxxxx> · Tue, 22 May 2007 13:47:59 -0400

Never mind. This was all due to incorrect time on a couple of the nodes: one node's clock was in the past, and one was in the future.

It may be worth fixing this, as it DOES cause a kernel panic. Maybe add some kind of time sync check that disallows a node from joining when its time isn't within X of the cluster.
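
To make the suggestion concrete, here is a rough sketch of what such a check could look like. It is not CMAN code and not an existing cluster API: the function name, the node names, and the 5-second tolerance are all made up for illustration, and it assumes each node's clock reading (seconds since the epoch) has already been collected somehow. Nodes are compared against the median reading so that one node in the past and one in the future do not pull the reference point with them.

```python
import statistics
from typing import Dict, List


def nodes_outside_skew(node_times: Dict[str, float],
                       max_skew_seconds: float) -> List[str]:
    """Return the names of nodes whose clock differs from the median
    clock reading by more than max_skew_seconds (hypothetical helper)."""
    if not node_times:
        return []
    # The median is used as the reference so that a single node far in
    # the past or future cannot shift it the way a mean would.
    reference = statistics.median(node_times.values())
    return sorted(
        name for name, reading in node_times.items()
        if abs(reading - reference) > max_skew_seconds
    )


# Hypothetical readings: two nodes roughly in sync, one an hour behind,
# one an hour ahead. The "X" from the suggestion is assumed to be 5 s.
readings = {
    "node1": 1000.0,
    "node2": 1001.0,
    "node3": 1000.0 - 3600,
    "node4": 1000.0 + 3600,
}
rejected = nodes_outside_skew(readings, max_skew_seconds=5.0)
print(rejected)  # ['node3', 'node4']
```

A join request from any node in the returned list would be refused until its clock is corrected. Whether the median of current members is the right reference is a design choice, not something established here.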

Robert Gil
Linux Systems Administrator
American Home Mortgage
Phone: 631-622-8410
Cell: 631-827-5775
Fax: 516-495-5861

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Robert Gil
Sent: Tuesday, May 22, 2007 11:49 AM
To: linux-cluster@xxxxxxxxxx
Subject: Strange Behavior

I am getting some strange behavior on a 4-node cluster. When node dbs2 tries to connect to the cluster, node app3 either kernel panics or ccsd and rgmanager crash. Node dbs2 says that the heartbeats drop off, and it goes to remove itself from the cluster. I am curious why node app3 would crash, and what these SM messages are. Also, why would node dbs2 connect to the cluster, become quorate, and then drop off and crash node 1? Has anyone seen this before?

/var/log/messages

May 22 11:34:36 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com rejoining
May 22 11:35:11 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com has been removed from the cluster : Missed too many heartbeats
May 22 11:35:25 melqsjssapp03 kernel: CMAN: node melqsjssapp03.americanhm.com has been removed from the cluster : No response to messages
May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message
May 22 11:35:25 melqsjssapp03 kernel: CMAN: we are leaving the cluster. No response to messages
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: SM: 00000011 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 01000014 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 0200001a sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 03000002 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 clurgmgrd[5179]: <warning> #67: Shutting down uncleanly
May 22 11:35:25 melqsjssapp03 ccsd[4630]: Cluster manager shutdown.  Attemping to reconnect...
May 22 11:35:51 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 30 seconds.
May 22 11:36:21 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 60 seconds.

Thanks,

Robert Gil
Linux Systems Administrator
American Home Mortgage
Phone: 631-622-8410
Cell: 631-827-5775
Fax: 516-495-5861

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster