Strange Behavior

"Robert Gil" <Robert.Gil@xxxxxxxxxxxxxx> · Tue, 22 May 2007 11:48:51 -0400

I am getting some strange behavior on a 4-node cluster. When node dbs2
tries to connect to the cluster, node app3 either kernel panics or ccsd
and rgmanager crash. Node dbs2 says that the heartbeats drop off, and it
then goes to remove itself from the cluster. I am curious why node app3
would crash, and what these SM messages are. I would also like to know
why node dbs2 would connect to the cluster, become quorate, and then
drop off and crash node 1. Has anyone seen this before?

/var/log/messages

May 22 11:34:36 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com rejoining
May 22 11:35:11 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com has been removed from the cluster : Missed too many heartbeats
May 22 11:35:25 melqsjssapp03 kernel: CMAN: node melqsjssapp03.americanhm.com has been removed from the cluster : No response to messages
May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message
May 22 11:35:25 melqsjssapp03 kernel: CMAN: we are leaving the cluster. No response to messages
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: SM: 00000011 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 01000014 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 0200001a sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 03000002 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 clurgmgrd[5179]: <warning> #67: Shutting down uncleanly
May 22 11:35:25 melqsjssapp03 ccsd[4630]: Cluster manager shutdown.  Attemping to reconnect...
May 22 11:35:51 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 30 seconds.
May 22 11:36:21 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 60 seconds.

Thanks,

Robert Gil
Linux Systems Administrator
American Home Mortgage
Phone: 631-622-8410
Cell: 631-827-5775
Fax: 516-495-5861

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster