Strange crash with cman (stable branch)

Hi,

I had 9 nodes running kernel 2.6.17.11 with a snapshot of the cman STABLE
tree (with in-kernel cman). No dlm, fenced or gfs. We have our own app
and do the fencing ourselves. After 3 nodes died (for unrelated
reasons), all of the cman nodes disconnected, even though the service
using cman was still running. On every node, dmesg showed messages
like the following:

CMAN: node ia-009 has been removed from the cluster : Missed too many heartbeats
CMAN: node ia-008 has been removed from the cluster : Missed too many heartbeats
CMAN: bad generation number 17 in HELLO message from 4, expected 16
CMAN: removing node ia-007 from the cluster : No response to messages
CMAN: node ia-006 has been removed from the cluster : No response to messages
CMAN: removing node ia-002 from the cluster : No response to messages
CMAN: removing node ia-004 from the cluster : No response to messages
CMAN: removing node ia-005 from the cluster : No response to messages
CMAN: removing node ia-003 from the cluster : No response to messages
CMAN: quorum lost, blocking activity
CMAN: node ia-001 has been removed from the cluster : No response to messages
CMAN: killed by NODEDOWN message
CMAN: we are leaving the cluster. No response to messages
SM: 03000003 sm_stop: SG still joined


Nodes ia-00[789] are the nodes that crashed, and those messages appear on
the 6 others.
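
For anyone comparing the dmesg output across the surviving nodes, here is a minimal sketch that groups the removed nodes by the reason CMAN logged. It assumes only the two message phrasings visible in the excerpt above; the function name removals_by_reason and the regular expression are my own illustration, not part of cman.

```python
import re
from collections import defaultdict

# Matches the two removal phrasings seen in the dmesg excerpt:
#   "CMAN: node X has been removed from the cluster : REASON"
#   "CMAN: removing node X from the cluster : REASON"
# (Assumption: other cman versions may word these differently.)
REMOVAL = re.compile(
    r"CMAN: (?:node (?P<a>\S+) has been removed|removing node (?P<b>\S+)) "
    r"from the cluster : (?P<reason>.+)"
)

def removals_by_reason(lines):
    """Group removed node names by the reason CMAN logged for them."""
    grouped = defaultdict(list)
    for line in lines:
        m = REMOVAL.search(line)
        if m:
            node = m.group("a") or m.group("b")
            grouped[m.group("reason").strip()].append(node)
    return dict(grouped)

if __name__ == "__main__":
    import sys
    # Usage: dmesg | python cman_removals.py
    for reason, nodes in removals_by_reason(sys.stdin).items():
        print(f"{reason}: {', '.join(nodes)}")
```

Run on each node's dmesg, this makes it easier to see which nodes were dropped for missed heartbeats and which for not responding to messages.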

-- 
Olivier Crête
ocrete@xxxxxxxxx
Maximum Throughput Inc.


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
