Hi,

I had 9 nodes running kernel 2.6.17.11 with a snapshot of the cman STABLE tree (with in-kernel cman). No dlm, fenced or gfs. We have our own app and do the fencing ourselves.

After 3 nodes died (for unrelated reasons), all of the cman nodes disconnected, even though the service using cman was still running. On every node, in the dmesg, I got messages like the following:

CMAN: node ia-009 has been removed from the cluster : Missed too many heartbeats
CMAN: node ia-008 has been removed from the cluster : Missed too many heartbeats
CMAN: bad generation number 17 in HELLO message from 4, expected 16
CMAN: removing node ia-007 from the cluster : No response to messages
CMAN: node ia-006 has been removed from the cluster : No response to messages
CMAN: removing node ia-002 from the cluster : No response to messages
CMAN: removing node ia-004 from the cluster : No response to messages
CMAN: removing node ia-005 from the cluster : No response to messages
CMAN: removing node ia-003 from the cluster : No response to messages
CMAN: quorum lost, blocking activity
CMAN: node ia-001 has been removed from the cluster : No response to messages
CMAN: killed by NODEDOWN message
CMAN: we are leaving the cluster. No response to messages
SM: 03000003 sm_stop: SG still joined

Nodes ia-00[789] are the ones that crashed; the messages above appear on the 6 others.

-- 
Olivier Crête
ocrete@xxxxxxxxx
Maximum Throughput Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster