All,

I'll provide more config details a little later, but I thought some cursory information might yield a response.

This is a simple four-node cluster running RHEL4U7 with the latest RHEL cluster packages and three GFS filesystems.

This morning one of our nodes remained responsive but was having problems that required a reboot. Unfortunately, most commands run from the command line failed with an Input/Output error (it seems the root filesystem may have been remounted read-only). I decided to fence the node from another node in the cluster using fence_node <nodename>, which calls fence_drac. The operation returned successfully; the node was fenced and rebooted.

After this fencing operation, all nodes reported their Membership state (as shown by cman_tool status) as Transition-Master. Per http://sources.redhat.com/cluster/faq.html#gfs_fencefreeze, I understand that GFS will freeze briefly after fencing is performed. However, the filesystems did not return to a responsive state. After many transition restarts, all nodes left the cluster (as expected).

Some logs and cluster.conf are below.

Shawn

Oct 16 10:09:12 hugin fence_node[3512]: Fence of "munin" was successful
Oct 16 10:09:32 hugin kernel: CMAN: removing node munin from the cluster : Missed too many heartbeats
Oct 16 10:09:32 hugin kernel: CMAN: Initiating transition, generation 69
Oct 16 10:09:47 hugin kernel: CMAN: Initiating transition, generation 70
Oct 16 10:10:02 hugin kernel: CMAN: Initiating transition, generation 71
Oct 16 10:10:17 hugin kernel: CMAN: Initiating transition, generation 72
Oct 16 10:10:32 hugin kernel: CMAN: Initiating transition, generation 73
Oct 16 10:10:47 hugin kernel: CMAN: Initiating transition, generation 74
Oct 16 10:11:02 hugin kernel: CMAN: Initiating transition, generation 75
Oct 16 10:11:17 hugin kernel: CMAN: Initiating transition, generation 76
Oct 16 10:11:32 hugin kernel: CMAN: Initiating transition, generation 77
Oct 16 10:11:47 hugin kernel: CMAN: Initiating transition, generation 78
Oct 16 10:12:02 hugin kernel: CMAN: Initiating transition, generation 79
Oct 16 10:12:14 hugin kernel: CMAN: removing node odin from the cluster : Inconsistent cluster view
Oct 16 10:12:14 hugin kernel: CMAN: Initiating transition, generation 80
Oct 16 10:12:14 hugin kernel: CMAN: removing node odin from the cluster : Inconsistent cluster view
Oct 16 10:12:14 hugin kernel: CMAN: Initiating transition, generation 81
Oct 16 10:12:16 hugin kernel: CMAN: removing node zeus from the cluster : Inconsistent cluster view
Oct 16 10:12:16 hugin kernel: CMAN: quorum lost, blocking activity
Oct 16 10:12:16 hugin clurgmgrd[8799]: <emerg> #1: Quorum Dissolved
Oct 16 10:12:16 hugin kernel: CMAN: removing node zeus from the cluster : Inconsistent cluster view
Oct 16 10:12:19 hugin ccsd[6330]: Cluster is not quorate. Refusing connection.
Oct 16 10:12:19 hugin ccsd[6330]: Error while processing connect: Connection refused
Oct 16 10:12:29 hugin ccsd[6330]: Cluster is not quorate. Refusing connection.
Oct 16 10:12:29 hugin ccsd[6330]: Error while processing connect: Connection refused
Oct 16 10:12:39 hugin ccsd[6330]: Cluster is not quorate. Refusing connection.
Oct 16 10:13:47 hugin kernel: CMAN: node munin rejoining
Oct 16 10:13:47 hugin kernel: CMAN: Completed transition, generation 81
Oct 16 10:13:49 hugin ccsd[6330]: Cluster is not quorate. Refusing connection.
Oct 16 10:13:49 hugin ccsd[6330]: Error while processing connect: Connection refused
-- previous error message repeated several times ---

Another node in the same cluster, after fencing munin from hugin:

Oct 16 10:09:31 zeus kernel: CMAN: removing node munin from the cluster : Missed too many heartbeats
Oct 16 10:09:31 zeus kernel: CMAN: Initiating transition, generation 69
Oct 16 10:09:46 zeus kernel: CMAN: Initiating transition, generation 70
Oct 16 10:10:01 zeus kernel: CMAN: Initiating transition, generation 71

cluster.conf:

<?xml version="1.0"?>
<cluster alias="tungsten" config_version="31" name="qualia">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="odin" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="odin-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hugin" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="hugin-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="munin" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="munin-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="zeus" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="zeus-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="0"/>
  <fencedevices>
    <resources/>
    <fencedevice name="odin-drac" agent="fence_drac" <redacted>/>
    <fencedevice name="hugin-drac" agent="fence_drac" <redacted>/>
    <fencedevice name="munin-drac" agent="fence_drac" <redacted>/>
    <fencedevice name="zeus-drac" agent="fence_drac" <redacted>/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
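
For anyone comparing the transition restarts across nodes, here is a minimal Python sketch that summarizes the CMAN transition messages in a syslog excerpt like the ones in this post. The function name summarize_transitions and the SAMPLE text are illustrative only (the sample lines are copied from the hugin log); it assumes all lines come from one host on the same day and is not part of any cluster tooling.

```python
import re

# Matches lines such as:
# "Oct 16 10:09:32 hugin kernel: CMAN: Initiating transition, generation 69"
TRANSITION = re.compile(
    r"(\d\d):(\d\d):(\d\d) (\S+) kernel: CMAN: "
    r"(Initiating|Completed) transition, generation (\d+)"
)


def summarize_transitions(log_text):
    """Return (initiated, completed, gaps) for the CMAN lines in log_text.

    initiated and completed are lists of generation numbers; gaps holds the
    seconds between consecutive "Initiating transition" messages.
    Assumption: every line is from the same host and the same day.
    """
    initiated, completed, times = [], [], []
    for match in TRANSITION.finditer(log_text):
        hh, mm, ss, _host, kind, gen = match.groups()
        if kind == "Initiating":
            initiated.append(int(gen))
            times.append(int(hh) * 3600 + int(mm) * 60 + int(ss))
        else:
            completed.append(int(gen))
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return initiated, completed, gaps


# Sample lines taken from the hugin log in this post.
SAMPLE = """\
Oct 16 10:09:32 hugin kernel: CMAN: Initiating transition, generation 69
Oct 16 10:09:47 hugin kernel: CMAN: Initiating transition, generation 70
Oct 16 10:10:02 hugin kernel: CMAN: Initiating transition, generation 71
Oct 16 10:13:47 hugin kernel: CMAN: Completed transition, generation 81
"""

initiated, completed, gaps = summarize_transitions(SAMPLE)
print("initiated:", initiated)
print("completed:", completed)
print("gaps (s):", gaps)
```

Run against each node's /var/log/messages excerpt, this makes it easy to see whether the nodes agree on which generations were started and whether any of them completed.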