My cluster went down pretty hard, in that I had to hard reboot several machines, and now the fence daemon won't come up. I run:

$ ccsd && cman_tool join -w
$ fence_tool join -w -j 15 -D

blade02:~ # fence_tool join -w -D -j 15
fence_tool: wait for quorum 1
fence_tool: get our node name
fence_tool: connect to ccs
fence_tool: start fenced
fenced: 1122003465 our name from cman "blade02"
fenced: 1122003465 delay post_join 15s post_fail 0s
fenced: 1122003465 added 14 nodes from ccs

and it hangs there forever until I hit ^C.

On one of the surviving machines, I see (dmesg):

SM: 00000001 process_recovery_barrier status=-104
CMAN: node blade03 has been removed from the cluster : Missed too many heartbeats
SM: 00000001 process_recovery_barrier status=-104
CMAN: node blade06 has been removed from the cluster : Missed too many heartbeats
SM: 00000001 process_recovery_barrier status=-104
CMAN: node blade09 has been removed from the cluster : No response to messages
CMAN: bad generation number 371 in HELLO message from 1, expected 370
CMAN: removing node blade08 from the cluster : No response to messages
CMAN: removing node blade07 from the cluster : No response to messages
CMAN: quorum lost, blocking activity
SM: 00000001 process_recovery_barrier status=-104

Is there a way to recover (restart gfs) without having to reboot this last machine?

thanks,
dan

p.s.
here's some more info:

blade13:~ # cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   blade01
   2    1    1   X   blade02
   3    1    1   X   blade03
   4    1    1   X   blade04
   6    1    1   X   blade06
   7    1    1   X   blade07
   8    1    1   X   blade08
   9    1    1   X   blade09
  10    1    1   X   blade10
  11    1    1   X   blade11
  12    1    1   X   blade12
  13    1    1   M   blade13
  14    1    1   X   blade14

blade13:~ # cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: blade_cluster
Cluster ID: 38068
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 2
Active subsystems: 6
Node name: blade13

blade13:~ # cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 recover 2 -
[13]

DLM Lock Space:  "clvmd"                             2   3 recover 0 -
[13]

DLM Lock Space:  "lil_cheesy1_lv"                   11   4 run       -
[13]

GFS Mount Group: "lil_cheesy1_lv"                   12   5 run       -
[13]

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
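For anyone following the thread, the node table above is easier to read when boiled down to "who is still a member". The snippet below is only a rough sketch: `summarize_nodes` is a hypothetical helper, not part of cman, and it assumes the five-column layout shown in the `cman_tool nodes` output above (Node, Votes, Exp, Sts, Name), with Sts `M` for current members and `X` for nodes that have left. The sample input is the table from this post.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with cman): summarize the output of
# `cman_tool nodes` read from stdin.
# Assumes the column layout shown in the post: Node Votes Exp Sts Name.
summarize_nodes() {
    awk '
        $1 == "Node" { next }                      # skip the header row
        $4 == "M"    { members++; votes += $2 }    # current cluster members
        $4 == "X"    { gone = gone " " $5 }        # nodes no longer in the cluster
        END {
            printf "members=%d votes=%d\n", members, votes
            printf "not in cluster:%s\n", gone
        }
    '
}

# Sample run against the table quoted in the post.
summarize_nodes <<'EOF'
Node  Votes Exp Sts  Name
   1    1    1   M   blade01
   2    1    1   X   blade02
   3    1    1   X   blade03
   4    1    1   X   blade04
   6    1    1   X   blade06
   7    1    1   X   blade07
   8    1    1   X   blade08
   9    1    1   X   blade09
  10    1    1   X   blade10
  11    1    1   X   blade11
  12    1    1   X   blade12
  13    1    1   M   blade13
  14    1    1   X   blade14
EOF
```

On a live node the same function could be fed directly with `cman_tool nodes | summarize_nodes`, assuming the output format matches the one shown here.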