In order to get the node back into the cluster, I had to reboot all the nodes. Not exactly what I want to have happen. Still not sure why rgmanager was hung.

Instead of calling fence_ipmilan directly, I decided to see what would happen if I pulled the ethernet cable on a node running a service. From /var/log/messages on one node I see the following:

Nov 5 12:46:17 isc0 openais[2870]: [TOTEM] Creating commit token because I am the rep.
Nov 5 12:46:17 isc0 openais[2870]: [TOTEM] Saving state aru 97 high seq received 97
Nov 5 12:46:17 isc0 openais[2870]: [TOTEM] entering COMMIT state.
Nov 5 12:46:17 isc0 openais[2870]: [TOTEM] entering GATHER state from 12.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] entering GATHER state from 11.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] Creating commit token because I am the rep.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] entering COMMIT state.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] entering RECOVERY state.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] position [0] member 172.16.127.122:
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] previous ring seq 52 rep 172.16.127.122
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] aru 97 high delivered 97 received flag 0
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] position [1] member 172.16.127.124:
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] previous ring seq 52 rep 172.16.127.122
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] aru 97 high delivered 97 received flag 0
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] Did not need to originate any messages in recovery.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] Storing new sequence id for ring 3c
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] Sending initial ORF token
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] New Configuration:
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] r(0) ip(172.16.127.122)
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] r(0) ip(172.16.127.124)
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] Members Left:
Nov 5 12:46:22 isc0 kernel: dlm: closing connection to node 2
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] r(0) ip(172.16.127.123)
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] Members Joined:
Nov 5 12:46:22 isc0 openais[2870]: [SYNC ] This node is within the primary component and will provide service.
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] New Configuration:
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] r(0) ip(172.16.127.122)
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] r(0) ip(172.16.127.124)
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] Members Left:
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] Members Joined:
Nov 5 12:46:22 isc0 openais[2870]: [SYNC ] This node is within the primary component and will provide service.
Nov 5 12:46:22 isc0 openais[2870]: [TOTEM] entering OPERATIONAL state.
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] got nodejoin message 172.16.127.122
Nov 5 12:46:22 isc0 openais[2870]: [CLM  ] got nodejoin message 172.16.127.124
Nov 5 12:46:22 isc0 openais[2870]: [CPG  ] got joinlist message from node 1
Nov 5 12:46:22 isc0 openais[2870]: [CPG  ] got joinlist message from node 3
Nov 5 12:46:42 isc0 fenced[2889]: isc1 not a cluster member after 20 sec post_fail_delay
Nov 5 12:46:42 isc0 fenced[2889]: fencing node "isc1"
Nov 5 12:46:42 isc0 fenced[2889]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:172.16.158.160...Failed
Nov 5 12:46:42 isc0 fenced[2889]: fence "isc1" failed
Nov 5 12:46:47 isc0 fenced[2889]: fencing node "isc1"
Nov 5 12:46:48 isc0 fenced[2889]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:172.16.158.160...Failed

The last three lines continue to repeat. Any clues as to what might be wrong? Here's an updated cluster.conf:

<?xml version="1.0"?>
<cluster alias="ices_nfscluster" config_version="100" name="nfs_cluster">
    <fence_daemon post_fail_delay="20" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="isc0" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="1" name="iisc0"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="isc1" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="1" name="iisc1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="isc2" nodeid="3" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="1" name="iisc2"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" auth="none" ipaddr="172.16.158.159" login="root" name="iisc0" passwd="changeme"/>
        <fencedevice agent="fence_ipmilan" auth="none" ipaddr="172.16.158.160" login="root" name="iisc1" passwd="changeme"/>
        <fencedevice agent="fence_ipmilan" auth="none" ipaddr="171.16.158.161" login="root" name="iisc2" passwd="changeme"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="fotest" ordered="1" restricted="1">
                <failoverdomainnode name="isc0" priority="1"/>
                <failoverdomainnode name="isc1" priority="1"/>
                <failoverdomainnode name="isc2" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.16.127.15" monitor_link="1"/>
            <ip address="172.16.127.17" monitor_link="1"/>
        </resources>
        <service autostart="1" domain="fotest" name="nfstest" recovery="restart">
            <fs device="/dev/ices-fs/test" force_fsck="0" force_unmount="1" fsid="13584" fstype="ext3" mountpoint="/export/test" name="testfs" options="" self_fence="0"/>
            <nfsexport name="test_export">
                <nfsclient name="test_export" options="async,rw,fsid=20" path="/export/test" target="128.83.68.0/24"/>
            </nfsexport>
            <ip ref="172.16.127.15"/>
        </service>
        <service autostart="1" domain="fotest" name="nfsices" recovery="relocate">
            <fs device="/dev/ices-fs/ices" force_fsck="0" force_unmount="1" fsid="44096" fstype="ext3" mountpoint="/export/cices" name="nfsfs" options="" self_fence="0"/>
            <nfsexport name="nfsexport">
                <nfsclient name="nfsclient" options="async,fsid=25,rw" path="/export/cices" target="128.83.68.0/24"/>
            </nfsexport>
            <ip ref="172.16.127.17"/>
        </service>
    </rm>
</cluster>

Thanks,
Stew

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
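[Editor's note for anyone debugging the same symptom: when fenced loops on "fence_ipmilan ... Failed", it can help to exercise the fence agent by hand, outside of fenced, to separate a cluster problem from a BMC/network problem. The sketch below is only a suggestion, not part of the original post; the address and credentials are the ones from the cluster.conf above, and the flags assume the RHEL 5-era fence_ipmilan and ipmitool. It needs live IPMI hardware to actually run.]

```shell
# Probe the isc1 BMC directly with the same parameters fenced would use
# (-P selects lanplus, matching lanplus="1" in the config; -o status
# only queries power state instead of rebooting the node).
fence_ipmilan -a 172.16.158.160 -l root -p changeme -P -o status

# The equivalent raw IPMI query, useful to rule out the agent itself:
ipmitool -I lanplus -H 172.16.158.160 -U root -P changeme chassis power status
```

If the ipmitool query also fails, the problem is on the BMC/LAN side (credentials, lanplus support, or reachability of the 172.16.158.x management network from the surviving nodes) rather than in the cluster configuration.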