I just gave the whole configuration a new try and set up the whole cluster once again. This is the resulting cluster.conf with a very basic configuration.

[root@ipsdb01 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ips_database" config_version="7" name="ips_database">
        <fence_daemon clean_start="1" post_fail_delay="10" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="10.102.10.51" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ipsdb01.drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="10.102.10.28" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ips08.drac"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="10.102.10.128" login="root" name="ips08.drac" passwd="xxx"/>
                <fencedevice agent="fence_drac" ipaddr="10.102.10.151" login="root" name="ipsdb01.drac" passwd="xxx"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources>
                        <ip address="10.209.170.55" monitor_link="1"/>
                </resources>
                <service autostart="1" exclusive="0" name="ips_database" recovery="relocate">
                        <ip ref="10.209.170.55"/>
                </service>
        </rm>
</cluster>

The service was running on 10.102.10.28. I did a 'powerdown' through the DRAC interface, but the service was not taken over by the second node. clustat on the remaining node gave an interesting output:

[root@ipsdb01 ~]# clustat
Cluster Status for ips_database @ Thu May 28 09:31:30 2009
Member Status: Quorate

 Member Name                       ID   Status
 ------ ----                       ---- ------
 10.102.10.51                         1 Online, Local, rgmanager
 10.102.10.28                         2 Offline

 Service Name                      Owner (Last)        State
 ------- ----                      ----- ------        -----
 service:ips_database              10.102.10.28        started

The service is 'started', but the owner (10.102.10.28) is offline. These are the last lines from /var/log/messages:

May 28 09:27:03 ipsdb01 kernel: dlm: closing connection to node 2
May 28 09:27:03 ipsdb01 openais[5295]: [CLM ] Members Joined:
May 28 09:27:03 ipsdb01 fenced[5315]: 10.102.10.28 not a cluster member after 0 sec post_fail_delay
May 28 09:27:03 ipsdb01 openais[5295]: [SYNC ] This node is within the primary component and will provide service.
May 28 09:27:03 ipsdb01 openais[5295]: [TOTEM] entering OPERATIONAL state.
May 28 09:27:03 ipsdb01 openais[5295]: [CLM ] got nodejoin message 10.102.10.51
May 28 09:27:03 ipsdb01 openais[5295]: [CPG ] got joinlist message from node 1

So the remaining system recognizes the failure, but doesn't start any takeover action. Does anyone have an idea what could cause such a problem? (See also the diagnostic commands after the quoted thread at the end of this mail.)

Marco Nietz wrote:
> Tiago Cruz wrote:
>> Did you have:
>>
>> <cman two_node="1" expected_votes="1"/>
>>
>> ?
>>
>
> Yes, I have this in my config.
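P.S. For completeness, these are the checks I plan to run next. The command names come from the stock RHEL 5 cman/rgmanager packages, so the exact options and output may differ on other versions. As far as I understand it, rgmanager will not recover a service until fenced reports the failed node as successfully fenced, so the first thing to verify is whether the fence operation against node 2 ever completes:

[root@ipsdb01 ~]# fence_node 10.102.10.28
# fence node 2 by hand, using the fence_drac entry from cluster.conf
[root@ipsdb01 ~]# group_tool ls
# if the fence group is stuck in a wait state, service recovery stays blocked
[root@ipsdb01 ~]# grep fenced /var/log/messages | tail
# there should be a 'fencing node ...' line followed by a success or failure message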
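It is probably also worth confirming that the two_node/expected_votes settings from the quoted mail above are actually active on the running cluster:

[root@ipsdb01 ~]# cman_tool status
# check that the node count, expected votes and quorum match the two-node setup
[root@ipsdb01 ~]# cman_tool nodes
# both nodes should be listed here with their current membership status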
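And finally, a manual relocation while both nodes are up, to rule out a problem in the <rm> section itself:

[root@ipsdb01 ~]# clusvcadm -r ips_database -m 10.102.10.51
# if even a manual relocate fails, the problem is in the service definition, not in fencing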