You have two problems:
1. The nodes can't talk to each other (via multicast) *or* you are
taking too long to start each node. Given that you are using luci, I am
guessing the former. Log into your switch and see if the multicast group
shown in 'cman_tool status' exists.
2. Your fencing isn't working. Read the man page for fence_cisco_ucs to
try and debug it.
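For what it's worth, the "agent none result: error no method" lines in the
log quoted further down are consistent with the <method> elements in the
posted cluster.conf being empty: each one names a method but references no
<device>. Below is a minimal sketch of what one node's fence block might
look like. The port value "node1-profile" is a hypothetical placeholder for
that node's UCS service profile and is not from the original post; check
the fence_cisco_ucs man page for the attributes your version expects.

```xml
<!-- Sketch only: a fence method that actually references a device. -->
<clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
  <fence>
    <method name="ucs-node1">
      <!-- name must match a <fencedevice name="..."/> entry.
           port="node1-profile" is a placeholder (assumption): the UCS
           service profile for this node. -->
      <device name="ucs-node1" port="node1-profile"/>
    </method>
  </fence>
</clusternode>
```

The second node would need the same kind of <device> entry under its own
method, and config_version has to be incremented whenever cluster.conf
changes.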
digimer
PS - Please don't reply directly to me. Keep the conversation public.
PPS - Filter out your passwords. ;)
On 09/17/2012 11:17 PM, Ben .T.George wrote:
Hi, thanks for your reply.
Below is my cluster.conf file:
<?xml version="1.0"?>
<cluster config_version="7" name="eccprd">
<clusternodes>
<clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
<fence>
<method name="ucs-node1"/>
</fence>
</clusternode>
<clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
<fence>
<method name="ucs-node2"/>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<rm>
<resources>
<ip address="172.22.10.230" sleeptime="10"/>
</resources>
<service exclusive="1" name="eccsapmnt"
recovery="relocate">
<ip ref="172.22.10.230"/>
</service>
</rm>
<fencedevices>
<fencedevice agent="fence_cisco_ucs"
ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
<fencedevice agent="fence_cisco_ucs"
ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
</fencedevices>
</cluster>
When I try to start the cluster on node1, I get these messages in
/var/log/messages:
tail -f -n 0 /var/log/messages
Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
But the service is not starting. In luci, both nodes are shown as
online, but clustat shows something different.
The main error in the messages log is:
Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
These messages are from node1. I am getting the same message on node2:
cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
I don't know what the problem is here.
Please help me solve this.
Regards,
Ben
On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists@xxxxxxxxxx> wrote:
On 09/17/2012 06:07 PM, Ben .T.George wrote:
Hi,
My cluster is failing to start.
If I check clustat on node1, it shows node1 online and node2 offline.
If I check clustat on node2, it shows node2 online and node1 offline.
I checked the logs. fenced is throwing errors. How can I rectify this?
Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
Sep 18 00:55:03 fenced fenced 3.0.12.1 started
Sep 18 00:55:03 fenced failed to get dbus connection
Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
Please help me solve this issue.
Regards,
Ben
What is your cluster.conf?
Likely you either have no fencing configured, or your fencing is not
working. Either way, failing to fence is a critical problem and the
cluster will hang, just as you're seeing here. This is by design.
Better to hang a cluster than to corrupt it.
digimer
--
Digimer
Papers and Projects: https://alteeve.ca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster