Re: cluster fenced error

Hi, thanks for your reply.

This is a Cisco UCS machine. Yesterday the Cisco engineers created a separate vSwitch for this heartbeat.

regards,
Ben

On Tue, Sep 18, 2012 at 6:25 AM, Digimer <lists@xxxxxxxxxx> wrote:
You have two problems:

1. The nodes can't talk to each other (via multicast) *or* you are taking too long to start each node. Given that you are using luci, I am guessing the former. Log into your switch and see if the multicast group shown in 'cman_tool status' exists.

2. Your fencing isn't working. Read the man page for fence_cisco_ucs to try and debug it.
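
As a starting point for debugging the fencing: the "agent none result: error no method" lines in the fenced log are what I would expect when a <method> in cluster.conf has no <device> child, which is how the posted config looks. Below is a minimal sketch of what one node's entry might look like. The method name "ucs" and the port value are placeholders I made up for illustration, not values from this cluster; as far as I know, port should be the node's service profile name in UCS Manager, but check the man page.

```xml
<!-- Sketch only: placeholder values are marked; verify against the fence_cisco_ucs man page. -->
<clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
        <fence>
                <!-- "ucs" is an arbitrary method name (assumption) -->
                <method name="ucs">
                        <!-- name must match a <fencedevice name="..."> entry -->
                        <!-- port is a placeholder for the node's UCS service profile name -->
                        <device name="ucs-node1" port="SERVICE_PROFILE_NODE1"/>
                </method>
        </fence>
</clusternode>
```

After any edit, config_version has to be incremented. Running ccs_config_validate and then fence_node against the other node should show whether the agent can actually reach the UCS manager.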

digimer

PS - Please don't reply directly to me. Keep the conversation public.
PPS - Filter out your passwords. ;)


On 09/17/2012 11:17 PM, Ben .T.George wrote:
Hi, thanks for your reply.

Below is my cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="7" name="eccprd">
         <clusternodes>
                 <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
                         <fence>
                                 <method name="ucs-node1"/>
                         </fence>
                 </clusternode>
                 <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
                         <fence>
                                 <method name="ucs-node2"/>
                         </fence>
                 </clusternode>
         </clusternodes>
         <cman expected_votes="1" two_node="1"/>
         <rm>
                 <resources>
                         <ip address="172.22.10.230" sleeptime="10"/>
                 </resources>
                 <service exclusive="1" name="eccsapmnt" recovery="relocate">
                         <ip ref="172.22.10.230"/>
                 </service>
         </rm>
         <fencedevices>
                 <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
                 <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
         </fencedevices>
</cluster>

When I try to start the cluster on node1, I get these messages in /var/log/messages:

  tail -f -n 0 /var/log/messages
Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net



But the service is not starting. In luci, both nodes are shown as online, but clustat shows something different.

The main error I am getting in messages is:

Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying

These messages are from node1. I am getting the same message on node2, saying:

cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying


I don't know what the problem is here.

Please help me solve it.
Regards,
Ben

On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists@xxxxxxxxxx> wrote:

    On 09/17/2012 06:07 PM, Ben .T.George wrote:

        Hi,

        My cluster is failing to start.

        If I check clustat on node1, it shows node1 online and node2
        offline. If I check clustat on node2, it shows node2 online and
        node1 offline.

        I checked the logs. fenced is throwing errors. How can I rectify this?

        Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
        Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
        Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
        Sep 18 00:55:03 fenced fenced 3.0.12.1 started
        Sep 18 00:55:03 fenced failed to get dbus connection
        Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
        Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
        Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
        Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
        Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
        Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
        Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
        Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
        Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed



        Please help me solve this issue.

        Regards,
        Ben


    What is your cluster.conf?

    Likely you either have no fencing configured, or your fencing is not
    working. Either way, failing to fence is a critical problem and the
    cluster will hang, just as you're seeing here. This is by design.
    Better to hang a cluster than to corrupt it.

    digimer

    --
    Digimer
    Papers and Projects: https://alteeve.ca





--
Digimer
Papers and Projects: https://alteeve.ca



--
Yours Sincerely

#!/usr/bin/env python
#Mysignature.py :)


Signature = """ Ben.T.George \n
                  Linux System Administrator \n
                  Diyar United Company \n
                  kuwait \n
                  Phone : +965 - 50629829 \n """

print(Signature)

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
