Re: 2-node cluster fence loop

Have you tried simple things like disabling iptables or SELinux, just to test? If that doesn't work, and it's a small cluster, try unicast and see if that helps (again, even if just to test).
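
For the unicast test, a minimal sketch of the change to cluster.conf follows. It assumes a cman version that accepts transport="udpu" on the <cman> element (RHEL 6.2 or later); the sample config and the helper name use_unicast are illustrative, not taken from the poster's setup.

```python
import xml.etree.ElementTree as ET


def use_unicast(conf_xml: str) -> str:
    """Return cluster.conf text with cman switched to UDP unicast."""
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")
    if cman is None:
        # No <cman> element yet: create one under <cluster>.
        cman = ET.SubElement(root, "cman")
    cman.set("transport", "udpu")
    # A changed cluster.conf needs a higher config_version.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-node config, not the poster's actual file.
SAMPLE = """<cluster name="demo" config_version="2">
  <cman two_node="1" expected_votes="1"/>
</cluster>"""

print(use_unicast(SAMPLE))
```

The same edit can of course be made by hand; the point is only that the transport attribute and the config_version bump go together.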

On 12/06/14 10:29 AM, Arun G Nair wrote:
We have multicast enabled on the switch. I've also tried the
multicast.py tool from RH's knowledge base to test multicast, and I see
the expected output, though the tool uses a different multicast IP (I
guess that shouldn't matter). I've tried increasing post_join_delay
to 360 seconds to give me enough time to check everything on both
nodes. One node still gets fenced. `clustat` output on both servers says
the other node is offline. So neither node can see the other? This
again points to an issue with multicast. Any other clues as to what or
where to look?
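
Since the test tool used a different multicast address, it may be worth repeating the check on the address the cluster itself uses. The sketch below is not the multicast.py tool from the knowledge base, just a minimal stand-in built on the standard socket module. The function names are illustrative, and the group and port in the usage note are placeholders; `cman_tool status` should list the real multicast address.

```python
import socket
import struct


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure that IP_ADD_MEMBERSHIP expects."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def listen(group: str, port: int, timeout: float = 10.0) -> bytes:
    """Join `group` and return the first datagram seen; raises socket.timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        data, _sender = sock.recvfrom(1024)
        return data
    finally:
        sock.close()


def send(group: str, port: int, payload: bytes = b"ping", ttl: int = 1) -> None:
    """Send one datagram to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

With cman stopped on both nodes, one would run something like listen("239.192.0.1", 5405) on the first node and send("239.192.0.1", 5405) on the second, then swap roles. A timeout in either direction suggests the group traffic is being dropped somewhere between the nodes.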


On Wed, Jun 11, 2014 at 8:33 PM, Digimer <lists@xxxxxxxxxx> wrote:

    On 11/06/14 10:48 AM, Arun G Nair wrote:

        Hello,

        What are the reasons for fence loops when only cman is started?
        We have an RHEL 6.5 2-node cluster which goes into a fence loop
        every time we start cman on both nodes: one node fences the
        other. Multicast seems to be working properly. My understanding
        is that without rgmanager running there won't be a multicast
        group subscription? I don't see the multicast address in
        'netstat -g' unless rgmanager is running. I've tried increasing
        the fence post_join_delay, but one of the nodes still gets
        fenced.

        The cluster works fine if we use unicast UDP.

        Thanks,


    Hi,

       When cman starts, it waits post_join_delay seconds for the peer
    to connect. If the peer has not joined when that time expires (6
    seconds by default, iirc), it gives up and calls a fence against the
    peer to put it into a known state.
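
The startup rule described above can be written down as a toy model. This is only an illustration of the reasoning, not how the fence daemon is implemented; the function name and the treatment of the exact boundary are assumptions.

```python
def fence_decision(peer_joined_after, post_join_delay=6.0):
    """Toy model of the startup rule.

    peer_joined_after: seconds until the peer was seen as a member,
    or None if it never showed up in membership.
    """
    if peer_joined_after is None or peer_joined_after > post_join_delay:
        return "fence"
    return "ok"


print(fence_decision(2.0))        # peer joined in time -> ok
print(fence_decision(None, 360))  # peer never seen -> fence, however long the delay
```

The second call matches what was reported in this thread: if the nodes never see each other, raising post_join_delay only postpones the fence.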

       Corosync is what determines membership, and it is started by
    cman. The rgmanager only handles resource
    start/stop/relocate/recovery and has nothing to do with fencing
    directly. Corosync is what uses multicast.

       So as you seem to have already surmised, multicast is probably
    not working in your environment. Have you enabled multicast traffic
    on the firewall? Do your switches support multicast properly?

    digimer

    --
    Digimer
    Papers and Projects: https://alteeve.ca/w/
    What if the cure for cancer is trapped in the mind of a person
    without access to education?

    --
    Linux-cluster mailing list
    Linux-cluster@xxxxxxxxxx
    https://www.redhat.com/mailman/listinfo/linux-cluster




--
Arun G Nair
Sr. Sysadmin
Dimension Data | Ph: (800) 664-9973
Feedback? We're listening <http://www.surveymonkey.com/s/XRCYXBH>




--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?




