Re: CentOS 6.5 RHCS fence loops




In 2-node clusters, never allow cman or rgmanager to start on boot. A node will reboot for one of two reasons: it was fenced, or it went down for scheduled maintenance. In the former case, you want to investigate before restoring it. In the latter case, a human is already there to start it. This is good advice for 3+ node clusters as well.
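
On CentOS 6 that's just a matter of turning the init scripts off, something like this (clvmd only if you use clustered LVM):

chkconfig cman off
chkconfig rgmanager off
chkconfig clvmd off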

As an aside, the default timeout to wait for the peer on start is 6 seconds, which I find to be too short. I up it to 30 seconds with:

<fence_daemon post_join_delay="30" />
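
That line goes inside the <cluster>...</cluster> block of /etc/cluster/cluster.conf. A rough sketch against your existing file (bump config_version, then validate and push with ccs_config_validate and 'cman_tool version -r'):

<cluster config_version="8" name="web-cluster">
        <fence_daemon post_join_delay="30"/>
        ...
</cluster>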

As for the fence-on-start, it could be a network issue. Have you tried unicast instead of multicast? Try this:

<cman transport="udpu" expected_votes="1" two_node="1" />
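
Before (or instead of) switching, it's also worth confirming whether multicast works between the two guests at all. If omping is installed on both nodes, running something like this on each node at the same time will show whether multicast replies come back:

omping web2.cluster web3.cluster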

Slight comment:

> When the cluster becomes quorate,

Nodes are always quorate in 2-node clusters.
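
With two_node="1" and expected_votes="1", a single node is quorate all by itself; that's the point of those options, and it's also why working fencing matters so much here. You can confirm with something like:

cman_tool status | grep -i quorum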

digimer

On 29/10/14 04:44 AM, aditya hilman wrote:
Hi Guys,

I'm using CentOS 6.5 as a guest on RHEV, with RHCS for a clustered web environment.
The environment:
web1.example.com
web2.example.com

When the cluster becomes quorate, web1 is rebooted by web2. When web2 comes
back up, web2 is rebooted by web1.
Does anybody know how to solve this "fence loop"?
master_wins="1" is not working properly, and neither is qdisk.
Below is the cluster.conf. I re-created a "fresh" cluster, but the fence loop
still exists.

<?xml version="1.0"?>
<cluster config_version="7" name="web-cluster">
         <clusternodes>
                 <clusternode name="web2.cluster" nodeid="1">
                         <fence>
                                 <method name="fence-web2">
                                         <device name="fence-rhevm" port="web2.cluster"/>
                                 </method>
                         </fence>
                 </clusternode>
                 <clusternode name="web3.cluster" nodeid="2">
                         <fence>
                                 <method name="fence-web3">
                                         <device name="fence-rhevm" port="web3.cluster"/>
                                 </method>
                         </fence>
                 </clusternode>
         </clusternodes>
         <cman expected_votes="1" two_node="1"/>
         <fencedevices>
                 <fencedevice agent="fence_rhevm" ipaddr="192.168.1.1" login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
         </fencedevices>
</cluster>


Log: /var/log/messages
Oct 29 07:34:04 web2 corosync[1182]:   [QUORUM] Members[1]: 1
Oct 29 07:34:04 web2 corosync[1182]:   [QUORUM] Members[1]: 1
Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster dev 0.0 agent
fence_rhevm result: error from agent
Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster dev 0.0 agent
fence_rhevm result: error from agent
Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster failed
Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster failed
Oct 29 07:34:12 web2 fenced[1242]: fence web3.cluster success
Oct 29 07:34:12 web2 fenced[1242]: fence web3.cluster success
Oct 29 07:34:12 web2 clvmd: Cluster LVM daemon started - connected to CMAN
Oct 29 07:34:12 web2 clvmd: Cluster LVM daemon started - connected to CMAN
Oct 29 07:34:12 web2 rgmanager[1790]: I am node #1
Oct 29 07:34:12 web2 rgmanager[1790]: I am node #1
Oct 29 07:34:12 web2 rgmanager[1790]: Resource Group Manager Starting
Oct 29 07:34:12 web2 rgmanager[1790]: Resource Group Manager Starting


Thanks



--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?



