Re: Cman doesn't realize the failed node

Hi,

I solved my problem. When kernel IP forwarding (/proc/sys/net/ipv4/ip_forward) is set to 0, the cluster nodes do not notice the failure. I am writing this solution here to help others.
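For anyone who hits the same issue, this is roughly how I check and enable it (the standard sysctl steps on RHEL 5; adjust for your own setup):

# check the current value (0 = disabled, 1 = enabled)
cat /proc/sys/net/ipv4/ip_forward

# enable it immediately
sysctl -w net.ipv4.ip_forward=1

# to make it persistent across reboots, set "net.ipv4.ip_forward = 1"
# in /etc/sysctl.conf and reload with:
sysctl -p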

However, I am curious: is ip_forward enabled by default on all of your Red Hat 5 systems? Are all of your failover clusters working as expected?

Have a nice day, list.

PS: This setting is mentioned only in the Red Hat 4 Cluster Suite documentation, not in the Red Hat 5 Cluster Suite documentation. Interesting!

----- Message from veliogluh@xxxxxxxxxx ---------
     Date: Wed, 12 Nov 2008 13:17:00 +0200
     From: Hakan VELIOGLU <veliogluh@xxxxxxxxxx>
 Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
  Subject: Cman doesn't realize the failed node
       To: linux clustering <linux-cluster@xxxxxxxxxx>


Hi,

I am testing and trying to understand the cluster environment. I have built a two-node cluster without any services (Red Hat EL 5.2 x64). I start the cman and rgmanager services successfully and then power off one node abruptly. After this I expect the other node to detect the failure and take over all the resources, but the surviving node does not notice it. The "cman_tool nodes" and "clustat" commands still report the failed node as active and online. What am I missing? Why doesn't cman detect the failure?
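For reference, these are roughly the checks I run on the surviving node (cman_tool nodes and clustat as mentioned above; cman_tool status is an additional standard check):

# on the surviving node, after powering off the other one
cman_tool nodes      # list cluster members and their join status
cman_tool status     # show quorum, expected votes and membership info
clustat              # show node and service state as rgmanager sees it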

[root@cl1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="kume" config_version="54" name="kume">
        <totem token="1000" hold="100"/>
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cl2.cc.itu.edu.tr" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="cl1.cc.itu.edu.tr" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="domain" ordered="1" restricted="1">
                                <failoverdomainnode name="cl2.cc.itu.edu.tr" priority="1"/>
                                <failoverdomainnode name="cl1.cc.itu.edu.tr" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <service autostart="0" domain="domain" name="veritabani" recovery="restart"/>
        </rm>
</cluster>
[root@cl1 ~]#


When the node goes down, TOTEM repeatedly logs messages like this:
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.



Hakan





----- End of message from veliogluh@xxxxxxxxxx -----



--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
