Re: Cluster services die when nonactive node is rebooted

umesh susvirkar <susvirkar.3616@xxxxxxxxx> · Sun, 25 Jul 2010 11:47:52 +0530

Try setting the following in your cluster.conf file:
<cman expected_votes="3" quorum_dev_poll="35000">
        <multicast addr="224.0.0.1" interface="eth0"/>
</cman>

---
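The calculation is:
quorum_dev_poll > (interval * tko)

With the quorumd settings below, 5 * 6 = 30 seconds, so quorum_dev_poll is set to 35000 (the value is in milliseconds, i.e. 35 seconds):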
<quorumd interval="5" label="delta_qdisk" min_score="1" tko="6" votes="1">
        <heuristic interval="5" program="ping -t1 -c1 192.168.1.1" score="1"/>
</quorumd>
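
As a rough way to check this rule against a config, here is a minimal Python sketch. It is only an illustration, not part of the cluster tools: the function name check_quorum_dev_poll and the trimmed-down SAMPLE config are made up for this example, and the 10000 ms fallback used when quorum_dev_poll is missing is my assumption about the cman default, so verify it against cman(5) on your release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down cluster.conf holding only the attributes
# that matter for the timing rule discussed above.
SAMPLE = """
<cluster name="delta_cluster">
  <quorumd interval="5" tko="6" votes="1"/>
  <cman expected_votes="3" quorum_dev_poll="35000"/>
</cluster>
"""

def check_quorum_dev_poll(conf_xml, default_poll_ms=10000):
    """Return (qdisk_timeout_ms, quorum_dev_poll_ms, ok).

    ok is True when quorum_dev_poll > interval * tko.
    default_poll_ms is an assumed default, used only when the
    attribute is absent from the <cman> element.
    """
    root = ET.fromstring(conf_xml)
    quorumd = root.find("quorumd")
    cman = root.find("cman")

    # interval is in seconds, quorum_dev_poll in milliseconds,
    # so convert the qdisk timeout to milliseconds before comparing.
    interval_s = int(quorumd.get("interval"))
    tko = int(quorumd.get("tko"))
    qdisk_timeout_ms = interval_s * tko * 1000

    poll_ms = int(cman.get("quorum_dev_poll", str(default_poll_ms)))
    return qdisk_timeout_ms, poll_ms, poll_ms > qdisk_timeout_ms

print(check_quorum_dev_poll(SAMPLE))  # prints (30000, 35000, True)
```

Run against the config from the original post, which has no quorum_dev_poll attribute, the same check would report the rule as not satisfied under the assumed default.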

For more information, see the following documents:
https://access.redhat.com/kb/docs/DOC-2882
http://people.redhat.com/lhh/cmanvsqdisk.png

On Sat, Jul 24, 2010 at 3:50 AM, Eric Schneider <eschneid@xxxxxxxx> wrote:

I have a few two-node clusters, and I have noticed recently that they lose quorum when I reboot the node that is not running any services. I could do this in the past without any problems. The nodes run CentOS 5.5 on ESX 4.0 u1. Could this be a bug in a new kernel or cman release?

I get the following right away when the node reboots:
Jul 23 16:02:32 happy5 clurgmgrd[4269]: <notice> Member 2 shutting down
Jul 23 16:02:52 happy5 qdiskd[3562]: <info> Node 2 shutdown
Jul 23 16:03:02 happy5 qdiskd[3562]: <info> Assuming master role
Jul 23 16:03:03 happy5 clurgmgrd[4269]: <emerg> #1: Quorum Dissolved
Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] lost contact with quorum device
Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] quorum lost, blocking activity
Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing connection.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect: Connection refused
Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing connection.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect: Connection refused
Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something evil.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid request descriptor
Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something evil.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid request descriptor

<?xml version="1.0"?>
<cluster alias="delta_cluster" config_version="40" name="delta_cluster">
        <fence_daemon post_fail_delay="5" post_join_delay="120"/>
        <quorumd interval="5" label="delta_qdisk" min_score="1" tko="6" votes="1">
                <heuristic interval="5" program="ping -t1 -c1 192.168.1.1" score="1"/>
        </quorumd>
        <clusternodes>
                <clusternode name="node1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="node1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="node2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3">
                <multicast addr="224.0.0.1" interface="eth0"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_manual" name="fence_manual"/>
                <fencedevice agent="fence_vmware" ipaddr="bob" login="username" name="node1" passwd="password" port="node1"/>
                <fencedevice agent="fence_vmware" ipaddr="bob" login="username" name="node2" passwd="password" port="node2"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="node1" ordered="0" restricted="1">
                                <failoverdomainnode name="node1" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="node2" restricted="1">
                                <failoverdomainnode name="node2" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="failover_pro-http" restricted="0">
                                <failoverdomainnode name="node1" priority="1"/>
                                <failoverdomainnode name="node2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>

        </rm>
        <totem token="21000"/>
</cluster>

Thanks,

Eric 

--

Linux-cluster mailing list

Linux-cluster@xxxxxxxxxx

https://www.redhat.com/mailman/listinfo/linux-cluster
