Re: corosync ring failure


 



Any logs in the switch? Is the multicast group being deleted/recreated?

On 23/07/14 11:53 AM, C. Handel wrote:
hi,

I run a cluster with two corosync rings. One of the rings is marked
faulty every forty seconds and recovers automatically a second later.
The other ring is stable.

I have no idea how I should debug this.


We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, and corosync 1.4.1.
The cluster consists of three machines. Ring1 runs on 10-gigabit
interfaces, Ring0 on 1-gigabit interfaces. Neither ring leaves its
respective switch.

The corosync transport is udpu; rrp_mode is passive.

cluster.conf:

<cluster config_version="30" name="aslfile">

<cman transport="udpu">
</cman>

<fence_daemon post_join_delay="120" post_fail_delay="30"/>

<fencedevices>
         <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
</fencedevices>

<quorumd
    cman_label="qdisk"
    device="/dev/mapper/mpath-091quorump1"
    min_score="1"
    votes="2"
    >
</quorumd>

<clusternodes>
<clusternode name="asl430m90" nodeid="430">
         <altname name="asl430"/>
         <fence>
                 <method name="pcmk-redirect">
                         <device name="pcmk" port="asl430m90"/>
                 </method>
         </fence>
</clusternode>
<clusternode name="asl431m90" nodeid="431">
         <altname name="asl431"/>
         <fence>
                 <method name="pcmk-redirect">
                         <device name="pcmk" port="asl431m90"/>
                 </method>
         </fence>
</clusternode>
<clusternode name="asl432m90" nodeid="432">
         <altname name="asl432"/>
         <fence>
                 <method name="pcmk-redirect">
                         <device name="pcmk" port="asl432m90"/>
                 </method>
         </fence>
</clusternode>
</clusternodes>
</cluster>


syslog:


Jul 23 17:48:34 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:14 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
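
One way to confirm that the faults recur on a fixed period is to pull the FAULTY timestamps out of syslog and take the differences between them. The following is a minimal Python sketch, not part of the original report: the function names `faulty_events` and `intervals` are made up for illustration, the sample lines are copied from the excerpt above with the wrapped lines rejoined and the duplicate recovery messages dropped, and the year is an assumption because syslog timestamps do not carry one. Month-name parsing with `%b` also assumes an English/C locale.

```python
import re
from datetime import datetime

# Sample lines taken from the syslog excerpt in this message.
LOG = """\
Jul 23 17:48:34 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:14 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
"""

# Matches only the "Marking ringid N interface A.B.C.D FAULTY" messages.
FAULTY = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ corosync\[\d+\]:\s+"
    r"\[TOTEM \] Marking ringid (\d+) interface (\S+) FAULTY"
)


def faulty_events(text, year=2014):
    """Return (timestamp, ring id, interface) for every FAULTY line.

    The year is an assumption: syslog lines do not include it.
    """
    events = []
    for line in text.splitlines():
        m = FAULTY.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, int(m.group(2)), m.group(3)))
    return events


def intervals(events):
    """Seconds between consecutive FAULTY events, in log order."""
    times = [ts for ts, _, _ in events]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


print(intervals(faulty_events(LOG)))  # → [40.0]
```

Run against a longer stretch of /var/log/messages, a constant interval would point at something periodic (a timer on the switch or a host) rather than random packet loss.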



Greetings
    Christoph



--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?




