Any logs in the switch? Is the multicast group being deleted/recreated?
On 23/07/14 11:53 AM, C. Handel wrote:
Hi,
I run a cluster with two corosync rings. One of the rings is marked
faulty every forty seconds and recovers automatically a second later.
The other ring is stable.
I have no idea how to debug this.
We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1.
The cluster consists of three machines. Ring1 is running on 10 Gigabit
interfaces, Ring0 on 1 Gigabit interfaces. Neither ring leaves its
respective switch.
Corosync communication is udpu, rrp_mode is passive.
cluster.conf:
<cluster config_version="30" name="aslfile">
  <cman transport="udpu">
  </cman>
  <fence_daemon post_join_delay="120" post_fail_delay="30"/>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
  </fencedevices>
  <quorumd
    cman_label="qdisk"
    device="/dev/mapper/mpath-091quorump1"
    min_score="1"
    votes="2"
  >
  </quorumd>
  <clusternodes>
    <clusternode name="asl430m90" nodeid="430">
      <altname name="asl430"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="asl430m90"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="asl431m90" nodeid="431">
      <altname name="asl431"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="asl431m90"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="asl432m90" nodeid="432">
      <altname name="asl432"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="asl432m90"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
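As a quick sanity check on the configuration above, the following Python sketch lists which host name each node uses per ring. It relies on the cman convention that the clusternode name is used for ring 0 and the altname for ring 1. The CONF string is a trimmed copy of the cluster.conf above (fence and quorum sections left out), and the function name ring_names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the clusternodes section is kept.
CONF = """<cluster config_version="30" name="aslfile">
  <clusternodes>
    <clusternode name="asl430m90" nodeid="430"><altname name="asl430"/></clusternode>
    <clusternode name="asl431m90" nodeid="431"><altname name="asl431"/></clusternode>
    <clusternode name="asl432m90" nodeid="432"><altname name="asl432"/></clusternode>
  </clusternodes>
</cluster>"""


def ring_names(xml_text):
    """Map each clusternode name (ring 0) to its altname (ring 1), or None."""
    root = ET.fromstring(xml_text)
    result = {}
    for node in root.iter("clusternode"):
        alt = node.find("altname")
        result[node.get("name")] = alt.get("name") if alt is not None else None
    return result


if __name__ == "__main__":
    for ring0, ring1 in ring_names(CONF).items():
        print(f"ring0={ring0}  ring1={ring1}")
```

Resolving each of these names on every node and comparing the result with the interface address in the FAULTY message would show whether ring 1 is bound to the intended network.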
syslog:
Jul 23 17:48:34 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:14 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
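To confirm how regular the fault is over a longer log, a small Python sketch like the one below could extract the FAULTY timestamps and print the gaps between them. The sample lines are copied from the syslog excerpt above with the wrapped lines rejoined. The function name faulty_intervals and the year default are assumptions made for this illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Sample lines copied from the syslog excerpt above (wrapped lines rejoined).
SAMPLE = """\
Jul 23 17:48:34 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
Jul 23 17:49:14 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
"""

# Captures the syslog timestamp of every "Marking ringid N interface X FAULTY" line.
FAULTY = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*Marking ringid \d+ interface \S+ FAULTY"
)


def faulty_intervals(text, year=2014):
    """Return the seconds between consecutive FAULTY markings in the log text."""
    stamps = []
    for line in text.splitlines():
        match = FAULTY.match(line)
        if match:
            # The year is an assumption supplied by the caller.
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(faulty_intervals(SAMPLE))  # [40.0]
```

A constant gap across many entries would point towards a timer-driven cause rather than random packet loss.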
Greetings
Christoph
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster