On 22-8-2012 12:50, Dan Frincu wrote:
Hello,
On Wed, Aug 22, 2012 at 1:46 PM, Maurits van de Lande
<M.vandeLande@xxxxxxxxxxxxxxxx> wrote:
Hello,
I hope this is the right mailing list for this question.
Problem:
On a cman cluster with two redundant rings running on CentOS 6.3, ring 1 is
marked as faulty every few seconds and then recovers again.
Ring 0:
Two 10GbE adapters bonded in mode 1 (bond0)
Ring 1:
A 1GbE adapter connected to a dedicated "secondary ring network"
Could you post the corosync version and the configuration file?
# corosync -v
Corosync Cluster Engine, version '1.4.1'
Copyright (c) 2006-2009 Red Hat, Inc.
I use cman, so I do not have a corosync.conf file.
cluster.conf
---------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster config_version="17" name="VMcluster1">
<clusternodes>
<clusternode name="vmhost1a.vdl-fittings.local" nodeid="1">
<altname name="vmhost1a-cr1.vdl-fittings.local"/>
<fence/>
</clusternode>
<clusternode name="vmhost1c.vdl-fittings.local" nodeid="3">
<altname name="vmhost1c-cr1.vdl-fittings.local"/>
<fence/>
</clusternode>
<clusternode name="vmhost1d.vdl-fittings.local" nodeid="4">
<altname name="vmhost1d-cr1.vdl-fittings.local"/>
<fence/>
</clusternode>
</clusternodes>
<cman/>
<rm>
<resources/>
<failoverdomains>
<failoverdomain name="KVM" nofailback="1" ordered="0"
restricted="1">
<failoverdomainnode name="vmhost1a.vdl-fittings.local"
priority="1"/>
<failoverdomainnode name="vmhost1c.vdl-fittings.local"
priority="1"/>
<failoverdomainnode name="vmhost1d.vdl-fittings.local"
priority="1"/>
</failoverdomain>
</failoverdomains>
<vm autostart="0" domain="KVM" exclusive="0" max_restarts="3"
name="Wsus1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
<vm autostart="0" domain="KVM" exclusive="0" max_restarts="3"
name="Wds1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
<vm autostart="0" domain="KVM" exclusive="0" max_restarts="3"
name="Mailrelay1" path="/VM/KVM/" recovery="disable"
restart_expire_time="360"/>
<vm autostart="0" domain="KVM" exclusive="0" max_restarts="3"
name="Mailrelay2" path="/VM/KVM/" recovery="disable"
restart_expire_time="360"/>
</rm>
<fencedevices>
</fencedevices>
<dlm enable_fencing="0" enable_quorum="1" plock_ownership="1"
plock_rate_limit="0" protocol="sctp"/>
<gfs_controld plock_rate_limit="0"/>
<totem consensus="7000" token="5000"/>
</cluster>
---------------------------------------------------------------------------------------------------------
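To double-check which hostname each node uses on ring 0 and ring 1, the clusternode/altname pairs can be listed from the configuration. This is only a minimal sketch, assuming the layout shown above (one optional altname per clusternode); the function name ring_hostnames is hypothetical and not part of cman or corosync.

```python
import xml.etree.ElementTree as ET


def ring_hostnames(cluster_conf_xml):
    """Return a list of (nodeid, ring0_name, ring1_name) tuples.

    ring1_name is None when a clusternode has no <altname> child.
    Assumes the cluster.conf layout shown in this message.
    """
    root = ET.fromstring(cluster_conf_xml)
    rows = []
    for node in root.findall("./clusternodes/clusternode"):
        alt = node.find("altname")
        ring1 = alt.get("name") if alt is not None else None
        rows.append((node.get("nodeid"), node.get("name"), ring1))
    return rows


if __name__ == "__main__":
    # /etc/cluster/cluster.conf is the usual cman location; adjust if yours differs.
    with open("/etc/cluster/cluster.conf") as fh:
        for nodeid, ring0, ring1 in ring_hostnames(fh.read()):
            print(f"node {nodeid}: ring0={ring0} ring1={ring1}")
```

Each ring 1 name printed this way should resolve to an address on the dedicated secondary network (172.16.100.x in the log excerpt in this message) on every node.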
The problem looks similar to problem 3 in this thread:
http://www.spinics.net/lists/corosync/msg01637.html
Below I pasted some of the logged messages:
Aug 22 12:41:43 vmhost1d corosync[4947]: [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:44 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:47 vmhost1d corosync[4947]: [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:48 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]: [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:51 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de76d
Aug 22 12:41:51 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de76f
Aug 22 12:41:51 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de770
Aug 22 12:41:51 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:51 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:52 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de775
Aug 22 12:41:52 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de777
Aug 22 12:41:52 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de778
Aug 22 12:41:52 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:52 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:53 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de77c 2de77e
Aug 22 12:41:53 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de77e 2de781
Aug 22 12:41:53 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de781 2de783
Aug 22 12:41:53 vmhost1d corosync[4947]: [TOTEM ] Retransmit List: 2de781 2de785
Aug 22 12:41:53 vmhost1d corosync[4947]: [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
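To put a number on how often the ring flaps, the log messages can be tallied per ring. This is a rough sketch, assuming the message wording shown in the excerpt above; the function name count_ring_events is hypothetical, and the patterns use \s+ so they also match text that a mail client has wrapped.

```python
import re
from collections import Counter

# \s+ also matches newlines, so wrapped log text still matches.
FAULTY_RE = re.compile(r"Marking\s+ringid\s+(\d+)\s+interface\s+(\S+)\s+FAULTY")
RECOVERED_RE = re.compile(r"Automatically\s+recovered\s+ring\s+(\d+)")


def count_ring_events(log_text):
    """Return (faulty, recovered) Counters keyed by integer ring id."""
    faulty = Counter(int(m.group(1)) for m in FAULTY_RE.finditer(log_text))
    recovered = Counter(int(m.group(1)) for m in RECOVERED_RE.finditer(log_text))
    return faulty, recovered


if __name__ == "__main__":
    import sys

    # Reads log text from standard input.
    faulty, recovered = count_ring_events(sys.stdin.read())
    for ring in sorted(set(faulty) | set(recovered)):
        print(f"ring {ring}: faulty={faulty[ring]} recovered={recovered[ring]}")
```

Saved under an arbitrary name such as ring_events.py, it could be run as `python ring_events.py < logfile`, where logfile is whichever file corosync logs to on the node.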
What can be done to solve this problem?
Best regards,
Maurits van de Lande
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss