Re: Secondary ring problem with corosync 1.4.1

On 22-08-2012 12:50, Dan Frincu wrote:
Hello,

On Wed, Aug 22, 2012 at 1:46 PM, Maurits van de Lande
<M.vandeLande@xxxxxxxxxxxxxxxx> wrote:
Hello,



I hope this is the right mailing list for this question.



Problem:

On a cman cluster with two redundant rings running on CentOS 6.3, ring 1 is
marked as faulty every few seconds and then recovers again.



Ring 0:

Two 10GbE adapters bonded in mode 1 (bond0)



Ring 1:

A 1GbE adapter connected to a dedicated “secondary ring” network


Could you post the corosync version and the configuration file?
# corosync -v
Corosync Cluster Engine, version '1.4.1'
Copyright (c) 2006-2009 Red Hat, Inc.

I use cman, therefore I do not have a corosync.conf file.

cluster.conf
---------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster config_version="17" name="VMcluster1">
    <clusternodes>
        <clusternode name="vmhost1a.vdl-fittings.local" nodeid="1">
            <altname name="vmhost1a-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
        <clusternode name="vmhost1c.vdl-fittings.local" nodeid="3">
            <altname name="vmhost1c-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
        <clusternode name="vmhost1d.vdl-fittings.local" nodeid="4">
            <altname name="vmhost1d-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <rm>
        <resources/>
        <failoverdomains>
            <failoverdomain name="KVM" nofailback="1" ordered="0" restricted="1">
                <failoverdomainnode name="vmhost1a.vdl-fittings.local" priority="1"/>
                <failoverdomainnode name="vmhost1c.vdl-fittings.local" priority="1"/>
                <failoverdomainnode name="vmhost1d.vdl-fittings.local" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Wsus1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Wds1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Mailrelay1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Mailrelay2" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
    </rm>
    <fencedevices>
    </fencedevices>
    <dlm enable_fencing="0" enable_quorum="1" plock_ownership="1" plock_rate_limit="0" protocol="sctp"/>
    <gfs_controld plock_rate_limit="0"/>
    <totem consensus="7000" token="5000"/>
</cluster>
---------------------------------------------------------------------------------------------------------
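[Editor's note] Since the flapping is on the redundant-ring (RRP) side, one avenue to explore is the totem rrp_* tuning knobs. This is only a sketch, not a confirmed fix: rrp_mode and rrp_problem_count_threshold are standard corosync totem options (see corosync.conf(5) for the defaults), but whether cman forwards these extra totem attributes from cluster.conf to corosync on CentOS 6.3 should be verified against the cman schema; the values shown are illustrative.

```xml
<!-- Sketch only: raise the per-ring problem counter threshold so brief
     packet loss on ring 1 is less likely to mark the ring FAULTY, and
     pin the RRP mode explicitly. Values are illustrative, not tested. -->
<totem consensus="7000" token="5000"
       rrp_mode="passive"
       rrp_problem_count_threshold="20"/>
```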


The problem looks similar to problem 3 in this thread:

http://www.spinics.net/lists/corosync/msg01637.html





Below I pasted some of the logged messages:



Aug 22 12:41:43 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:47 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de76d
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de76f
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de770
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de775
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de777
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de778
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de77c 2de77e
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de77e 2de781
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de781 2de783
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de781 2de785
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
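[Editor's note] To quantify how often ring 1 actually flaps (and whether it correlates with load), the FAULTY markings can be tallied per minute straight from syslog. A minimal sketch; the heredoc sample below stands in for /var/log/messages, which is an assumed default path — point the grep at your actual syslog file:

```shell
# Sample log lines standing in for /var/log/messages (assumed path).
cat <<'EOF' > /tmp/sample.log
Aug 22 12:41:43 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:47 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
EOF

# Count FAULTY markings per hh:mm bucket (field 3 is the syslog timestamp).
grep 'Marking ringid 1' /tmp/sample.log |
  awk '{ split($3, t, ":"); c[t[1] ":" t[2]]++ } END { for (m in c) print m, c[m] }'
# → 12:41 3
```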



What can be done to solve this problem?



Best regards,



Maurits van de Lande








_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss







