Re: Secondary ring problem with corosync 1.4.1

On 22-08-2012 12:50, Dan Frincu wrote:
Hello,

On Wed, Aug 22, 2012 at 1:46 PM, Maurits van de Lande
<M.vandeLande@xxxxxxxxxxxxxxxx> wrote:
Hello,



I hope this is the right mailing list for this question.



Problem:

On a cman cluster with two redundant rings running on CentOS 6.3, ring 1 is
marked as faulty every few seconds and then recovers again.



Ring 0:

Two 10GbE adapters bonded in mode 1 (bond0)



Ring 1:

A 1GbE adapter connected to a dedicated “secondary ring” network


Could you post the corosync version and the configuration file?
# corosync -v
Corosync Cluster Engine, version '1.4.1'
Copyright (c) 2006-2009 Red Hat, Inc.

I use cman, therefore I do not have a corosync.conf file.

cluster.conf
---------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster config_version="17" name="VMcluster1">
    <clusternodes>
        <clusternode name="vmhost1a.vdl-fittings.local" nodeid="1">
            <altname name="vmhost1a-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
        <clusternode name="vmhost1c.vdl-fittings.local" nodeid="3">
            <altname name="vmhost1c-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
        <clusternode name="vmhost1d.vdl-fittings.local" nodeid="4">
            <altname name="vmhost1d-cr1.vdl-fittings.local"/>
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <rm>
        <resources/>
        <failoverdomains>
            <failoverdomain name="KVM" nofailback="1" ordered="0" restricted="1">
                <failoverdomainnode name="vmhost1a.vdl-fittings.local" priority="1"/>
                <failoverdomainnode name="vmhost1c.vdl-fittings.local" priority="1"/>
                <failoverdomainnode name="vmhost1d.vdl-fittings.local" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Wsus1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Wds1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Mailrelay1" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="3" name="Mailrelay2" path="/VM/KVM/" recovery="disable" restart_expire_time="360"/>
    </rm>
    <fencedevices>
    </fencedevices>
    <dlm enable_fencing="0" enable_quorum="1" plock_ownership="1" plock_rate_limit="0" protocol="sctp"/>
    <gfs_controld plock_rate_limit="0"/>
    <totem consensus="7000" token="5000"/>
</cluster>
---------------------------------------------------------------------------------------------------------
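[Editor's note] Since the flapping is on the redundant-ring (RRP) side, one avenue to explore is the totem rrp_* tuning knobs. This is only a sketch, not a confirmed fix: rrp_mode and rrp_problem_count_threshold are standard corosync totem options (see corosync.conf(5) for the defaults), but whether cman forwards these extra totem attributes from cluster.conf to corosync on CentOS 6.3 should be verified against the cman schema; the values shown are illustrative.

```xml
<!-- Sketch only: raise the per-ring problem counter threshold so brief
     packet loss on ring 1 is less likely to mark the ring FAULTY, and
     pin the RRP mode explicitly. Values are illustrative, not tested. -->
<totem consensus="7000" token="5000"
       rrp_mode="passive"
       rrp_problem_count_threshold="20"/>
```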


The problem looks similar to problem 3 in this thread:

http://www.spinics.net/lists/corosync/msg01637.html





Below I pasted some of the logged messages:



Aug 22 12:41:43 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:44 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:47 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:48 vmhost1d corosync[4947]:   [TOTEM ] Automatically recovered ring 1
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de76d
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de76f
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de770
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:51 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de772
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de775
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de777
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de778
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:52 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de779
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de77c 2de77e
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de77e 2de781
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de781 2de783
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Retransmit List: 2de781 2de785
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
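[Editor's note] To quantify how often ring 1 actually flaps (and whether it correlates with load), the FAULTY markings can be tallied per minute straight from syslog. A minimal sketch; the heredoc sample below stands in for /var/log/messages, which is an assumed default path — point the grep at your actual syslog file:

```shell
# Sample log lines standing in for /var/log/messages (assumed path).
cat <<'EOF' > /tmp/sample.log
Aug 22 12:41:43 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:47 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
Aug 22 12:41:53 vmhost1d corosync[4947]:   [TOTEM ] Marking ringid 1 interface 172.16.100.4 FAULTY
EOF

# Count FAULTY markings per hh:mm bucket (field 3 is the syslog timestamp).
grep 'Marking ringid 1' /tmp/sample.log |
  awk '{ split($3, t, ":"); c[t[1] ":" t[2]]++ } END { for (m in c) print m, c[m] }'
# → 12:41 3
```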



What can be done to solve this problem?



Best regards,



Maurits van de Lande








_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss







