rrp_mode active doesn't detect faulty ring

Oren Nechushtan <theoren28@xxxxxxxxxxx> · Mon, 23 Jan 2012 13:43:41 +0000



Hi,
I noticed an issue with rrp_mode active when one ring is working (ring 0, crossover cable) and the other is not (ring 1: the link is up, but the nodes are connected to different VLANs, for testing).
The transport is udpu (cman + pacemaker + corosync).
There are many threads related to this issue, but none of them seems to be fixed in either corosync 1.4.1 or corosync 1.4.2,
e.g., http://www.gossamer-threads.com/lists/linuxha/pacemaker/77388
 
Repeated runs of corosync-cfgtool -s alternate between "no faults" and "[1 of 3]" for ring 1:

# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 1.0.0.12
        status  = ring 0 active with no faults
RING ID 1
        id      = 1.0.0.2
        status  = ring 1 active with no faults
# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 1.0.0.12
        status  = ring 0 active with no faults
RING ID 1
        id      = 1.0.0.2
        status  = Incrementing problem counter for seqid 958 iface 1.0.0.2 to [1 of 3]
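Because the status flips back and forth, a single corosync-cfgtool -s run can easily show "no faults". A minimal sketch for catching the degraded state by polling is below. The helper names (parse_ring_status, degraded_rings, poll) are hypothetical, and the parsing assumes only the output layout pasted above; it has not been checked against other corosync versions.

```python
import re
import subprocess
import time

def parse_ring_status(text):
    """Return {ring_id: status} parsed from `corosync-cfgtool -s` output
    (layout assumed to match the output pasted above)."""
    rings = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*RING ID (\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"\s*status\s*=\s*(.*)", line)
        if m and current is not None:
            rings[current] = m.group(1).strip()
    return rings

def degraded_rings(rings):
    """Ring ids whose status does not end in 'active with no faults'."""
    return sorted(r for r, s in rings.items()
                  if not s.endswith("active with no faults"))

def poll(samples=10, interval=1.0):
    """Run corosync-cfgtool -s repeatedly and return every ring id
    that was reported as degraded in at least one sample."""
    seen = set()
    for _ in range(samples):
        out = subprocess.run(["corosync-cfgtool", "-s"],
                             capture_output=True, text=True, check=True).stdout
        seen.update(degraded_rings(parse_ring_status(out)))
        time.sleep(interval)
    return sorted(seen)
```

On the node above, poll() would be expected to report ring 1 at least once over ten seconds, even though ring 1 is never marked FAULTY.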

Jan 23 08:22:32 corosync [TOTEM ] Incrementing problem counter for seqid 898 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:34 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:36 corosync [TOTEM ] Incrementing problem counter for seqid 900 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:38 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:41 corosync [TOTEM ] Incrementing problem counter for seqid 902 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:43 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:45 corosync [TOTEM ] Incrementing problem counter for seqid 904 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:47 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:49 corosync [TOTEM ] Incrementing problem counter for seqid 906 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:51 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:54 corosync [TOTEM ] Incrementing problem counter for seqid 908 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:56 corosync [TOTEM ] ring 1 active with no faults
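In the log above, each "Incrementing problem counter ... [1 of 3]" for ring 1 is followed by "ring 1 active with no faults" about 2 seconds later, so the counter never gets past 1 and the threshold of 3 is never reached. The sketch below just measures that from the log text; it does not model corosync's internal timers. The regexes are written for exactly these TOTEM messages, and the year (2012) is an assumption taken from the email date because syslog timestamps omit it.

```python
import re
from datetime import datetime

LOG = """\
Jan 23 08:22:32 corosync [TOTEM ] Incrementing problem counter for seqid 898 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:34 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:36 corosync [TOTEM ] Incrementing problem counter for seqid 900 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:38 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:41 corosync [TOTEM ] Incrementing problem counter for seqid 902 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:43 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:45 corosync [TOTEM ] Incrementing problem counter for seqid 904 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:47 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:49 corosync [TOTEM ] Incrementing problem counter for seqid 906 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:51 corosync [TOTEM ] ring 1 active with no faults
Jan 23 08:22:54 corosync [TOTEM ] Incrementing problem counter for seqid 908 iface 1.0.0.2 to [1 of 3]
Jan 23 08:22:56 corosync [TOTEM ] ring 1 active with no faults
"""

# Patterns for the two TOTEM messages above (format taken from this log only).
INC = re.compile(r"^(\w{3} +\d+ [\d:]+) corosync \[TOTEM \] Incrementing problem counter "
                 r"for seqid (\d+) iface (\S+) to \[(\d+) of (\d+)\]")
OK = re.compile(r"^(\w{3} +\d+ [\d:]+) corosync \[TOTEM \] ring (\d+) active with no faults")

def parse_ts(ts):
    # Syslog timestamps have no year; 2012 is assumed from the email date.
    return datetime.strptime("2012 " + ts, "%Y %b %d %H:%M:%S")

def summarize(log):
    """Highest problem count seen, the threshold, and the seconds between
    each increment and the following 'no faults' message."""
    max_count, threshold = 0, None
    pending = None  # timestamp of an increment not yet followed by a reset
    delays = []
    for line in log.splitlines():
        m = INC.match(line)
        if m:
            max_count = max(max_count, int(m.group(4)))
            threshold = int(m.group(5))
            pending = parse_ts(m.group(1))
            continue
        m = OK.match(line)
        if m and pending is not None:
            delays.append((parse_ts(m.group(1)) - pending).total_seconds())
            pending = None
    return {"max_count": max_count, "threshold": threshold, "reset_delays": delays}
```

For the log above, summarize(LOG) reports a maximum count of 1 against a threshold of 3, with every increment cleared 2.0 seconds later.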

cluster.conf
<?xml version="1.0"?>
<cluster config_version="3" name="node">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3" skip_undefined="1"/>
        <totem rrp_mode="active"/>
        <clusternodes>
                <clusternode name="node-1" nodeid="1" votes="1">
                        <altname name="local1sync2"/>
                        <fence/>
                </clusternode>
                <clusternode name="node-2" nodeid="2" votes="1">
                        <altname name="local2sync2"/>
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" keyfile="/etc/cluster/corosync.authkey" transport="udpu" two_node="1"/>
        <fencedevices/>
        <rm/>
</cluster>
 
The traffic on both sides looks as expected.
 
corosync-1.4.2-1fs.el6.i686
corosynclib-1.4.2-1fs.el6.i686
pacemaker-1.1.6-3.el6.i686
pacemaker-libs-1.1.6-3.el6.i686
pacemaker-cluster-libs-1.1.6-3.el6.i686
pacemaker-cli-1.1.6-3.el6.i686
cman-3.0.12.1-23.el6.i686
 
CentOS 6.2
Linux 2.6.32-220.2.1.el6.i686 #1 SMP Thu Dec 22 18:50:52 GMT 2011 i686 i686 i386 GNU/Linux
 
Again, the issue is that ring 1 is never detected as faulty.
 
NOTE: Changing rrp_mode to passive seems to work around the issue; ring 1 is then marked faulty:
 status  = Marking ringid 1 interface 1.0.0.2 FAULTY
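For reference, the workaround only touches the totem element of cluster.conf. A minimal sketch of that edit with Python's xml.etree.ElementTree follows; the helper name switch_to_passive is hypothetical. Bumping config_version is an assumption based on the usual cman practice of raising the version whenever cluster.conf changes. Writing the file back and distributing/reloading it on the nodes is not shown, and ElementTree drops the XML declaration on output.

```python
import xml.etree.ElementTree as ET

def switch_to_passive(xml_text):
    """Return cluster.conf text with <totem rrp_mode="passive"/> and
    config_version incremented by one (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    totem = root.find("totem")
    if totem is None:
        # No <totem> element yet: add one under <cluster>.
        totem = ET.SubElement(root, "totem")
    totem.set("rrp_mode", "passive")
    # Assumed: cman expects a higher config_version for a changed config.
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))
    return ET.tostring(root, encoding="unicode")
```

Applied to the cluster.conf above, this yields <totem rrp_mode="passive"/> and config_version="4", leaving the cman, clusternodes and other elements unchanged.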
 
Thanks, 
Oren
 		 	   		  
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss