Re: corosync ring failure

Hi,

On Wednesday, 23 July 2014 12:16:48, Digimer wrote:
> Any logs in the switch? Is the multicast group being deleted/recreated?

I believe no multicast group is involved here: with the UDPU transport,
corosync uses unicast UDP.

Can you check whether any of the devices (servers or switches) is dropping
UDP packets, be it from congestion or faulty hardware?
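On the servers, one quick check is to compare the kernel's UDP error
counters before and after a FAULTY event. The Python sketch below is only an
illustration, not part of corosync or cman: it assumes a Linux host whose
/proc/net/snmp carries a "Udp:" header line followed by a "Udp:" value line,
and the function names are made up for this example. The counters are
host-wide, so they cover all UDP traffic, not just corosync.

```python
# Hypothetical helper (not part of corosync): compare Linux UDP error counters
# read from /proc/net/snmp. Assumes the "Udp:" header line precedes its values.
from typing import Dict

# Counters that grow when the kernel drops or rejects UDP datagrams.
LOSS_FIELDS = ("InErrors", "RcvbufErrors", "SndbufErrors")


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Map each Udp counter name to its value from /proc/net/snmp text."""
    udp_rows = [line.split()[1:] for line in snmp_text.splitlines()
                if line.startswith("Udp:")]
    if len(udp_rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = udp_rows[0], udp_rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path: str = "/proc/net/snmp") -> Dict[str, int]:
    """Take one snapshot of this host's Udp counters."""
    with open(path) as fh:
        return parse_udp_counters(fh.read())


def udp_drop_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Increase of each loss-related counter between two snapshots."""
    return {f: after.get(f, 0) - before.get(f, 0) for f in LOSS_FIELDS}
```

Take one snapshot with read_udp_counters(), wait long enough for at least one
FAULTY/recover cycle, take a second one and pass both to udp_drop_delta(). If
InErrors or RcvbufErrors climbs on a node, that node is dropping datagrams
itself; if all counters stay flat, the switch side is the better suspect.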

regards,
Pavel

> 
> On 23/07/14 11:53 AM, C. Handel wrote:
> > Hi,
> > 
> > I run a cluster with two corosync rings. One of the rings is marked
> > faulty every forty seconds and recovers a second later; the other ring
> > is stable.
> > 
> > I have no idea how to debug this.
> > 
> > 
> > We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12 and corosync
> > 1.4.1. The cluster consists of three machines. Ring1 runs on 10 Gbit
> > interfaces, Ring0 on 1 Gbit interfaces. Neither ring leaves its
> > respective switch.
> > 
> > Corosync communication is udpu; rrp_mode is passive.
> > 
> > cluster.conf:
> > 
> > <cluster config_version="30" name="aslfile">
> >   <cman transport="udpu">
> >   </cman>
> >   <fence_daemon post_join_delay="120" post_fail_delay="30"/>
> >   <fencedevices>
> >     <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
> >   </fencedevices>
> >   <quorumd
> >       cman_label="qdisk"
> >       device="/dev/mapper/mpath-091quorump1"
> >       min_score="1"
> >       votes="2">
> >   </quorumd>
> >   <clusternodes>
> >     <clusternode name="asl430m90" nodeid="430">
> >       <altname name="asl430"/>
> >       <fence>
> >         <method name="pcmk-redirect">
> >           <device name="pcmk" port="asl430m90"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >     <clusternode name="asl431m90" nodeid="431">
> >       <altname name="asl431"/>
> >       <fence>
> >         <method name="pcmk-redirect">
> >           <device name="pcmk" port="asl431m90"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >     <clusternode name="asl432m90" nodeid="432">
> >       <altname name="asl432"/>
> >       <fence>
> >         <method name="pcmk-redirect">
> >           <device name="pcmk" port="asl432m90"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >   </clusternodes>
> > </cluster>
> > 
> > 
> > syslog:
> > 
> > 
> > Jul 23 17:48:34 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> > Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > Jul 23 17:49:14 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> > Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> > 
> > 
> > 
> > Greetings
> > 
> >     Christoph
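
To confirm how regular the failures are, the FAULTY timestamps can be pulled
out of syslog and differenced. This is a hypothetical Python helper, not a
corosync tool; it assumes classic syslog lines without a year (the caller
supplies it) and the "Marking ringid N interface ADDR FAULTY" wording seen in
the excerpt quoted above.

```python
# Hypothetical helper: measure the spacing of corosync "FAULTY" events in syslog.
# Assumes lines shaped like:
#   Jul 23 17:48:34 host corosync[pid]: [TOTEM ] Marking ringid 1 interface ADDR FAULTY
import re
from datetime import datetime
from typing import List

FAULTY_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*Marking ringid (\d+) interface \S+ FAULTY"
)


def faulty_intervals(lines: List[str], ring: int, year: int) -> List[float]:
    """Seconds between consecutive FAULTY events for one ring id."""
    stamps = []
    for line in lines:
        m = FAULTY_RE.search(line)
        if m and int(m.group(2)) == ring:
            # Classic syslog omits the year, so the caller supplies it.
            stamps.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

Run against the excerpt in this thread, it reports a single interval of 40.0
seconds for ring 1 and nothing for ring 0.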

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster