Re: corosync ring failure

>>> I run a cluster with two corosync rings. One of the rings is marked
>>> faulty every forty seconds, only to recover a second later.
>>> The other ring is stable.
>>>
>>> i have no idea how i should debug this.
>>>
>>>
>>> We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1.
>>> The cluster consists of three machines. Ring1 is running on 10-gigabit
>>> interfaces, Ring0 on 1-gigabit interfaces. Neither ring leaves its
>>> respective switch.

>> Any logs in the switch? Is the multicast group being deleted/recreated?

> I believe there would be no multicast with the UDPU transport.

> Can you check to see if any of the devices (servers and switches) is
> dropping UDP packets, be it for congestion or damage?

The switch has no load, interface utilization is below 10%, there are no
CRC errors on the ports and no errors in the log. On the same switch a
second cluster (four machines, similar config) is running fine.
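
To cover the server side of the dropped-UDP question as well, the kernel's
UDP counters could be sampled across one fault interval. The following is
only a rough sketch, assuming Python 3 and the usual Linux /proc/net/snmp
layout (a "Udp:" header line followed by a "Udp:" value line); the function
names are made up for illustration and are not part of corosync or cman.

```python
import time


def parse_udp_counters(snmp_text):
    """Return the Udp: counters from /proc/net/snmp content as a dict."""
    # The file holds pairs of lines per protocol: names first, then values.
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp: header/value pair found")
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def counter_deltas(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name]
            for name in after if name in before}


def watch(interval=40.0, path="/proc/net/snmp"):
    """Print the UDP counters that changed during one interval.

    The 40 second default is an assumption taken from the fault period
    described in this thread.
    """
    with open(path) as f:
        before = parse_udp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_udp_counters(f.read())
    for name, delta in sorted(counter_deltas(before, after).items()):
        if delta:
            print(name, delta)
```

Calling watch() on each node while the ring flaps would show whether
InErrors or RcvbufErrors grows along with the faults; if they stay flat,
the servers are probably not discarding the datagrams themselves.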

Greetings
   Christoph

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



