>>> I run a cluster with two corosync rings. One of the rings is marked
>>> faulty every forty seconds, only to recover a second later.
>>> The other ring is stable.
>>>
>>> I have no idea how I should debug this.
>>>
>>> We are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1.
>>> The cluster consists of three machines. Ring1 is running on 10 Gigabit
>>> interfaces, Ring0 on 1 Gigabit interfaces. Neither ring leaves its
>>> respective switch.
>> Any logs in the switch? Is the multicast group being deleted/recreated?
> I believe there would be no multicast for UDPU transport.
> Can you check to see if any of the devices (servers and switches) is
> dropping UDP packets, be it for congestion or damage?

The switch has no load, interface utilization is below 10%, there are no
CRC errors on the ports and no errors in the log. On the same switch a
second cluster (four machines, similar config) is running fine.

Greetings
Christoph
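
One way to check the forty-second period against the logs, as a rough sketch only: the snippet below scans syslog-style lines for ring-fault messages and reports the gaps between them per ring. The message text it matches ("Marking ringid N interface ... FAULTY"), the timestamp layout, and the function name `fault_intervals` are assumptions, not something taken from this thread; adjust the pattern to whatever the log on the affected nodes actually contains.

```python
import re
from datetime import datetime

# Assumed line shape (hypothetical example, not from the thread):
#   "Mar 10 14:02:41 node1 corosync[1234]: [TOTEM ] Marking ringid 1 interface 10.0.1.1 FAULTY"
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*Marking ringid (?P<ring>\d+) .*FAULTY"
)

def fault_intervals(lines, year=2000):
    """Return {ring_id: [seconds between consecutive FAULTY events]}.

    Syslog timestamps carry no year, so one is supplied by the caller;
    the default is arbitrary and only matters across a year boundary.
    """
    last_seen = {}  # ring id -> timestamp of the previous FAULTY event
    gaps = {}       # ring id -> list of gaps in seconds
    for line in lines:
        m = FAULT_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        ring = int(m.group("ring"))
        if ring in last_seen:
            gaps.setdefault(ring, []).append((ts - last_seen[ring]).total_seconds())
        last_seen[ring] = ts
    return gaps
```

Running this over the log of each node would show whether the gaps really sit at a steady forty seconds or drift, and whether all three nodes flag the ring at the same moments.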