On Thursday, 24 July 2014, 09:30:01, C. Handel wrote:
> >>> I run a cluster with two corosync rings. One of the rings is marked
> >>> faulty every forty seconds, to immediately recover a second later.
> >>> The other ring is stable.
> >>>
> >>> I have no idea how I should debug this.
> >>>
> >>>
> >>> We are running sl6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1.
> >>> The cluster consists of three machines. Ring1 is running on 10 gigabit
> >>> interfaces, Ring0 on 1 gigabit interfaces. Both rings don't leave their
> >>> respective switch.
> >>
> >> Any logs in the switch? Is the multicast group being deleted/recreated?
> >
> > I believe there would be no multicast for UDPU transport.
> >
> >> Can you check to see if any of the devices (servers and switches) is
> >> dropping UDP packets, be it for congestion or damage?
>
> The switch has no load, interface utilization is below 10%, no CRC
> errors on the ports and no errors in the log. On the same switch a
> second cluster (four machines, similar config) is running fine.

Any Spanning Tree problems?

Do you have any bridges (e.g. for virtual machines) configured in your setup?

Did you do any debugging on your switch?

Greetings,

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0162) 1650044
Fax: (089) 620 304 13
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster