>> the switch has no load, interface utilization is below 10%, no crc
>> errors on the ports and no errors in the log. On the same switch a
>> second cluster (four machines, similar config) is running fine.
>
> did you vlan the switches so the two clusters are "logically separate"? if
> they're on the same VLAN they might interfere with each other...
>
> also i second Michael Schwartzkopff's suggestion of looking into Spanning
> Tree Protocol (STP). if your switches (i'm assUming you're using two) are
> not stacked(1), you may be running into that, as well.

There are VLANs, spanning tree and stacking. Both (actually three)
clusters are in the same VLANs: one VLAN for cluster internals and one
for external traffic. One internal ring, one external ring. On the
internal ring each cluster uses its own IP subnet. Communication is
udpu, so the clusters should not interfere with each other.

Spanning tree shows no events. All ports are always in forwarding mode.
As the ring failure happens every 40 seconds, I think it is unlikely
that spanning tree is the cause.

I pulled a wire dump on one of the nodes (432), but I can't really make
sense of it. orf packets from node 431 arrive and I send out orf packets
to node 430. So the ring looks fine.

I modified the cluster config to include

  <dlm protocol="sctp"/>       <!-- missed the note in "man cman" about this -->
  <totem rrp_mode="active" />  <!-- missed this note also -->

rebooted all nodes (for another reason), and everything looks fine now.
No idea whether it was the config change or the reboot. I will pull
another wire dump and compare.

Greetings
Christoph

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster