Crossover cable? With all the $$ spent, try putting a switch between the nodes.

Paul

----- Original Message -----
From: "Doug Tucker" <tuckerd@xxxxxxxxxxxx>
To: linux-cluster@xxxxxxxxxx
Sent: Monday, February 22, 2010 10:15:49 AM (GMT-0600) America/Chicago
Subject: Repeated fencing

We have a 2-node 4.x cluster that has developed an issue we are unable to resolve. Starting back in December, the nodes began fencing each other at random, as often as once a day. Nothing appears at the console before it happens, and nothing in the logs. We have not been able to find any pattern so far: the 2 nodes appear to be functioning fine, then suddenly a message appears in the logs about "node x missed too many heartbeats", and the next thing you see is the node being fenced.

Thinking we might have a hardware issue, we replaced both nodes from scratch with new machines, but the problem persists. The cluster communication runs over a crossover cable on eth1 on both machines, using private IPs.

We have a 2nd cluster that is not having this issue, and both of its nodes have been up for over 160 days. Its configuration is basically identical to the problematic cluster. The only differences between the 2 now are the newer hardware on the problematic cluster (previously the hardware was identical) and the kernel. The non-problematic cluster is still running kernel 89.0.9 and the problematic cluster is on 89.0.11. At this point we are afraid to let our non-problematic cluster upgrade to the latest packages.

Any insight or advice would be greatly appreciated; we have exhausted our ideas here.

Sincerely,
Doug

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster