On 04/06/14 10:59 AM, Schaefer, Micah wrote:
I have a 4-node cluster running a single service group. At random
intervals, node1 fences node3 while node3 is actively running the
service group.
Rgmanager logs show no failures in service checks, and no other logs
provide any useful information. How can I go about finding out why node1
is fencing node3?
I currently have the failover domain set up as restricted, and it does
not include node3.
cluster.conf : http://pastebin.com/xYy6xp6N
Random fencing is almost always caused by network failures. Can you look
at the system logs, starting a little before the fence and continuing
until after the fence completes, and paste them here? I suspect you will
see corosync complaining.
If this is true, do your switches support persistent multicast? Do you
use active/passive bonding? Have you tried different switch/cable/NIC?
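
If it helps with pulling out the relevant window, here is a rough Python
sketch that keeps only the lines from a few minutes either side of the
fence that mention the cluster daemons. It assumes traditional syslog
timestamps ("Jun  4 10:58:41", no year) and that the daemons log under
the names corosync, fenced and rgmanager. The helper names, the sample
lines and the fence time are all made up for illustration; they are not
taken from your logs.

```python
from datetime import datetime, timedelta

# Assumption: traditional syslog timestamps such as "Jun  4 10:58:41" (no year).
TS_FORMAT = "%Y %b %d %H:%M:%S"
# Assumption: the daemons of interest log under these names; adjust as needed.
KEYWORDS = ("corosync", "fenced", "rgmanager")


def parse_ts(line, year):
    """Return the timestamp at the start of a syslog line, or None if it does not parse."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", TS_FORMAT)
    except ValueError:
        return None


def lines_around(lines, fence_time, before=timedelta(minutes=5),
                 after=timedelta(minutes=5)):
    """Keep keyword-matching lines logged between fence_time - before and fence_time + after."""
    start, end = fence_time - before, fence_time + after
    picked = []
    for line in lines:
        ts = parse_ts(line, fence_time.year)
        if ts is None or not (start <= ts <= end):
            continue
        if any(word in line for word in KEYWORDS):
            picked.append(line.rstrip("\n"))
    return picked


# Hypothetical sample lines, only to show the filtering; real messages will differ.
sample = [
    "Jun  4 10:40:00 node1 rgmanager[2001]: placeholder message outside the window",
    "Jun  4 10:58:41 node1 corosync[1234]: placeholder message inside the window",
    "Jun  4 10:58:50 node1 sshd[999]: unrelated daemon inside the window",
    "Jun  4 10:59:02 node1 fenced[1300]: placeholder message inside the window",
]
fence_time = datetime(2014, 6, 4, 10, 59, 0)  # hypothetical fence time

hits = lines_around(sample, fence_time)
for hit in hits:
    print(hit)
```

To run it against a real log, replace the sample list with the lines of
the syslog file (usually /var/log/messages on RHEL-type systems, but
check your syslog configuration) and set fence_time to when the fence
fired. Run it on both node1 and node3 so the two views can be compared.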
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster