Logs: http://pastebin.com/QCh5FzZu

I have one 10 Gb NIC connected. Here is the corosync log from node1; I
see that it says "A processor failed, forming new configuration." I need
to dig deeper, though.

May 27 10:03:49 corosync [QUORUM] Members[4]: 1 2 3 4
May 27 10:05:04 corosync [QUORUM] Members[4]: 1 2 3 4
Jun 03 13:52:34 corosync [TOTEM ] A processor failed, forming new configuration.
Jun 03 13:52:46 corosync [QUORUM] Members[3]: 1 2 4
Jun 03 13:52:46 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 13:52:46 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:4 left:1)
Jun 03 13:52:46 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 03 13:56:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 13:56:14 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:3 left:0)
Jun 03 13:56:14 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 03 13:56:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 13:56:28 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:3 left:0)
Jun 03 13:56:28 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 03 13:56:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 13:56:41 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:3 left:0)
Jun 03 13:56:41 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 03 13:57:04 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 13:57:04 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:3 left:0)
Jun 03 13:57:04 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 03 15:12:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 03 15:12:09 corosync [QUORUM] Members[4]: 1 2 3 4
Jun 03 15:12:09 corosync [QUORUM] Members[4]: 1 2 3 4
Jun 03 15:12:09 corosync [CPG ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:3 left:0)
Jun 03 15:12:09 corosync [MAIN ] Completed service synchronization, ready to provide service.

Regards,

-------
Micah Schaefer
JHU/APL
ITSD/ITC
240-228-1148 (x81148)


On 6/4/14, 11:13 AM, "Digimer" <lists@xxxxxxxxxx> wrote:

>On 04/06/14 10:59 AM, Schaefer, Micah wrote:
>> I have a 4 node cluster, running a single service group. I have been
>> seeing node1 fence node3 while node3 is actively running the service
>> group at random intervals.
>>
>> Rgmanager logs show no failures in service checks, and no other logs
>> provide any useful information. How can I go about finding out why
>> node1 is fencing node3?
>>
>> I currently set up the failover domain to be restricted and not
>> include node3.
>>
>> cluster.conf: http://pastebin.com/xYy6xp6N
>
>Random fencing is almost always caused by network failures. Can you look
>at the system logs, starting a little before the fence and continuing
>until after the fence completes, and paste them here? I suspect you will
>see corosync complaining.
>
>If this is true, do your switches support persistent multicast? Do you
>use active/passive bonding? Have you tried a different switch/cable/NIC?
>
>--
>Digimer
>Papers and Projects: https://alteeve.ca/w/
>What if the cure for cancer is trapped in the mind of a person without
>access to education?
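
On the multicast question: a quick way to check whether multicast is
actually being delivered between the nodes is omping. This is just a
sketch; it assumes the omping package is installed on all four nodes,
and node1..node4 stand in for the real hostnames. Run it on every node
at the same time:

    # Run simultaneously on each of the four nodes; omping reports
    # reply loss for both unicast and multicast between all listed hosts.
    omping -c 60 -i 1 node1 node2 node3 node4

If the unicast replies keep flowing but the multicast replies stop after
a few minutes, the switch is most likely dropping the group (e.g. IGMP
snooping enabled with no querier), which would fit random "processor
failed" events like the ones above.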
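
To dig deeper on the corosync side, it may also help to turn up logging
and, as an experiment rather than a fix, raise the totem token timeout
so that brief network stalls stop triggering fences. A minimal sketch of
the relevant cluster.conf fragment (the token value is illustrative, in
milliseconds):

    <!-- fragment of /etc/cluster/cluster.conf; remember to bump
         config_version at the top of the file -->
    <logging debug="on"/>
    <totem token="20000"/>

After incrementing config_version, "cman_tool version -r" should push
the updated config out to the other nodes.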

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster