Hello,

I have seen this a few times: one node stops seeing the other node for no apparent reason and fences it. Any idea how I can debug this?

Here are the logs from the node doing the fencing:

Sep 10 19:01:23 omadvnfs01a corosync[10371]: [TOTEM ] A processor failed, forming new configuration.
Sep 10 19:01:25 omadvnfs01a corosync[10371]: [QUORUM] Members[1]: 1
Sep 10 19:01:25 omadvnfs01a corosync[10371]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 10 19:01:25 omadvnfs01a rgmanager[10692]: State change: omadvnfs01b.sec.jel.lc DOWN
Sep 10 19:01:25 omadvnfs01a corosync[10371]: [CPG ] chosen downlist: sender r(0) ip(10.198.1.110) ; members(old:2 left:1)
Sep 10 19:01:25 omadvnfs01a corosync[10371]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 10 19:01:25 omadvnfs01a fenced[10427]: fencing node omadvnfs01b.sec.jel.lc

And here are the logs from the fenced node:

Sep 10 17:09:27 omadvnfs01b rpc.idmapd[6126]: nfsdcb: read(/proc/net/rpc/nfs4.idtoname/channel) failed: errno 0 (End of File)
Sep 10 17:14:47 omadvnfs01b rpc.idmapd[6125]: nfsdcb: read(/proc/net/rpc/nfs4.idtoname/channel) failed: errno 0 (End of File)
Sep 10 19:04:44 omadvnfs01b kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 10 19:04:44 omadvnfs01b rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2379" x-info="http://www.rsyslog.com"] start

I did notice that the two nodes' clocks were about 40 seconds apart. I have just fixed that, but what else can I look for here? Our monitoring first noticed that the fenced node was off the grid at 19:02:30, which is a little after it was fenced.

What test is performed to see if the other node is up? How many times does it try?

Thanks!

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
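Since the two clocks were about 40 seconds apart, comparing the logs line by line can be misleading. One way to line them up is to merge both into a single timeline with a correction applied to one node. This is only a rough sketch: the function names, the placeholder year (syslog timestamps carry none) and the choice to add 40 seconds to the second node are all assumptions. The post gives only the size of the offset, not which node was ahead, so the sign may need to be flipped.

```python
from datetime import datetime, timedelta

def parse_line(line, year=2000):
    """Split a syslog line into (timestamp, host, message).

    The year is a placeholder assumption: classic syslog stamps omit it.
    """
    month, day, clock, host, message = line.split(None, 4)
    stamp = datetime.strptime(f"{year} {month} {day} {clock}",
                              "%Y %b %d %H:%M:%S")
    return stamp, host, message

def merged_timeline(lines_a, lines_b, skew_b_seconds=0, year=2000):
    """Merge two nodes' log lines in time order.

    skew_b_seconds is added to every timestamp from the second node
    (use a negative value if that node's clock was ahead).
    """
    skew = timedelta(seconds=skew_b_seconds)
    events = [parse_line(line, year) for line in lines_a]
    for line in lines_b:
        stamp, host, message = parse_line(line, year)
        events.append((stamp + skew, host, message))
    return sorted(events, key=lambda event: event[0])

# A few lines copied from the logs above.
node_a = [
    "Sep 10 19:01:23 omadvnfs01a corosync[10371]: [TOTEM ] A processor failed, forming new configuration.",
    "Sep 10 19:01:25 omadvnfs01a fenced[10427]: fencing node omadvnfs01b.sec.jel.lc",
]
node_b = [
    "Sep 10 17:14:47 omadvnfs01b rpc.idmapd[6125]: nfsdcb: read(/proc/net/rpc/nfs4.idtoname/channel) failed: errno 0 (End of File)",
    "Sep 10 19:04:44 omadvnfs01b kernel: imklog 5.8.10, log source = /proc/kmsg started.",
]

# Assumed direction: the second node's clock was 40 seconds behind.
for stamp, host, message in merged_timeline(node_a, node_b, skew_b_seconds=40):
    print(stamp.strftime("%H:%M:%S"), host, message)
```

With the real log files, each list would be read from the corresponding node's /var/log/messages instead of being typed in by hand.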