On 02/14/2011 08:53 AM, Randy Brown wrote:
> Hello,
>
> I am running a 2-node cluster being used as a NAS head for a LeftHand
> Networks iSCSI SAN to provide NFS mounts out to my network. Things had
> been OK for a while, but I recently lost one of the nodes as a result
> of a patching problem. In an effort to recreate the failed node, I
> imaged the working node and installed that image on the failed node. I
> set its hostname and IP settings correctly, and the machine booted and
> joined the cluster just fine. Or at least it appeared to. Things ran
> OK for the last few weeks, but I recently started seeing a behavior
> where the nodes fence each other. I'm wondering if something left over
> from cloning the nodes could be the problem. Possibly something that
> should be different between the nodes but isn't, because of the
> cloning?
>
> I am running CentOS 5.5 with the following package versions:
>
> Kernel - 2.6.18-194.11.3.el5 #1 SMP
> cman-2.0.115-34.el5_5.4
> lvm2-cluster-2.02.56-7.el5_5.4
> gfs2-utils-0.1.62-20.el5
> kmod-gfs-0.1.34-12.el5.centos
> rgmanager-2.0.52-6.el5.centos.8
>
> I have a QLogic qla4062 HBA in the node running: QLogic iSCSI HBA
> Driver (f8b83000) v5.01.03.04
>
> I will gladly provide more information as needed.
>
> Thank you,
> Randy

Silly question, but are the NICs mapped to their MAC addresses? If so,
did you update the MAC addresses after cloning the server to reflect
the actual MAC addresses?

Assuming so, do you have managed switches? If so, can you test by
swapping in a simple, unmanaged switch? This sounds like a multicast
issue at some level. Fencing happens once the totem ring is declared
failed.

Do you see anything interesting in the log files prior to the fence?
Can you run tcpdump to see what is happening on the interface(s) prior
to the fence?

--
Digimer
E-Mail:         digimer@xxxxxxxxxxx
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
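For the MAC-mapping check: on CentOS 5 the binding normally lives in the
HWADDR= lines of the ifcfg files, and possibly in a udev persistent-net
rules file if that release wrote one. A rough sketch of the comparison,
assuming the stock CentOS file layout:

    # Compare the MACs the config files expect with the real hardware.
    # A cloned image carries the source node's values, so eth0/eth1 can
    # end up bound to MACs that don't exist on this box.
    grep -i hwaddr /etc/sysconfig/network-scripts/ifcfg-eth*
    ifconfig -a | grep -i hwaddr

    # If this udev build wrote a persistent-net rules file, check it too:
    cat /etc/udev/rules.d/*persistent-net.rules 2>/dev/null

If the two disagree, fix the HWADDR lines (and any stale rules), then
reboot so the interfaces come up under the intended names.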
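On the multicast angle, it may be worth confirming both nodes agree on
the totem multicast address before blaming the switch. On cman 2.x
something like this should show it (cman derives a 239.192.x.x default
unless cluster.conf overrides it):

    # The multicast address cman derived (or was given) for totem:
    cman_tool status | grep -i multicast

    # Or read it straight from the config; the totem port defaults
    # to UDP 5405 if nothing else is set:
    grep -iE 'multicast|mcast' /etc/cluster/cluster.conf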
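And for the log/tcpdump suggestion, a sketch of what could be run on
both nodes; eth0 is an assumption here, so substitute whichever
interface carries the cluster traffic, and the port from above if the
default has been changed:

    # Anything totem/fence related in the run-up to the fence?
    grep -iE 'totem|fenc|openais' /var/log/messages | tail -50

    # Capture totem traffic to a file so the minutes before a fence
    # can be reviewed after the fact (default totem port is UDP 5405):
    tcpdump -i eth0 -s 0 -w /tmp/totem.pcap udp port 5405

    # Or watch it live:
    tcpdump -n -i eth0 udp port 5405

If the multicast stream stops on one node shortly before the token is
declared lost, that points at the switch (IGMP snooping is a common
culprit) rather than at the cloning.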