On 02/14/2011 08:53 AM, Randy Brown wrote:
> Hello,
>
> I am running a 2-node cluster being used as a NAS head for a LeftHand
> Networks iSCSI SAN to provide NFS mounts out to my network. Things had
> been OK for a while, but I recently lost one of the nodes as a result
> of a patching problem. In an effort to recreate the failed node, I
> imaged the working node and installed that image on the failed node. I
> set its hostname and IP settings correctly, and the machine booted and
> joined the cluster just fine. Or at least it appeared to. Things ran
> OK for the last few weeks, but I recently started seeing a behavior
> where the nodes fence each other. I'm wondering if something left over
> from cloning the nodes could be the problem. Possibly something that
> should be different between the nodes but isn't, because of the
> cloning?
>
> I am running CentOS 5.5 with the following package versions:
>
> Kernel - 2.6.18-194.11.3.el5 #1 SMP
> cman-2.0.115-34.el5_5.4
> lvm2-cluster-2.02.56-7.el5_5.4
> gfs2-utils-0.1.62-20.el5
> kmod-gfs-0.1.34-12.el5.centos
> rgmanager-2.0.52-6.el5.centos.8
>
> I have a QLogic qla4062 HBA in the node running: QLogic iSCSI HBA
> Driver (f8b83000) v5.01.03.04
>
> I will gladly provide more information as needed.
>
> Thank you,
> Randy

Silly question, but are the NICs mapped to their MAC addresses? If so,
did you update the MAC addresses after cloning the server to reflect
the actual MAC addresses?

Assuming so, do you have managed switches? If so, can you test by
swapping in a simple, unmanaged switch? This sounds like a multicast
issue at some level. Fencing happens once the totem ring is declared
failed.

Do you see anything interesting in the log files prior to the fence?
Can you run tcpdump to see what is happening on the interface(s) prior
to the fence?

--
Digimer
E-Mail:         digimer@xxxxxxxxxxx
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
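For the MAC-mapping check: on CentOS 5 the binding normally lives in the
HWADDR= lines of the ifcfg files, and possibly in a udev persistent-net
rules file if that release wrote one. A rough sketch of the comparison,
assuming the stock CentOS file layout:

    # Compare the MACs the config files expect with the real hardware.
    # A cloned image carries the source node's values, so eth0/eth1 can
    # end up bound to MACs that don't exist on this box.
    grep -i hwaddr /etc/sysconfig/network-scripts/ifcfg-eth*
    ifconfig -a | grep -i hwaddr

    # If this udev build wrote a persistent-net rules file, check it too:
    cat /etc/udev/rules.d/*persistent-net.rules 2>/dev/null

If the two disagree, fix the HWADDR lines (and any stale rules), then
reboot so the interfaces come up under the intended names.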
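On the multicast angle, it may be worth confirming both nodes agree on
the totem multicast address before blaming the switch. On cman 2.x
something like this should show it (cman derives a 239.192.x.x default
unless cluster.conf overrides it):

    # The multicast address cman derived (or was given) for totem:
    cman_tool status | grep -i multicast

    # Or read it straight from the config; the totem port defaults
    # to UDP 5405 if nothing else is set:
    grep -iE 'multicast|mcast' /etc/cluster/cluster.conf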
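And for the log/tcpdump suggestion, a sketch of what could be run on
both nodes; eth0 is an assumption here, so substitute whichever
interface carries the cluster traffic, and the port from above if the
default has been changed:

    # Anything totem/fence related in the run-up to the fence?
    grep -iE 'totem|fenc|openais' /var/log/messages | tail -50

    # Capture totem traffic to a file so the minutes before a fence
    # can be reviewed after the fact (default totem port is UDP 5405):
    tcpdump -i eth0 -s 0 -w /tmp/totem.pcap udp port 5405

    # Or watch it live:
    tcpdump -n -i eth0 udp port 5405

If the multicast stream stops on one node shortly before the token is
declared lost, that points at the switch (IGMP snooping is a common
culprit) rather than at the cloning.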