I have found this in cluster-2.03.11/doc/usage.txt:

- To avoid unnecessary fencing when starting the cluster, it's best for all
  nodes to join the cluster (complete cman_tool join) before any of them do
  fence_tool join.

I think something should be fixed to resolve this; it is a real problem on a
"production" system. Once the fence domain is already closed (after a
fence_tool join), a node cannot enter the cluster. You have to run these at
the same time on all nodes (a rough script version is sketched further down,
after the quoted problem description):

#cman_tool join
#fence_tool join

Strange behaviour... I have this problem on RHEL 5.3.

On Fri, Jan 23, 2009 at 02:32:25PM +0000, Mark Watts wrote:
>
> Hi,
>
> I've got a 3-node RHEL 5.3 cluster. I'm running the cluster nodes as Xen
> Dom0 domains so I can deploy DomU domains as vm services within the cluster.
> Hardware is:
>
> 3 x Dell PowerEdge 1855 blades
> 2 x Dell PowerConnect 5316M Ethernet modules (for eth0 and eth1)
>
> I have a 4th blade acting as an iSCSI target, exporting a 2GB and two 20GB
> targets. The 2GB target is used as /etc/xen/ on the cluster nodes, mounted
> as a _netdev mount in /etc/fstab on the cluster nodes (mounted on /xen,
> with symlinks from /etc/xen to /xen/xen).
> All network traffic uses the same switch module, since I'm only using eth0
> at this time.
>
> To install the nodes, I'm kickstarting from a Satellite, and doing a "yum
> update" followed by a reboot to get to RHEL 5.3.
> I also deploy the same cluster.conf to each node (appended to this email).
> I then bring up cman, rgmanager, clvmd and gfs on all nodes (using the
> "Send input to all sessions" feature of Konsole to start the services at
> the same time on all nodes). This brings up the cluster, and allows me to
> mount the iSCSI target for /xen.
> Starting xend allows me to enable the vm service listed in cluster.conf
> (clusvcadm -e vm:node1).
> Oh, I also log *.* to a syslog server so I can see all the logs in one place.
>
> Nodes are:
> c1.eris.qinetiq.com
> c2.eris.qinetiq.com
> c3.eris.qinetiq.com
>
> "So far so good", I think.
>
> So, I enable cman, rgmanager, clvmd, gfs and xend to start on boot and
> reboot the cluster (all three nodes at the same time).
>
> At which point everything starts to fall apart.
>
> As the nodes come up and try to create a cluster, nodes c1 and c2 appear to
> form a cluster, and then fence node c3 when it joins.
>
> When node c3 comes back up and tries to join the cluster, node c1 decides
> the cluster is no longer quorate, and fences node c2.
> When node c2 comes back up and tries to join the cluster, node c1 decides
> the cluster is no longer quorate, and fences node c3.
>
> This then continues for as long as I'm entertained watching the logs,
> until I switch off all three servers.
>
> Does anyone have any insight as to what the difference is between starting
> the cluster services manually and starting them at boot, and why that
> difference (because I can't think of any other difference between the two
> states) would cause me to never gain a stable cluster?
>
> I'm at a bit of a loss really. I moved from a 2-node cluster to a 3-node
> one to try to avoid exactly these problems.
> I've also had the same problem with a CentOS 5.2 cluster on the same
> hardware; in that case the nodes were still fencing each other the
> following morning, 18 hours later!
>
> Regards,
>
> Mark.
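
Regarding the question above about manual vs. boot-time startup: I suspect
the difference is exactly the ordering note from doc/usage.txt quoted at the
top of this mail. When the services are started by hand on all nodes at once,
every node completes cman_tool join before any of them does fence_tool join;
when they start at boot, each node runs both joins back to back on its own,
so whichever nodes come up first form a fence domain and fence the
latecomers. A rough sketch of enforcing that ordering by hand, not the stock
init script behaviour (my assumptions: three nodes, a 5-second poll, and the
usual "cman_tool nodes" output where member nodes are flagged "M" in the Sts
column):

#!/bin/sh
# Run on every node at roughly the same time, instead of letting the
# init scripts do cman_tool join and fence_tool join back to back.

NODES=3   # assumption: the three-node cluster from the config below

# Join the cluster on this node.
cman_tool join

# Wait until all nodes have completed cman_tool join.
# Assumes "cman_tool nodes" marks members with "M" in the Sts column.
while [ "$(cman_tool nodes | grep -c ' M ')" -lt "$NODES" ]; do
    sleep 5
done

# Only now join the fence domain, as doc/usage.txt recommends.
fence_tool join

The poll loop is the whole point: no node calls fence_tool join until all
three show up as cluster members, which is what the usage.txt note asks for.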
>
> --
> Mark Watts BSc RHCE MBCS
> Senior Systems Engineer
> QinetiQ Applied Technologies
> GPG Key: http://www.linux-corner.info/mwatts.gpg
>
> <?xml version="1.0"?>
> <cluster alias="WebFarmTest" config_version="1" name="WebFarmTest">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="c1.eris.qinetiq.com" nodeid="1" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-1" action="Off"/>
>           <device name="DRACMC" modulename="Server-1" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="c2.eris.qinetiq.com" nodeid="2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-2" action="Off"/>
>           <device name="DRACMC" modulename="Server-2" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="c3.eris.qinetiq.com" nodeid="3" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-3" action="Off"/>
>           <device name="DRACMC" modulename="Server-3" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="2"/>
>   <fencedevices>
>     <fencedevice agent="fence_drac" ipaddr="XXX" login="XXX" name="DRACMC" passwd="XXX"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="webfarm-fd" nofailback="0" ordered="0" restricted="1">
>         <failoverdomainnode name="c1.eris.qinetiq.com" priority="1"/>
>         <failoverdomainnode name="c2.eris.qinetiq.com" priority="1"/>
>         <failoverdomainnode name="c3.eris.qinetiq.com" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources/>
>     <vm autostart="1" domain="webfarm-fd" exclusive="1" migrate="live" name="node1" path="/etc/xen/" recovery="relocate"/>
>   </rm>
> </cluster>
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster